Did you know that you can write SQL-like queries in Google spreadsheets? This feature is incredibly powerful, versatile, and can replace most of the functionalities offered by Excel’s pivot tables.
Read our article to find out how you can easily use the QUERY function to efficiently work with your Google Sheets data.
Relational databases share many features with spreadsheets. When we explain what databases are, we often say that they are a bit like spreadsheets except they give us more control over larger amounts of data.
Now, let’s get back to spreadsheets. Spreadsheets are important because data isn’t always stored in a database. Many companies use CSV files or spreadsheets to manage their info. These can be opened in applications such as Google Sheets or Microsoft Excel, but these files don’t work with databases as such.
Does that mean you can’t use SQL when you work with data in spreadsheets? Not at all! Google Sheets provides a QUERY option that allows you to write SQL-like instructions and retrieve data in a way that’s similar to SQL. This way, you can use the power of SQL even if you don’t have a database to work with!
Needless to say, Google Sheets is available for free. The program can open a wide range of file formats, including CSV and Excel files.
In this article, we’ll show you a few basic examples of working with SQL in Google Sheets. Some SQL knowledge is recommended, but it’s not required to understand the concepts we’ll present. You should be able to follow the examples even if you’ve never seen an SQL statement before. However, if you’d like to learn SQL,
Sampling Google Sheets Data
In this article, we’ll work with a simple Google Sheets spreadsheet that contains some basic information about selected hotels in Indonesia. We’ve used imaginary hotels, so don’t try to find them.
Hotels table contains the following columns:
- Id – The unique identifier for that hotel.
- Name – The hotel’s name.
- Stars – The number of stars awarded to the hotel: 3, 4 or 5.
- Rating – The average rating of the hotel by its guests, on a scale from 0 to 10.
- TwinRoomPrice – The base price for a twin room for one night.
- City – The city where the hotel is located: Bandung, Denpasar, or Surabaya.
Even though the hotels are imaginary, the cities are real. Google Bandung, Denpasar, and Surabaya —they are really good Indonesian travel destinations!
All right. Now that we know the data, let’s learn how to use SQL in Sheets.
Understanding the Query Function
To write SQL-like instructions in Google Sheets, we’ll only need a single function named QUERY. The format of the function isn’t particularly difficult:
As you can see, the QUERY function takes three parameters, of which only two are required. Let’s look at those parameters:
data– The range of cells containing the data (in this case, the range is the entire
query– The SQL-like query to be performed.
headers– The number of rows that contain header information. These are usually at the top of the data, i.e. the column names. This argument is optional. If you don’t provide it, Google Sheets will try to figure it out. The application is quite good at this, so the parameter is typically not needed. We’ll omit it in this article.
Don’t worry if these parameters look a bit vague to you—we’ll provide a few easy-to-follow examples in the next paragraphs.
First Query in Google Sheets
Let’s start with the easiest possible example. We’ll simply select all the data from our
We’ll first pick a free cell to the right of the
Hotels table in Google Sheets and start writing our query in the following way. (Don’t hit Enter yet.)
The first parameter,
data, is the cell range. In this case, we provided
A1:F23, which corresponds to the range of the
Hotels table, as shown in the picture below.
Now, after the comma, we should provide our SQL-like query within quotes. The official name for the query language used in Google Sheets is Google Visualization API Query Language. To select all the data, we need the following code:
Note that you may need to replace the comma that separates the parameters with a semicolon or another character, based on your Google Sheets settings.
In Google Sheets,
SELECT * means “select everything”. Note how similar this instruction is to its SQL equivalent,
SELECT * FROM Hotels. In Google Sheets, we omit the FROM clause because the data range is specified in the first argument.
If you press Enter now, Google Sheets will turn the query into a result set, as shown in the picture below:
The table you can see is essentially identical to the original table. This is the expected behavior. Our query was
SELECT *, which means “show everything from the original data”.
Great! We’ve just written our first query in Google Sheets!
Working with Columns and Conditions
Now let’s take a look at some more advanced examples. Suppose we now want to show only three columns: Name, Rating, and TwinRoomPrice. What’s more, we only want to see three-star hotels.
In this case, we’ll have to refer to individual columns. The query will look like this:
=QUERY(A1:F23, "SELECT B, D, E WHERE C=3")
As you can see, instead of using column names from our table (such as Name, Stars, or Rating), we use the column letters provided by Google Sheets. The column Name is located in column B in the spreadsheet, so we used
SELECT B to show it in the resulting table. By the same token, we used D to select the Rating column and E to select the TwinRoomPrice column. The columns are also separated with commas, but there is no comma after the last column we want to select.
After selecting the columns, we add
WHERE C=3. C refers to the Stars column, so the instruction basically means select three-star hotels only (i.e. where the C column equals 3).
The screenshot below shows how the query refers to the individual columns:
When we hit Enter, Google Sheets produces a neat table that contains three columns with all the info for three-star hotels: Name, Rating, and TwinRoomPrice.
Note that the equivalent SQL query would be very similar:
Adding More Conditions and Ordering Rows
In our next example, we want to find all hotels in Bandung which have a rating above 7.0. We also want to see the resulting hotels sorted from least to most expensive.
To satisfy these requirements, we’ll have to add two new parts. We’ll need to join multiple conditions with the keyword
AND, and we’ll need to sort rows with a new clause named
ORDER BY. Here’s the query:
=QUERY(A1:F23, "SELECT * WHERE D > 7.0 AND F='Bandung' ORDER BY E")
Look at the
WHERE clause of the query above. Column D contains the hotel ratings. In this case, we want the rating to be greater than 7.0. We also want to introduce another condition: the hotels must be located in Bandung. To introduce more than one condition, we use the
Column F contains the hotels’ cities, so we wrote
F='Bandung' to get hotels from Bandung. Note that text values need to be surrounded by single quotes (unlike numbers). If we forget the quotes, we’ll get an error.
Finally, we added the following piece of code:
ORDER BY E. The
ORDER BY clause is used to sort the resulting rows. In this case, we want to sort the rows by TwinRoomPrice, which is in column E. By default, the rows will be sorted in ascending order.
When we hit the Enter button, we can see the resulting table. Indeed, we can only see hotels from Bandung with a rating greater than 7.0. Notice that all rows are sorted by price.
Counting Grouped Rows
Finally, we’ll move on to two examples that you can use to produce results similar to Excel pivot tables.
First, we want to count the number of hotels in each city. To do that, we’ll use the following query:
=QUERY(A1:F23,"SELECT F, COUNT(A) GROUP BY F ")
The query contains two new elements:
GROUP BY F. Let’s start with the latter.
GROUP BY F means that we want to divide all rows into groups based on the values in Column F (which is City). We perform the grouping (also known as “aggregation”) to be able to show each city (
SELECT F) alongside the number of hotels in that city (
COUNT(A) counts the number of rows in the given group. If no GROUP BY clause is provided, it will count all the rows. We put column A (Id) inside the parentheses so that all the unique hotels will be counted. We chose Id out of convention; you could use another column, such as B (the hotel name), but ID columns are usually preferred because they always contain unique values.
When we hit Enter, we can see the resulting table with the number of hotels in each city:
The equivalent SQL query would look like this:
In the last example, we’ll calculate the average rating and average price for hotels in each category (three-star, four-star, and five-star). We’ll need the following query:
=QUERY(A1:F23, "SELECT C, AVG(D), AVG(E) GROUP BY C")
GROUP BY C to group all hotels by the Stars column. We also introduced
AVG(E), which are used to calculate the average values in columns D (Rating) and E (TwinRoomPrice), respectively. When we press Enter, we should see this result table:
The equivalent SQL instruction would look like this:
SQL(ish) Queries in Google Sheets? Absolutely!
In this article, I explained how to use a query language similar to SQL in Google Sheets. You should now have a solid understanding of the QUERY function in Sheets. You can also see how similar Google’s Query Language is to standard SQL.
There is certainly more to Google’s query language than we were able to show in this introductory article. If you’re interested in using its other features, take a look at the official documentation for GOOGLE VISUALIZATION API QUERY LANGUAGE.