In the last couple of tutorials, we’ve been dealing with SQL queries that handle numeric data. Now we’re going to learn how to group together that data in a column and organize it.
📌GROUP BY: group results by a specified column
📌HAVING: narrowing down/specifying the location further
#Table of Contents
Query 1. #GROUP BY
SELECT store_id, count(active) FROM customer WHERE active=1 GROUP BY store_id;
Query 2. #GROUP BY #HAVING
SELECT rating, COUNT(rating) AS 'number' FROM film GROUP BY rating HAVING number>180 ORDER BY number DESC;
Q1. Find how many customers are active and organize them by the originating store
GROUP BY (column name you want to group by)
Let’s check out the customer table first. Run this query:
SELECT * FROM `sakila`.`customer`;
Here we can see the customer ID, store ID, the customer’s full name, email, and address ID. Let’s focus on the store ID and active columns for now. If we want to know how many customers are currently active at our imaginary film store, we can try running this query:
SELECT COUNT(active) FROM customer;
But this doesn’t give us very clear information. We still don’t know how many are active, since this query only shows how many customers have information entered in the ‘active’ column. It doesn’t specify whether they are active (specified with a 1) or not active (specified with a 0). So let’s filter this information by adding
SELECT COUNT(active) FROM customer WHERE active=1;
Okay, now we know we have 584 active customers. But there is one more factor we don’t know yet… which store has how many active customers? We need to plug in the store ID into our query.. but how?
By using the
GROUP BY query, we can get a clear and organized look at this information. First, we need to add store_id to our
SELECT query in order to view that column. Next, we want to specify with
GROUP BY how we want the results to be organized. Here’s how to plug it in:
SELECT store_id, COUNT(active) FROM customer WHERE active=1 GROUP BY store_id;
Now we can clearly see that store #1 has 318 active customers, and store #2 has 266 active customers!
Q2. Find the quantity of movies for every rating and rename the new column to ‘number’. Sort the results in descending order and filter out anything less than 180.
GROUP BY (column name you want to group by) having (specified)
Let’s check some movie ratings back in the film table.
To view the ratings column, we need to specify it under the
SELECT query. We also want to count how many ratings there are, so we'll add the
COUNT query. Let’s also use what he just learned and throw in the
GROUP BY query :
SELECT rating, COUNT(rating) FROM film GROUP BY rating;
Here are the 5 distinct movie ratings and how many we have of each. The title of the count column isn’t as pretty and neat as rating, so let’s fix that with the
AS query. Just add it after the
COUNT query and you’ll get this:
SELECT rating, COUNT(rating) AS'number' FROM film GROUP BY rating;
Let’s say we want to only see movies that we have MORE than 180 copies of — how can we filter that information? We’ve tried a similar query before with the
WHERE query, but this time we’re going to use the
SELECT rating, COUNT(rating) AS'number' FROM film GROUP BY rating HAVING number>180 ORDER BY number DESC;
🔑 Why do we use HAVING instead of WHERE?
WHERE is processed before
GROUP BY. That means the information wouldn’t be divided by rating when
WHERE executes. But
HAVING gets processed after
GROUP BY, so it can be used to constrain results by the group we specified.
Just remember, the
WHERE query applies to all rows in the result set.
HAVING query is applied to the groups created by a
GROUP BY query.
🔑 Comparing DISTINCT and GROUP BY
Try running both these queries in QueryPie. What’s the result?
SELECT DISTINCT replacement_cost FROM film; SELECT replacement_cost FROM film GROUP BY replacement_cost;
As you can see, the two queries are very similar. They will give you the same results in this case. The only difference is the order.
⭐️But for future use, know that
GROUP BY allows you to use aggregate functions (
DISTINCT is used to remove duplicates.
Your challenge today is to find the number of movies in the film table. You must group them by price, and rename the column as ‘number’.
Here’s the answer:
SELECT rental_rate, COUNT(rental_rate) AS number FROM film GROUP BY rental_rate;