In the previous article of this SQL Server series we learned about SELECT statements.
To learn more click here.
In this article we will see how to group rows using Group By and we will also see the difference between Where and Having clauses.
But first we need to create a table and some records that we will use in this example.
- CREATE DATABASE db_MyCart;
- GO
- USE db_MyCart;
- GO
- CREATE TABLE tblSale(
- [Customer Id] INT CONSTRAINT pk_Id PRIMARY KEY,
- [Customer Name] NVARCHAR(50),
- [Gender] NVARCHAR(10),
- [City] NVARCHAR(50),
- [Contact Number] NVARCHAR(15),
- [Item Purchased] NVARCHAR(100),
- [Price] DECIMAL
- );
-
- INSERT INTO tblSale VALUES (1,'Rajeev Shah','Male','Mumbai','1234567890','Sony Xperia z3',45000),
- (2,'Chandan Kumar','Male','Kolkata','2345671234','Samsung Galaxy s5',35000),
- (3,'Meena Kumari','Female','Mumbai','2343465474','Nokia Lumia 1520',40000),
- (4,'Harjinder Singh','Male','Bokaro','45377968435','Samsung Galaxy s5',35000),
- (5,'Shruti Agarwal','Female','Kolkata','9865434532','Sony Xperia z3',45000)
So, we have our table. Now let's understand how the Group By works.
Group By clause is used to group a set of rows into a summary of the values of one or more columns.
The Group By clause is always used as a connection with one or more aggregate functions like sum(), min(), max and so on.
Let's look at some of the demos where we can use the group by clause.
From the table above, I want the sum of the price column based on which the items were sold and for that we can use the sum() function.
Pass the column name as a parameter in the sum function and we get the total sum. But look at the output's column name, it shows (no column name). So, let's add a column name and for that all we need to give is an alias. To provide an alias we can use the AS keyword and then the alias.
Let's look at another example where we want the sum of the Price column and group the result by city.
The query above takes each city record and if any match is found it adds the price of that row and displays it.
Now what will happen if I remove the Group By clause from the above query?
We will get an error.
The error makes sense, since how will the aggregate function display the sum of the prices and how will the city column rows be displayed?
So, all the columns that are not part of the aggregate function that are in the select clause must be present in the group by clause or else we will get an error. We can even Group the summary based on the columns that are not in the select clause.
Is it possible to group by multiple columns? Absolutely.
Let's look at another example where we want the total price and group it by gender and city.
Based on the condition it must be clear by now, in the output we will have three columns and since, Gender and city are of type nvarchar so we cannot apply aggregate functions on it which means we need to pass these columns in the group by clause.
You can even change the order of the group, you can first group it by City and then by Gender but the output will be the same.
We have seen how to group by multiple columns but can we use more than one aggregate function? Yes.
In the above query, we have three columns in the select clause out of which we used Sum() and Count() aggregate functions on the Price and Customer_Id columns. The count function will provide us the total rows present based on the parameter we pass and in the end we are grouping the records by city.
Until now we have seen how to group the records based on two conditions and we have also seen how to use more than one aggregate function. Now let's say we want the same records that we retrieved using the two aggregate functions but this time we want the records based on some condition. For example, we want only the records whose gender is female. Is it possible? Absolutely.
There is another way to do the same thing. We can use the HAVING clause.
When we use a Having clause, the condition we pass in this clause must be present in the select clause and the group clause or just in the group by clause.
The having clause comes after the group by clause.
We can even use an aggregate function in the Having clause as in the following:
Difference between WHERE and HAVING
The WHERE clause can be used with Select, Insert, Update and Delete statements whereas the HAVING clause can be used with the Select statement.
WHERE clause filters rows before aggregation whereas HAVING filters groups after aggregation.
An aggregate function cannot be used in the WHERE clause unless it is in a sub-query contained in a HAVING clause whereas aggregate functions can be used in a HAVING clause.