Introduction
In this article, I am going to explain how to delete duplicate rows/records in an SQL server using common table expression (CTE). This is one of the most common questions asked in an SQL interview.
Here we will be using SQL Server 2017 or you can use SQL Server 2008 or above.
Read my previous Joins in SQL Server 2017 part of this article using the below links,
Prerequisites
SQL Server 2017 or you can use SQL Server 2008 or above version.
Now, first, we will create a Database and a table.
Creating a Database and a Table
Step 1. Create a Database
Open your SQL Server and use the following script to create the “chittadb” Database.
Create database chittadb
Now, select the script query then press F5 or click on Execute button to execute the above script.
You should see a message, “Command(s) completed successfully.” This means your new database has been created.
Step 2. Create a table
Open your SQL Server and use the following script to create table “tbl_Mcastudents”.
CREATE TABLE tbl_Mcastudents
(
Id INT PRIMARY KEY NOT NULL IDENTITY(1,1),
Name NVARCHAR(50),
Location NVARCHAR(30),
Gender VARCHAR(10)
)
Execute the above query to create “tbl_Mcastudents “.
You should see a message, “Command(s) completed successfully.”
Now, data has been inserted into the table.
Insert into tbl_Mcastudents values ('Chitta', 'Chennai', 'Male')
Insert into tbl_Mcastudents values ('Chitta', 'Chennai', 'Male')
Insert into tbl_Mcastudents values ('Chitta', 'Chennai', 'Male')
Insert into tbl_Mcastudents values ('Rani', 'Puri', 'Female')
Insert into tbl_Mcastudents values ('Rani', 'Puri', 'Female')
Insert into tbl_Mcastudents values ('Mitu', 'BBSR', 'Male')
Insert into tbl_Mcastudents values ('Mitu', 'BBSR', 'Male')
Insert into tbl_Mcastudents values ('Mitu', 'BBSR', 'Male')
Execute the above query, you should see a message, “Command(s) completed successfully.”
Now retrieve all data from the “tbl_Mcastudents” table.
SELECT * FROM tbl_Mcastudents
Output
There are many duplicate rows (10, 11, 12), (13, 14), and (15, 16, 17) for the tbl_Mcastudents that have the same Name, Location, and Gender.
Delete duplicate rows/records in the SQL server using common table expression (CTE)
To delete the duplicate rows from the table in SQL Server, we follow these steps,
- Find duplicate rows using group BYclause orROW_NUMBER()
- UseDELETEstatement to remove the duplicate rows.
Query
WITH cte AS
(
SELECT
Id,
Name,
Location,
Gender,
ROW_NUMBER() OVER (PARTITION BY Name, Location, Gender ORDER BY Name, Location, Gender) AS row_num
FROM
tbl_Mcastudents
)
DELETE FROM cte WHERE row_num > 1;
In the above query
- First, the CTE uses theROW_NUMBER()function to find the duplicate rows specified by values in the name, Location, and gender.
- Then, the delete statement deletes all the duplicate rows but keeps only one occurrence of each duplicate group.
To execute the above query you should see a message.
(5 rows affected)
The above message indicates that the duplicate rows have been removed from the table.
Now retrieve all data from the “tbl_Mcastudents” table after the duplicate rows have been deleted.
SELECT * FROM tbl_Mcastudents
Output
Conclusion
In this article, we have learned how to delete duplicate rows from a table in SQL Server.