Introduction
This article will teach us how to find and delete duplicate values from a SQL server table. Sometimes we may find duplicate values in a table, and we have to find the duplicate values from that table. In this article, we will find duplicate values using the "Group By and Having" clause, "Common Table Expressions (CTE)," and "Rank" function.
For this operation, we will create a table named “employee” and add some records.
CREATE TABLE [dbo].[employee](
[empid] [int] IDENTITY(1,1) NOT NULL,
[empname] [nvarchar](50) NULL,
[empaddress] [nvarchar](50) NULL
) ON [PRIMARY]
INSERT INTO [dbo].[employee] VALUES('Amit Mohanty', 'Hyderabad')
INSERT INTO [dbo].[employee] VALUES('Swarup Pradhan', 'Bhubaneswar')
INSERT INTO [dbo].[employee] VALUES('Alok Pradhan', 'Bhubaneswar')
INSERT INTO [dbo].[employee] VALUES('Amanda Murphy', 'Bengaluru')
INSERT INTO [dbo].[employee] VALUES('Kartik Reddy', 'Hyderabad')
INSERT INTO [dbo].[employee] VALUES('Swarup Pradhan', 'Bhubaneswar')
INSERT INTO [dbo].[employee] VALUES('Amit Mohanty', 'Hyderabad')
INSERT INTO [dbo].[employee] VALUES('Amanda Murphy', 'Bengaluru')
INSERT INTO [dbo].[employee] VALUES('Swarup Pradhan', 'Bhubaneswar')
INSERT INTO [dbo].[employee] VALUES('Amit Mohanty', 'Hyderabad')
INSERT INTO [dbo].[employee] VALUES('Swarup Pradhan', 'Hyderabad')
INSERT INTO [dbo].[employee] VALUES('Malaya Kabiraj', 'Kolkata')
By Using Group By and Having a clause
By using Group By clause, we can group all table data as per our defined columns and count the duplicate occurrence of rows. We can use the COUNT function.
For example,
SELECT [empname], [empaddress], [duplicate] = COUNT(*)
FROM [dbo].[employee]
GROUP BY [empname], [empaddress]
HAVING COUNT(*) > 1;
In the above query, we will get all data having duplicated more than one, and the “[duplicate]” column shows how many duplicate records there are.
Now we need to keep one record and delete all duplicate records. We can use the “MIN” function to get the first record of all duplicate records.
SELECT * FROM [dbo].[employee]
WHERE [empid] NOT IN (SELECT MIN([empid])
FROM [dbo].[Employee]
GROUP BY [empname], [empaddress]);
In the above query, we will exclude the minimum id of each duplicate row.
To delete the duplicate records, we can use the below query.
DELETE FROM [dbo].[Employee]
WHERE [empid] NOT IN (SELECT MIN([empid])
FROM [dbo].[Employee]
GROUP BY [empname], [empaddress]);
By Using Common Table Expressions (CTE)
We use the “Row_Number” function and Common Table Expressions to find duplicate values. CTE is available in SQL Server 2005 or higher version.
;WITH CTE([empname], [empaddress], [duplicate])
AS (SELECT [empname], [empaddress], ROW_NUMBER() OVER(PARTITION BY [empname], [empaddress] ORDER BY [empid]) AS [duplicate]
FROM [dbo].[employee])
SELECT * FROM CTE;
The above query “[duplicate]” column gives a unique row id for each duplicate row.
If we want to delete duplicate records using CTE, then we have to add a “Where” clause with the condition “[duplicate] > 1”.
;WITH CTE([empname], [empaddress], [duplicate])
AS (SELECT [empname], [empaddress], ROW_NUMBER() OVER(PARTITION BY [empname], [empaddress] ORDER BY [empid]) AS [duplicate]
FROM [dbo].[employee])
DELETE FROM CTE WHERE [duplicate] > 1;
The above query deletes all duplicate records with “[duplicate]” greater than 1.
By Using RANK() Function
We can also delete duplicate records by using the “Rank()” function. Here we will use the Rank function, the “PARTITION BY” clause, and “INNER JOIN.”
SELECT e.[empid], e.[empname], e.[empaddress], t.[rank]
FROM [dbo].[employee] e
INNER JOIN (SELECT *, RANK() OVER(PARTITION BY [empname], [empaddress] ORDER BY [empid]) AS [rank]
FROM [dbo].[Employee]) t ON e.[empid] = t.[empid];
The above query “[rank]” column gives a unique row id for each duplicate row, like the CTE query.
To delete duplicate records here also have to add a “Where” clause with the condition “[rank] > 1”.
DELETE e FROM [dbo].[employee] e
INNER JOIN (SELECT *, RANK() OVER(PARTITION BY [empname], [empaddress] ORDER BY [empid]) AS [rank]
FROM [dbo].[Employee]) t ON e.[empid] = t.[empid] WHERE [rank] > 1;
Conclusion
In this article, we have learned how to find and delete duplicate values in a table by using the “Group By and Having” clause, “Common Table Expressions (CTE),” and “Rank” function in SQL Server.
I hope this will help the readers. Happy Coding !!!