Remove Duplicate Records in SQL Server 2005

SQL Server tables should never contain duplicate rows, if there is a key constraints. Table with no key constraints allow the duplicates. It happens in many ways such as import data from other resources.

Nothing to worry, write a query and make your database without duplicates.

First we need a table and some data to explain the process. Use the following script to create a table and insert a record.

CREATE TABLE Dup_Students (ID INT, FirstName varchar(25), Department Char(2))

CREATE TABLE Dup_Students (ID INT, FirstName varchar(25), Department Char(2))
 
INSERT INTO Dup_Students VALUES(1, 'Jack', 'IT')
INSERT INTO Dup_Students VALUES(2, 'Alice', 'ME')
INSERT INTO Dup_Students VALUES(3, 'James', 'EE')
INSERT INTO Dup_Students VALUES(4, 'Nickle', 'CE')
INSERT INTO Dup_Students VALUES(5, 'George', 'IT')
--
INSERT INTO Dup_Students VALUES(1, 'Jack', 'IT')
INSERT INTO Dup_Students VALUES(2, 'Alice', 'ME')
INSERT INTO Dup_Students VALUES(3, 'James', 'EE')
INSERT INTO Dup_Students VALUES(4, 'Nickle', 'CE')

--
INSERT INTO Dup_Students VALUES(1, 'Jack', 'IT')
INSERT INTO Dup_Students VALUES(2, 'Alice', 'ME')

Now you can view the number of duplicate records in each row.

SELECT ID, FirstName, Department, Count(*) as DuplicateCount
From Dup_Students
group by ID,FirstName, Department

Before deleting you can view the records with row number for the duplicates. This result set is going to act as a temp table for the delete process.

SELECT ID, FirstName, Department,
ROW_NUMBER() OVER(PARTITION BY ID, FirstName ORDER BY ID) AS DuplicateRowCount
FROM Dup_Students

We are going to delete all the duplicates using CTE (Common table expression) and ROW_NUMBER(), which is a new in SQL server 2005.

WITH Dup_Students_CTE (ID, FirstName, Department, DuplicateRowCount)
AS
(
SELECT ID, FirstName, Department,
ROW_NUMBER() OVER(PARTITION BY ID, FirstName ORDER BY ID) AS DuplicateRowCount
FROM Dup_Students
)
DELETE FROM Dup_Students_CTE
WHERE DuplicateRowCount > 1
GO


Similar Articles