Friday, December 24, 2010

MsSQL Delete Duplicate Records

This article shall discuss how to delete duplicate records from a Microsoft SQL Server database table. Duplicate record means that one or more fields in one record is identical to corresponding fields in a different record.

A simple solution but then often vexing problem for many.

For example, we have this database table

--Create my table
create table MyTable (
 ID int identity(1,1) not null primary key,
 Name varchar(20) not null,
 Email varchar(500) not null
)
go
--Populate some dummy records and duplicate records
insert into MyTable (Name, Email) values ('Jack', 'jack1@email.com')
insert into MyTable (Name, Email) values ('Jack', 'jack2@email.com')
insert into MyTable (Name, Email) values ('Jill', 'jill@email.com')
go

To check if we have duplicate records, we can make use of COUNT() function and GROUP BY statement.

--We do have duplicated names
select ID, Name, Email from MyTable order by Name
select Name, COUNT(Name) from MyTable group by Name having COUNT(Name) > 1
go

From the above SELECT, we know records having Name=Jack are duplicated.

To remove duplicates, we can make use of the following to remove duplicates while keeping the latest.

--Let us remove duplicates and keep the latest (the one having larger ID value)
delete t1
from MyTable t1, MyTable t2
where t1.Name = t2.Name and t1.ID < t2.ID
go

No comments:

Post a Comment

Do provide your constructive comment. I appreciate that.