SQL problem - Delete Duplicate Emails
Problem
Query
DELETE FROM
Person p1
WHERE
p1.id IN (
SELECT id
FROM (
SELECT
id,
ROW_NUMBER() OVER (PARTITION BY Email ORDER BY id) as row_num
FROM Person
) as p
WHERE row_num > 1
)
Explanation:
SELECT
id,
ROW_NUMBER() OVER (PARTITION BY Email ORDER BY id) as row_num
FROM Person
In this part, the query uses the ROW_NUMBER()
window function to assign a unique row number to each record within partitions defined by the “Email” column. The PARTITION BY Email
clause means that the numbering restarts for each unique email address, and the ORDER BY id
ensures a consistent order within each partition.
SELECT id
FROM (
-- Inner query here
) as p
WHERE row_num > 1
The outer query selects the “id” column from the result of the inner query (aliased as “p”). It filters the results to include only those rows where the row number (row_num
) is greater than 1. This means it will retrieve the IDs of records that have duplicates based on the “Email” column.
In summary, the query is finding and returning the IDs of records in the “Person” table that have duplicate email addresses. The use of ROW_NUMBER()
and filtering for row_num > 1
helps identify and exclude unique records, leaving only the IDs of duplicate records in the final result.
Editorial Solution
Approach: Using DELETE
and WHERE
clause
DELETE p1 FROM person p1,
person p2
WHERE
p1.Email = p2.Email AND p1.Id > p2.Id