How Do You Handle Duplicates?

Deduplication Patterns

The interviewer asks 'how do you handle duplicates' to test two things: whether you know the ROW_NUMBER pattern cold, and whether you can think beyond exact-match dedup. The trap is talking only about ROW_NUMBER. The senior answer covers four strategies and explains when you'd pick each one.

Start with ROW_NUMBER. It's the universal dedup tool, and the interviewer expects to see you write it from memory: assign a sequence number within each group of duplicates (PARTITION BY the columns that define a duplicate), then keep only row 1. The interviewer will probe two decisions: which columns define a duplicate, and which row wins. Always specify an ORDER BY inside the window so the winner is deterministic, and add a tiebreaker column if the sort key can tie; otherwise the engine is free to pick a different row on each run.

The follow-up will be: 'What about streaming?' For streaming pipelines, you can't query the full table. Your answer should describe a dedup window: ho
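A minimal sketch of the ROW_NUMBER dedup pattern described above. To keep it self-contained, it runs standard window-function SQL against an in-memory SQLite database from Python. The `events` table, its columns, and the sample rows are hypothetical, and window functions assume SQLite 3.25 or newer (recent Python builds generally bundle a newer version). `PARTITION BY user_id, event_type` encodes which columns define a duplicate, and `ORDER BY loaded_at DESC, event_id DESC` decides which row wins, with `event_id` as the tiebreaker that keeps the result deterministic when timestamps tie.

```python
import sqlite3

# Hypothetical sample data: (event_id, user_id, event_type, loaded_at).
# The same (user_id, event_type) pair can be loaded more than once.
ROWS = [
    (1, 1, "signup", "2024-01-01 09:00:00"),
    (2, 1, "signup", "2024-01-01 09:05:00"),  # duplicate of event 1, loaded later
    (3, 2, "signup", "2024-01-02 10:00:00"),
    (4, 2, "login", "2024-01-02 11:00:00"),
]

DEDUP_SQL = """
SELECT event_id, user_id, event_type, loaded_at
FROM (
    SELECT
        event_id,
        user_id,
        event_type,
        loaded_at,
        ROW_NUMBER() OVER (
            PARTITION BY user_id, event_type        -- columns that define a duplicate
            ORDER BY loaded_at DESC, event_id DESC  -- winner: latest load, ties broken by id
        ) AS rn
    FROM events
) AS ranked
WHERE rn = 1                                        -- keep only the winner in each group
ORDER BY event_id
"""


def dedup(rows):
    """Load rows into a throwaway in-memory table and return the deduplicated rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE events ("
            "event_id INTEGER PRIMARY KEY, user_id INTEGER, "
            "event_type TEXT, loaded_at TEXT)"
        )
        conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
        return conn.execute(DEDUP_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in dedup(ROWS):
        print(row)
```

Flipping the window's sort to `ASC` keeps the first-seen copy instead of the latest one; the partition columns and the sort direction are exactly the two decisions the interviewer is probing.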