# The Freshest Record > Duplicates everywhere. Only the most recent version of the truth survives. Canonical URL: Domain: SQL · Difficulty: medium · Seniority: L5 ## Problem The observability team is cleaning up server logs ahead of a weekly review. Same server emitting the same message multiple times should appear only once, keeping the most recent occurrence. After that, narrow the result to entries from the last 7 days. Show every column from the surviving rows. ## Worked solution and explanation ### Why this problem exists in real interviews Two operations stacked: deduplicate per (server_name, message) and recency-filter the result. The interviewer is checking that you reach for `ROW_NUMBER` (the portable choice over Postgres-only `DISTINCT ON`) and that you put the recency filter **after** dedup, not before. Order matters: filtering first can change which row wins the partition. > **Trick to Solving** > > Two stages, in this order: > > 1. CTE: `ROW_NUMBER() OVER (PARTITION BY server_name, message ORDER BY log_timestamp DESC)` to label rows; rn=1 is the latest row per group > 2. Outer: `WHERE rn = 1 AND ` to keep only the latest-per-group AND only the last 7 days --- ### Break down the requirements #### Step 1: Number rows within each (server_name, message) partition Build a CTE `ranked` that wraps `server_logs` with `ROW_NUMBER() OVER (PARTITION BY server_name, message ORDER BY log_timestamp DESC) AS rn`. The latest row in each group gets `rn = 1`. #### Step 2: Filter to rn=1 AND last 7 days In the outer SELECT, keep `rn = 1` to drop the older duplicates, then apply the recency filter. SQLite has no `INTERVAL` keyword, so express the 7-day window with `julianday`: `julianday(date('now')) - julianday(log_timestamp) <= 7`. #### Step 3: Project the original columns Project the original 6 columns explicitly (skip `rn`): `log_id, server_name, log_level, message, response_time_ms, log_timestamp`. --- ### The solution **ROW_NUMBER dedup, then SQLite-friendly recency filter** ```sql WITH ranked AS ( SELECT *, ROW_NUMBER() OVER ( PARTITION BY server_name, message ORDER BY log_timestamp DESC ) AS rn FROM server_logs ) SELECT log_id, server_name, log_level, message, response_time_ms, log_timestamp FROM ranked WHERE rn = 1 AND julianday(date('now')) - julianday(log_timestamp) <= 7 ``` > **Time and Space Complexity** > > **Time:** O(n log n) for the in-partition sort across 80M rows. The hash partitioner builds groups of (server_name, message), and within each, the sort is the dominant step. A composite index on (server_name, message, log_timestamp DESC) would reduce the sort to a single descending scan. > > **Space:** O(g) for the partition state where g is the number of distinct (server_name, message) pairs. > **Interviewers Watch For** > > Strong candidates dedup first, filter second. They also reach for `julianday` (or the engine's equivalent date arithmetic) instead of Postgres-only `INTERVAL`, since the grader runs on SQLite. > **Common Pitfall** > > Writing `WHERE log_timestamp >= CURRENT_DATE - INTERVAL '7 days'`. That's Postgres syntax and the SQLite grader will reject it. Use `julianday(date('now')) - julianday(log_timestamp) <= 7` or `log_timestamp >= date('now', '-7 days')`. The other trap: applying that filter inside the CTE instead of the outer SELECT, which can change which row wins the partition. --- ## Common follow-up questions - If you applied the 7-day filter inside the CTE, what could go wrong with the dedup result? _(Tests understanding of how partition contents shift when you filter before vs after dedup.)_ - How would you write this with `DISTINCT ON (server_name, message)` on Postgres? _(Tests Postgres-specific dedup syntax as a more concise alternative.)_ - What changes if you needed to keep the earliest row per group instead of the latest? _(Tests changing the rank ordering for the inverse problem.)_ ## Related - [All practice problems](https://datadriven.io/problems) - [Mock interview mode](https://datadriven.io/interview/deduplicate_and_keep_latest) - [SQL Interview Questions](https://datadriven.io/sql-interview-questions) - [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep) - [Daily Challenge](https://datadriven.io/daily) --- Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.