You're going to reach for a ranking function on at least one question per interview loop. In our corpus ROW_NUMBER shows up in 15% of rounds and RANK or DENSE_RANK in another 11%, which means roughly one in four SQL questions leans on these three functions.
The good news is they're cousins. Same OVER clause, same PARTITION BY shape, one small difference at the tie boundary. Once you see the difference, you'll pick the right one every time. Take this slowly the first pass. The second pass you'll wonder why you ever found it tricky.
[Chart: share of interview rounds featuring ROW_NUMBER, RANK/DENSE_RANK, and PARTITION BY. Source: DataDriven analysis of 1,042 verified data engineering interview rounds.]
Here's the thing to hold onto. All three number the rows. They only disagree when two rows tie on the ORDER BY. Watch what the two rows scoring 90 do below and you'll have the whole mental model in one glance. That's the trick you'll carry into your interview.
Same data, different functions
| Score | ROW_NUMBER | RANK | DENSE_RANK |
|---|---|---|---|
| 95 | 1 | 1 | 1 |
| 90 | 2 | 2 | 2 |
| 90 | 3 | 2 | 2 |
| 85 | 4 | 4 | 3 |
| 80 | 5 | 5 | 4 |
Key insight: ROW_NUMBER always increments. RANK skips from 2 to 4 because two rows consumed positions 2 and 3. DENSE_RANK goes from 2 to 3 because it only tracks distinct positions, not consumed slots.
```sql
SELECT
    score,
    ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num,
    RANK() OVER (ORDER BY score DESC) AS rnk,
    DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk
FROM exam_results;
```

Picking the wrong one is a silent bug. Your query runs, returns results, and the interviewer waits to see if you notice the difference.
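If you want to confirm the tie behavior yourself, here's a quick sketch that runs the same query against an in-memory SQLite database via Python's sqlite3 (SQLite 3.25+ supports window functions; the table and data mirror the example above):

```python
import sqlite3

# In-memory database with the five scores from the comparison table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE exam_results (score INTEGER)")
conn.executemany("INSERT INTO exam_results VALUES (?)",
                 [(95,), (90,), (90,), (85,), (80,)])

rows = conn.execute("""
    SELECT
        score,
        ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num,
        RANK()       OVER (ORDER BY score DESC) AS rnk,
        DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk
    FROM exam_results
    ORDER BY row_num
""").fetchall()

for row in rows:
    print(row)
# (95, 1, 1, 1)
# (90, 2, 2, 2)
# (90, 3, 2, 2)
# (85, 4, 4, 3)
# (80, 5, 5, 4)
```

The outer ORDER BY row_num only makes the printed order deterministic; the rank values themselves come entirely from the OVER clauses.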
Assigns a unique integer to every row. No ties, no gaps. When two rows have the same ORDER BY value, the assignment is nondeterministic unless you add a tiebreaker column. This is the most tested ranking function in DE interviews at 6.2% frequency.
Primary use case: Deduplication. Keep only the latest record per entity.
```sql
WITH deduped AS (
    SELECT *,
        ROW_NUMBER() OVER (
            PARTITION BY user_id
            ORDER BY updated_at DESC
        ) AS rn
    FROM user_profiles
)
SELECT * FROM deduped WHERE rn = 1;
```

Interview note: If the interviewer says 'get one row per group,' reach for ROW_NUMBER first. It guarantees exactly one row even with ties. RANK would return multiple rows for ties, which is usually not what the question wants.
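As a minimal runnable sketch of the dedup pattern (SQLite via Python's sqlite3; the email column is hypothetical, standing in for whatever attributes the table carries):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_profiles (user_id INTEGER, email TEXT, updated_at TEXT)")
conn.executemany("INSERT INTO user_profiles VALUES (?, ?, ?)", [
    (1, "old@example.com", "2024-01-01"),
    (1, "new@example.com", "2024-03-01"),   # latest version for user 1
    (2, "only@example.com", "2024-02-15"),
])

rows = conn.execute("""
    WITH deduped AS (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY user_id
                   ORDER BY updated_at DESC
               ) AS rn
        FROM user_profiles
    )
    SELECT user_id, email FROM deduped WHERE rn = 1
    ORDER BY user_id
""").fetchall()

print(rows)
# [(1, 'new@example.com'), (2, 'only@example.com')]
```

User 1's stale row gets rn = 2 and is filtered out; exactly one row survives per user_id.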
Assigns the same number to tied rows, then skips ahead. Two rows tied at position 2 means the next row gets position 4, not 3. The gap signals that positions were 'consumed' by the tie. Think of it like a race: two runners cross the finish line together in 2nd place, so there is no 3rd place.
Primary use case: Competition-style leaderboards where ties share a position.
```sql
SELECT
    employee_id,
    department,
    salary,
    RANK() OVER (
        PARTITION BY department
        ORDER BY salary DESC
    ) AS salary_rank
FROM employees;
```

Interview note: RANK is the right choice when the question explicitly mentions ties or uses phrases like 'ranked by salary.' The gaps after ties can affect downstream filtering: WHERE salary_rank <= 3 might return 4 rows if two people tie for 3rd.
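The filtering pitfall is easy to demonstrate: with a tie for 3rd place, RANK assigns 1, 2, 3, 3, and a rank <= 3 filter returns four rows. A sketch in SQLite via Python's sqlite3, with made-up salaries:

```python
import sqlite3

# Hypothetical salaries with a deliberate tie for 3rd (100 appears twice).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [(1, 120), (2, 110), (3, 100), (4, 100), (5, 90)])

rows = conn.execute("""
    SELECT employee_id, salary, salary_rank
    FROM (
        SELECT employee_id, salary,
               RANK() OVER (ORDER BY salary DESC) AS salary_rank
        FROM employees
    )
    WHERE salary_rank <= 3
""").fetchall()

print(len(rows))
# 4 -- ranks 1, 2, 3, 3: the tie slips an extra row under the cutoff
```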
Same as RANK for ties, but no gaps. Two rows tied at 2 means the next row gets 3. The sequence stays compact: 1, 2, 2, 3, 4. Use DENSE_RANK when you need exactly N distinct rank values regardless of how many rows share each rank.
Primary use case: Top-N distinct values. 'Find the 3 highest salaries' where multiple people can share a salary.
```sql
SELECT DISTINCT salary
FROM (
    SELECT
        salary,
        DENSE_RANK() OVER (ORDER BY salary DESC) AS dr
    FROM employees
) ranked
WHERE dr <= 3;
```

Interview note: The classic interview phrasing is 'find the Nth highest salary.' DENSE_RANK is the textbook answer because it handles duplicates correctly. ROW_NUMBER numbers duplicate salaries separately, so the Nth row is not necessarily the Nth distinct salary. RANK would work but the gap makes the WHERE clause unintuitive.
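A quick sketch of the top-3-distinct-salaries query against SQLite via Python's sqlite3 (salaries invented, duplicates included on purpose):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?)",
                 [(120,), (120,), (110,), (100,), (100,), (90,)])

rows = conn.execute("""
    SELECT DISTINCT salary
    FROM (
        SELECT salary,
               DENSE_RANK() OVER (ORDER BY salary DESC) AS dr
        FROM employees
    ) ranked
    WHERE dr <= 3
    ORDER BY salary DESC
""").fetchall()

print([r[0] for r in rows])
# [120, 110, 100] -- three distinct values despite six rows
```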
Four patterns cover the vast majority of ranking questions. Learn the shape of each, and you can adapt them to whatever table the interviewer puts in front of you.
The most frequently tested ranking pattern. Use ROW_NUMBER inside a CTE, then filter WHERE rn <= N. PARTITION BY defines the group; ORDER BY defines the ranking criteria. This handles deduplication, top customers per region, most recent orders per user, and dozens of other variations.
```sql
WITH ranked AS (
    SELECT *,
        ROW_NUMBER() OVER (
            PARTITION BY region
            ORDER BY revenue DESC
        ) AS rn
    FROM customers
)
SELECT * FROM ranked WHERE rn <= 5;
```

Source tables often contain duplicate rows from retries, late-arriving data, or CDC replication. ROW_NUMBER partitioned by the natural key and ordered by a freshness column (updated_at DESC, event_timestamp DESC) assigns 1 to the most recent version. Filter to rn = 1 and you have a clean, deduplicated table. This is the standard ELT dedup pattern.
```sql
WITH deduped AS (
    SELECT *,
        ROW_NUMBER() OVER (
            PARTITION BY order_id
            ORDER BY ingested_at DESC
        ) AS rn
    FROM raw_orders
)
SELECT * FROM deduped WHERE rn = 1;
```

NTILE(N) splits the result set into N roughly equal groups. NTILE(4) creates quartiles, NTILE(100) creates percentiles. Rows are assigned bucket numbers from 1 to N based on the ORDER BY. If the row count is not evenly divisible, earlier buckets get one extra row. This is useful for customer segmentation, salary band analysis, and A/B test group assignment.
```sql
SELECT
    customer_id,
    lifetime_value,
    NTILE(4) OVER (ORDER BY lifetime_value DESC) AS quartile
FROM customers;
-- quartile 1 = top 25% spenders
```

Assign ROW_NUMBER ordered by date, then subtract it from the actual date or sequence value. Consecutive rows produce the same difference; gaps produce a different one. GROUP BY that difference to identify islands of consecutive data. This islands-and-gaps technique shows up in time-series interview questions: finding streaks, identifying missing dates, or detecting session boundaries.
```sql
WITH numbered AS (
    SELECT
        event_date,
        ROW_NUMBER() OVER (ORDER BY event_date) AS rn
    FROM daily_logins
    WHERE user_id = 42
)
SELECT
    MIN(event_date) AS streak_start,
    MAX(event_date) AS streak_end,
    COUNT(*) AS streak_length
FROM numbered
GROUP BY event_date - INTERVAL '1 day' * rn
ORDER BY streak_start;
```

PARTITION BY resets the counter for each group. Without it, RANK runs across the entire result set. With it, each partition gets its own independent ranking starting at 1. This is the mechanism behind every "per group" ranking query.
One global ranking across all departments. The highest salary overall gets rank 1.
```sql
SELECT
    department, employee_name, salary,
    RANK() OVER (ORDER BY salary DESC) AS global_rank
FROM employees;
-- Result: rank runs 1 through N across all rows
-- Engineering and Marketing employees share one sequence
```

Separate ranking per department. Each department's highest salary gets rank 1.
```sql
SELECT
    department, employee_name, salary,
    RANK() OVER (
        PARTITION BY department
        ORDER BY salary DESC
    ) AS dept_rank
FROM employees;
-- Result: Engineering has ranks 1, 2, 3...
-- Marketing has its own 1, 2, 3...
-- Each partition is independent
```

You can partition by more than one column. PARTITION BY department, fiscal_year creates a separate ranking for each department-year combination. 12 departments over 3 years produces 36 independent rankings.
```sql
SELECT
    department, fiscal_year, employee_name, sales_total,
    ROW_NUMBER() OVER (
        PARTITION BY department, fiscal_year
        ORDER BY sales_total DESC
    ) AS yearly_rank
FROM annual_sales;
```

These range from the standard top-N per group to advanced islands-and-gaps. Each includes what the interviewer is actually testing and the recommended approach.
What they test:
This is the canonical top-N per group problem. The interviewer wants a CTE with ROW_NUMBER (or RANK if they want ties included), PARTITION BY department, ORDER BY salary DESC. Then filter WHERE rn <= 3 in the outer query.
Approach:
Use ROW_NUMBER if the question says 'top 3 employees.' Use RANK or DENSE_RANK if the question says 'top 3 salaries' and ties should be included. Always clarify before writing.
What they test:
Conceptual understanding. The interviewer wants to hear about gaps vs no gaps and the practical implications. A weak answer just states the definition. A strong answer gives an example with tied values and explains when the gap matters (leaderboards) vs when it does not (finding the Nth value).
Approach:
Draw a quick example: scores 95, 90, 90, 85. RANK gives 1, 2, 2, 4. DENSE_RANK gives 1, 2, 2, 3. Use RANK for 'what position is this row in.' Use DENSE_RANK for 'how many distinct values are above this row.'
What they test:
Deduplication is one of the top real-world uses of ROW_NUMBER. The interviewer checks that you know to PARTITION BY the natural key, ORDER BY the freshness indicator DESC, and filter to rn = 1. Bonus points for wrapping it in a CTE and explaining idempotency.
Approach:
ROW_NUMBER() OVER (PARTITION BY entity_id ORDER BY updated_at DESC) AS rn, then WHERE rn = 1. If the table has no reliable freshness column, discuss alternatives: ingestion timestamp, file name, or a composite tiebreaker.
What they test:
This is the DENSE_RANK variant of top-N. If you use ROW_NUMBER, you get exactly one row and might miss ties. DENSE_RANK ensures that if two people share the highest salary, the second-highest is still correctly identified as rank 2, not rank 3.
Approach:
DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dr, then WHERE dr = 2. Explain why RANK would also give rank 2 here but DENSE_RANK is cleaner because the numbering stays sequential.
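Here's that approach as a runnable sketch (SQLite via Python's sqlite3; departments and salaries are made up, with a deliberate tie for the top salary to show why DENSE_RANK still lands on the true second-highest):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", [
    ("a", "Eng", 120), ("b", "Eng", 120),   # tie for the top Eng salary
    ("c", "Eng", 110),
    ("d", "Mkt", 100), ("e", "Mkt", 90),
])

rows = conn.execute("""
    SELECT department, name, salary
    FROM (
        SELECT *,
               DENSE_RANK() OVER (
                   PARTITION BY department
                   ORDER BY salary DESC
               ) AS dr
        FROM employees
    )
    WHERE dr = 2
    ORDER BY department
""").fetchall()

print(rows)
# [('Eng', 'c', 110), ('Mkt', 'e', 90)]
```

In Eng, the two 120s share rank 1, so 110 is rank 2 under DENSE_RANK; with RANK it would be rank 3 and the dr = 2 filter would miss it.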
What they test:
Islands-and-gaps with ranking. The interviewer wants to see ROW_NUMBER combined with LAG or date arithmetic to identify consecutive sequences. This combines ranking with analytical thinking.
Approach:
First use LAG to flag days where close > previous close. Then use ROW_NUMBER to identify streaks of consecutive TRUE flags. The islands-and-gaps technique: subtract the row number from the date to create a group identifier for each streak.
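The subtraction trick itself is easier to see on a bare consecutive-dates example than on stock prices. A sketch in SQLite via Python's sqlite3 (the daily_logins data is invented; SQLite uses date() modifiers where Postgres would use INTERVAL arithmetic):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_logins (user_id INTEGER, event_date TEXT)")
conn.executemany("INSERT INTO daily_logins VALUES (42, ?)",
                 [("2024-01-01",), ("2024-01-02",), ("2024-01-03",),
                  ("2024-01-05",), ("2024-01-06",)])   # gap on the 4th

rows = conn.execute("""
    WITH numbered AS (
        SELECT event_date,
               ROW_NUMBER() OVER (ORDER BY event_date) AS rn
        FROM daily_logins
        WHERE user_id = 42
    )
    SELECT MIN(event_date) AS streak_start,
           MAX(event_date) AS streak_end,
           COUNT(*) AS streak_length
    FROM numbered
    -- date minus rn days: constant within a streak, changes at every gap
    GROUP BY date(event_date, '-' || rn || ' days')
    ORDER BY streak_start
""").fetchall()

print(rows)
# [('2024-01-01', '2024-01-03', 3), ('2024-01-05', '2024-01-06', 2)]
```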
What they test:
Advanced ranking application. This requires sorting by start_date, using LAG to detect overlaps, then creating groups with a cumulative SUM of non-overlapping flags. It tests whether you can chain window functions to solve a multi-step problem.
Approach:
ORDER BY user_id, start_date. Use LAG(end_date) OVER (PARTITION BY user_id ORDER BY start_date) to get the previous end. Flag new groups where start_date > previous end. Cumulative SUM of the flag creates group IDs. Then GROUP BY the group ID and take MIN(start_date), MAX(end_date).
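The full chain described above, sketched against SQLite via Python's sqlite3 (the subscriptions table and dates are hypothetical; note that LAG only looks one row back, so if an interval can be fully contained inside an earlier one, a running MAX(end_date) is the safer flag):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscriptions (user_id INTEGER, start_date TEXT, end_date TEXT)")
conn.executemany("INSERT INTO subscriptions VALUES (?, ?, ?)", [
    (1, "2024-01-01", "2024-01-10"),
    (1, "2024-01-05", "2024-01-15"),   # overlaps the first interval
    (1, "2024-02-01", "2024-02-05"),   # standalone interval
])

rows = conn.execute("""
    WITH ordered AS (
        SELECT user_id, start_date, end_date,
               LAG(end_date) OVER (
                   PARTITION BY user_id ORDER BY start_date
               ) AS prev_end
        FROM subscriptions
    ),
    flagged AS (
        -- 1 marks the start of a new non-overlapping group
        SELECT *,
               CASE WHEN prev_end IS NULL OR start_date > prev_end
                    THEN 1 ELSE 0 END AS new_group
        FROM ordered
    ),
    grouped AS (
        -- cumulative sum of the flag turns it into a group ID
        SELECT *,
               SUM(new_group) OVER (
                   PARTITION BY user_id ORDER BY start_date
                   ROWS UNBOUNDED PRECEDING
               ) AS grp
        FROM flagged
    )
    SELECT user_id,
           MIN(start_date) AS merged_start,
           MAX(end_date) AS merged_end
    FROM grouped
    GROUP BY user_id, grp
    ORDER BY user_id, merged_start
""").fetchall()

print(rows)
# [(1, '2024-01-01', '2024-01-15'), (1, '2024-02-01', '2024-02-05')]
```

The first two intervals collapse into one merged range; the February interval starts after the previous end, raises the flag, and becomes its own group.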
Three mistakes account for most ranking-related bugs in interviews. Each one produces a query that runs without errors but returns wrong results.
RANK, DENSE_RANK, and ROW_NUMBER all need ORDER BY inside the OVER clause to be meaningful. Without it, the ranking is undefined. Some engines throw an error; others silently assign arbitrary numbers. SQL Server requires ORDER BY for ranking functions and rejects the query without it. PostgreSQL accepts an empty OVER (), but then every row is a peer: RANK assigns 1 to all rows and ROW_NUMBER assigns arbitrary numbers. In an interview, omitting ORDER BY signals that you do not understand how ranking works.
Fix: Always include ORDER BY inside the OVER clause. If the ranking truly should be arbitrary, say so explicitly and use ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) to document the intent.
RANK assigns the same number to tied rows. WHERE rank = 1 returns multiple rows if there is a tie. If the question asks for 'one row per group,' RANK gives incorrect results. ROW_NUMBER guarantees uniqueness because it breaks ties arbitrarily. This is the most common ranking function mix-up in interviews, and interviewers set this trap deliberately.
Fix: Use ROW_NUMBER for exactly-one-per-group. Use RANK or DENSE_RANK only when ties should produce multiple rows. If you need deterministic tiebreaking with ROW_NUMBER, add a secondary ORDER BY column.
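A small sketch of the deterministic tiebreaker (SQLite via Python's sqlite3; using event_id as the secondary key is an assumption, any unique column works):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_id INTEGER, user_id INTEGER, updated_at TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    (10, 1, "2024-01-01"),
    (11, 1, "2024-01-01"),   # identical timestamp: a tie ROW_NUMBER must break
])

rows = conn.execute("""
    WITH ranked AS (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY user_id
                   -- secondary key makes the winner stable across runs and engines
                   ORDER BY updated_at DESC, event_id DESC
               ) AS rn
        FROM events
    )
    SELECT event_id FROM ranked WHERE rn = 1
""").fetchall()

print(rows)
# [(11,)] -- always the higher event_id, never engine-dependent
```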
NULLs sort to the top or bottom depending on the engine. PostgreSQL puts NULLs last by default for ASC, first for DESC. SQL Server puts NULLs first for ASC. If your ranking column has NULLs, rows with NULL get a rank assignment that may surprise you. In a dedup query, a NULL updated_at could surface as rn = 1, giving you the wrong 'most recent' row.
Fix: Use NULLS LAST or NULLS FIRST explicitly: ORDER BY updated_at DESC NULLS LAST. Or filter NULLs before ranking with a CTE. In engines that do not support NULLS LAST, use COALESCE(updated_at, '1900-01-01').
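The COALESCE fix is easy to verify: pin NULL timestamps to a sentinel date so they can never win rn = 1, whatever the engine's NULL sort order. A sketch in SQLite via Python's sqlite3 with invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_profiles (user_id INTEGER, email TEXT, updated_at TEXT)")
conn.executemany("INSERT INTO user_profiles VALUES (?, ?, ?)", [
    (1, "null-ts@example.com", None),          # missing freshness value
    (1, "real@example.com", "2024-01-01"),
])

rows = conn.execute("""
    WITH deduped AS (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY user_id
                   -- NULL becomes the oldest possible date, so it sorts last
                   ORDER BY COALESCE(updated_at, '1900-01-01') DESC
               ) AS rn
        FROM user_profiles
    )
    SELECT user_id, email FROM deduped WHERE rn = 1
""").fetchall()

print(rows)
# [(1, 'real@example.com')]
```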
The difference between knowing these functions and using them under a ticking clock is just practice reps. Grab a problem, write the query, watch it run. That's it.