SQL RANK vs DENSE_RANK vs ROW_NUMBER for Data Engineers
You're going to reach for a ranking function on at least one question per interview loop. In our corpus ROW_NUMBER shows up in 15% of rounds and RANK or DENSE_RANK in another 11%, which means roughly one in four SQL questions leans on these three functions.
Ranking Patterns for Interviews
Four patterns cover the vast majority of ranking questions. Learn the shape of each, and you can adapt them to whatever table the interviewer puts in front of you.
PARTITION BY with Ranking Functions
PARTITION BY resets the counter for each group. Without it, RANK runs across the entire result set. With it, each partition gets its own independent ranking starting at 1. This is the mechanism behind every 'per group' ranking query.
Know Window Functions the way the interviewer who asks it knows it.
Top Selling Items
Revenue crowns the winners. Who sold the most?
Pulled from debriefs where SQL was the gate.
ROW_NUMBER vs RANK vs DENSE_RANK
All three number the rows. They only disagree when two rows tie on the ORDER BY. Watch what the two rows scoring 90 do below.
| Score | ROW_NUMBER | RANK | DENSE_RANK |
|---|---|---|---|
| 95 | 1 | 1 | 1 |
| 90 | 2 | 2 | 2 |
| 90 | 3 | 2 | 2 |
| 85 | 4 | 4 | 3 |
| 80 | 5 | 5 | 4 |
All Three Functions Side by Side
SELECT
score,
ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num,
RANK() OVER (ORDER BY score DESC) AS rnk,
DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk
FROM exam_results;Key insight: ROW_NUMBER always increments. RANK skips from 2 to 4 because two rows consumed positions 2 and 3. DENSE_RANK goes from 2 to 3 because it only tracks distinct positions, not consumed slots.
When to Use Each Function
Picking the wrong one is a silent bug. Your query runs, returns results, and the interviewer waits to see if you notice the difference.
ROW_NUMBER()
Assigns a unique integer to every row. No ties, no gaps. When two rows have the same ORDER BY value, the assignment is nondeterministic unless you add a tiebreaker column. This is the most tested ranking function in DE interviews at 6.2% frequency. Primary use: Keep only the latest record per entity. Interview note: If the interviewer says 'get one row per group,' reach for ROW_NUMBER first. It guarantees exactly one row even with ties.
RANK()
Assigns the same number to tied rows, then skips ahead. Two rows tied at position 2 means the next row gets position 4, not 3. Think of it like a race: two runners cross the finish line together in 2nd place, so there is no 3rd place. Primary use: Competition-style leaderboards where ties share a position. Interview note: RANK is the right choice when the question explicitly mentions ties. WHERE salary_rank <= 3 might return 4 rows if two people tie for 2nd.
DENSE_RANK()
Same as RANK for ties, but no gaps. Two rows tied at 2 means the next row gets 3. The sequence stays compact: 1, 2, 2, 3, 4. Primary use: Top-N distinct values. 'Find the 3 highest salaries' where multiple people can share a salary. Interview note: The classic interview phrasing is 'find the Nth highest salary.' DENSE_RANK is the textbook answer because it handles duplicates correctly.
Top-N Per Group
WITH ranked AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY region
ORDER BY revenue DESC
) AS rn
FROM customers
)
SELECT * FROM ranked WHERE rn <= 5;The most frequently tested ranking pattern. Use ROW_NUMBER inside a CTE, then filter WHERE rn <= N. PARTITION BY defines the group; ORDER BY defines the ranking criteria.
Deduplication with ROW_NUMBER
WITH deduped AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY order_id
ORDER BY ingested_at DESC
) AS rn
FROM raw_orders
)
SELECT * FROM deduped WHERE rn = 1;Source tables often contain duplicate rows from retries, late-arriving data, or CDC replication. ROW_NUMBER partitioned by the natural key and ordered by a freshness column assigns 1 to the most recent version. This is the standard ELT dedup pattern.
Percentile Buckets with NTILE
SELECT
customer_id,
lifetime_value,
NTILE(4) OVER (ORDER BY lifetime_value DESC) AS quartile
FROM customers;
-- quartile 1 = top 25% spendersNTILE(N) splits the result set into N roughly equal groups. NTILE(4) creates quartiles, NTILE(100) creates percentiles. If the row count is not evenly divisible, earlier buckets get one extra row.
Gap Detection (Islands and Gaps)
WITH numbered AS (
SELECT
event_date,
ROW_NUMBER() OVER (ORDER BY event_date) AS rn
FROM daily_logins
WHERE user_id = 42
)
SELECT
MIN(event_date) AS streak_start,
MAX(event_date) AS streak_end,
COUNT(*) AS streak_length
FROM numbered
GROUP BY event_date - INTERVAL '1 day' * rn
ORDER BY streak_start;Assign ROW_NUMBER ordered by date, then subtract it from the actual date. Consecutive rows produce the same difference; gaps produce a different one. GROUP BY that difference to identify islands of consecutive data.
Global vs Per-Department Ranking
-- Without PARTITION BY: global ranking
SELECT
department, employee_name, salary,
RANK() OVER (ORDER BY salary DESC) AS global_rank
FROM employees;
-- rank runs 1 through N across all rows
-- With PARTITION BY department: per-department ranking
SELECT
department, employee_name, salary,
RANK() OVER (
PARTITION BY department
ORDER BY salary DESC
) AS dept_rank
FROM employees;
-- Engineering has ranks 1, 2, 3...
-- Marketing has its own 1, 2, 3...
-- Multiple PARTITION BY columns
SELECT
department, fiscal_year, employee_name, sales_total,
ROW_NUMBER() OVER (
PARTITION BY department, fiscal_year
ORDER BY sales_total DESC
) AS yearly_rank
FROM annual_sales;6 Ranking Interview Questions
These range from the standard top-N per group to advanced islands-and-gaps. Each includes what the interviewer is actually testing and the recommended approach.
Given an employees table, find the top 3 highest-paid employees in each department.
Use ROW_NUMBER if the question says 'top 3 employees.' Use RANK or DENSE_RANK if the question says 'top 3 salaries' and ties should be included. Always clarify before writing.
What is the difference between RANK and DENSE_RANK? When would you use each?
Draw a quick example: scores 95, 90, 90, 85. RANK gives 1, 2, 2, 4. DENSE_RANK gives 1, 2, 2, 3. Use RANK for 'what position is this row in.' Use DENSE_RANK for 'how many distinct values are above this row.'
Remove duplicate records from a table, keeping only the most recent version of each entity.
ROW_NUMBER() OVER (PARTITION BY entity_id ORDER BY updated_at DESC) AS rn, then WHERE rn = 1. If the table has no reliable freshness column, discuss alternatives: ingestion timestamp, file name, or a composite tiebreaker.
Find the second highest salary in each department. Handle ties.
DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dr, then WHERE dr = 2. DENSE_RANK is cleaner because the numbering stays sequential.
Given a table of daily stock prices, find the longest consecutive streak where the closing price increased day over day.
First use LAG to flag days where close > previous close. Then use ROW_NUMBER to identify streaks of consecutive TRUE flags. The islands-and-gaps technique: subtract the row number from the date to create a group identifier for each streak.
You have a table with overlapping date ranges per user. Merge overlapping ranges into consolidated periods.
ORDER BY user_id, start_date. Use LAG(end_date) OVER (PARTITION BY user_id ORDER BY start_date) to get the previous end. Flag new groups where start_date > previous end. Cumulative SUM of the flag creates group IDs. Then GROUP BY the group ID and take MIN(start_date), MAX(end_date).
Common Ranking Mistakes
Three mistakes account for most ranking-related bugs in interviews. Each one produces a query that runs without errors but returns wrong results.
Missing ORDER BY in the OVER clause
RANK, DENSE_RANK, and ROW_NUMBER all require ORDER BY. Without it, the ranking is undefined. PostgreSQL requires ORDER BY for ranking functions. Fix: Always include ORDER BY inside the OVER clause. If the ranking truly should be arbitrary, use ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) to document the intent.
Using RANK when you need exactly one row per group
RANK assigns the same number to tied rows. WHERE rank = 1 returns multiple rows if there is a tie. If the question asks for 'one row per group,' RANK gives incorrect results. ROW_NUMBER guarantees uniqueness. Fix: Use ROW_NUMBER for exactly-one-per-group. Use RANK or DENSE_RANK only when ties should produce multiple rows.
Not handling NULLs in the ORDER BY column
NULLs sort to the top or bottom depending on the engine. PostgreSQL puts NULLs last by default for ASC, first for DESC. If your ranking column has NULLs, rows with NULL get a rank assignment that may surprise you. Fix: Use NULLS LAST or NULLS FIRST explicitly: ORDER BY updated_at DESC NULLS LAST.
SQL RANK FAQ
What is the RANK function in SQL?+
What is the difference between RANK, DENSE_RANK, and ROW_NUMBER?+
How does PARTITION BY work with RANK?+
Can I use RANK without PARTITION BY?+
Which ranking function should I use for deduplication?+
You'll Thank Yourself at Minute Four
- 01
Active recall beats re-reading by 50%
Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom
- 02
76% of hiring managers reject on the coding task, not the resume
From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice
- 03
Five problem shapes cover 80% of data engineer loops
Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition