You're going to reach for a ranking function on at least one question per interview loop. In our corpus ROW_NUMBER shows up in 15% of rounds and RANK or DENSE_RANK in another 11%, which means roughly one in four SQL questions leans on these three functions.
The good news is they're cousins. Same OVER clause, same PARTITION BY shape, one small difference at the tie boundary. Once you see the difference, you'll pick the right one every time. Take this slowly the first pass. The second pass you'll wonder why you ever found it tricky.
[Chart: share of interview rounds featuring ROW_NUMBER, RANK/DENSE_RANK, and PARTITION BY. Source: DataDriven analysis of 1,042 verified data engineering interview rounds.]
Here's the thing to hold onto. All three number the rows. They only disagree when two rows tie on the ORDER BY. Watch what the two rows scoring 90 do below and you'll have the whole mental model in one glance. That's the trick you'll carry into your interview.
Same data, different functions
| Score | ROW_NUMBER | RANK | DENSE_RANK |
|---|---|---|---|
| 95 | 1 | 1 | 1 |
| 90 | 2 | 2 | 2 |
| 90 | 3 | 2 | 2 |
| 85 | 4 | 4 | 3 |
| 80 | 5 | 5 | 4 |
Key insight: ROW_NUMBER always increments. RANK skips from 2 to 4 because two rows consumed positions 2 and 3. DENSE_RANK goes from 2 to 3 because it only tracks distinct positions, not consumed slots.
```sql
SELECT
    score,
    ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num,
    RANK() OVER (ORDER BY score DESC) AS rnk,
    DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk
FROM exam_results;
```

Picking the wrong one is a silent bug. Your query runs, returns results, and the interviewer waits to see if you notice the difference.
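If you want to confirm the tie behavior yourself, here's a quick sketch that runs the same query against an in-memory SQLite database via Python's sqlite3 (SQLite 3.25+ supports window functions; the table and data mirror the example above):

```python
import sqlite3

# In-memory database with the five scores from the comparison table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE exam_results (score INTEGER)")
conn.executemany("INSERT INTO exam_results VALUES (?)",
                 [(95,), (90,), (90,), (85,), (80,)])

rows = conn.execute("""
    SELECT
        score,
        ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num,
        RANK()       OVER (ORDER BY score DESC) AS rnk,
        DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk
    FROM exam_results
    ORDER BY row_num
""").fetchall()

for row in rows:
    print(row)
# (95, 1, 1, 1)
# (90, 2, 2, 2)
# (90, 3, 2, 2)
# (85, 4, 4, 3)
# (80, 5, 5, 4)
```

The outer ORDER BY row_num only makes the printed order deterministic; the rank values themselves come entirely from the OVER clauses.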
Assigns a unique integer to every row. No ties, no gaps. When two rows have the same ORDER BY value, the assignment is nondeterministic unless you add a tiebreaker column. This is the most tested ranking function in DE interviews at 6.2% frequency.
Primary use case: Deduplication. Keep only the latest record per entity.
```sql
WITH deduped AS (
    SELECT *,
        ROW_NUMBER() OVER (
            PARTITION BY user_id
            ORDER BY updated_at DESC
        ) AS rn
    FROM user_profiles
)
SELECT * FROM deduped WHERE rn = 1;
```

Interview note: If the interviewer says 'get one row per group,' reach for ROW_NUMBER first. It guarantees exactly one row even with ties. RANK would return multiple rows for ties, which is usually not what the question wants.
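As a minimal runnable sketch of the dedup pattern (SQLite via Python's sqlite3; the email column is hypothetical, standing in for whatever attributes the table carries):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_profiles (user_id INTEGER, email TEXT, updated_at TEXT)")
conn.executemany("INSERT INTO user_profiles VALUES (?, ?, ?)", [
    (1, "old@example.com", "2024-01-01"),
    (1, "new@example.com", "2024-03-01"),   # latest version for user 1
    (2, "only@example.com", "2024-02-15"),
])

rows = conn.execute("""
    WITH deduped AS (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY user_id
                   ORDER BY updated_at DESC
               ) AS rn
        FROM user_profiles
    )
    SELECT user_id, email FROM deduped WHERE rn = 1
    ORDER BY user_id
""").fetchall()

print(rows)
# [(1, 'new@example.com'), (2, 'only@example.com')]
```

User 1's stale row gets rn = 2 and is filtered out; exactly one row survives per user_id.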
Assigns the same number to tied rows, then skips ahead. Two rows tied at position 2 means the next row gets position 4, not 3. The gap signals that positions were 'consumed' by the tie. Think of it like a race: two runners cross the finish line together in 2nd place, so there is no 3rd place.
Primary use case: Competition-style leaderboards where ties share a position.
```sql
SELECT
    employee_id,
    department,
    salary,
    RANK() OVER (
        PARTITION BY department
        ORDER BY salary DESC
    ) AS salary_rank
FROM employees;
```

Interview note: RANK is the right choice when the question explicitly mentions ties or uses phrases like 'ranked by salary.' The gaps after ties can affect downstream filtering: WHERE salary_rank <= 3 might return 4 rows if two people tie for 3rd.
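The filtering pitfall is easy to demonstrate: with a tie for 3rd place, RANK assigns 1, 2, 3, 3, and a rank <= 3 filter returns four rows. A sketch in SQLite via Python's sqlite3, with made-up salaries:

```python
import sqlite3

# Hypothetical salaries with a deliberate tie for 3rd (100 appears twice).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [(1, 120), (2, 110), (3, 100), (4, 100), (5, 90)])

rows = conn.execute("""
    SELECT employee_id, salary, salary_rank
    FROM (
        SELECT employee_id, salary,
               RANK() OVER (ORDER BY salary DESC) AS salary_rank
        FROM employees
    )
    WHERE salary_rank <= 3
""").fetchall()

print(len(rows))
# 4 -- ranks 1, 2, 3, 3: the tie slips an extra row under the cutoff
```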
Same as RANK for ties, but no gaps. Two rows tied at 2 means the next row gets 3. The sequence stays compact: 1, 2, 2, 3, 4. Use DENSE_RANK when you need exactly N distinct rank values regardless of how many rows share each rank.
Primary use case: Top-N distinct values. 'Find the 3 highest salaries' where multiple people can share a salary.
```sql
SELECT DISTINCT salary
FROM (
    SELECT
        salary,
        DENSE_RANK() OVER (ORDER BY salary DESC) AS dr
    FROM employees
) ranked
WHERE dr <= 3;
```

Interview note: The classic interview phrasing is 'find the Nth highest salary.' DENSE_RANK is the textbook answer because it handles duplicates correctly. ROW_NUMBER numbers duplicate salaries separately, so the Nth row is not necessarily the Nth distinct salary. RANK would work but the gap makes the WHERE clause unintuitive.
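A quick sketch of the top-3-distinct-salaries query against SQLite via Python's sqlite3 (salaries invented, duplicates included on purpose):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?)",
                 [(120,), (120,), (110,), (100,), (100,), (90,)])

rows = conn.execute("""
    SELECT DISTINCT salary
    FROM (
        SELECT salary,
               DENSE_RANK() OVER (ORDER BY salary DESC) AS dr
        FROM employees
    ) ranked
    WHERE dr <= 3
    ORDER BY salary DESC
""").fetchall()

print([r[0] for r in rows])
# [120, 110, 100] -- three distinct values despite six rows
```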
Four patterns cover the vast majority of ranking questions. Learn the shape of each, and you can adapt them to whatever table the interviewer puts in front of you.
The most frequently tested ranking pattern. Use ROW_NUMBER inside a CTE, then filter WHERE rn <= N. PARTITION BY defines the group; ORDER BY defines the ranking criteria. This handles deduplication, top customers per region, most recent orders per user, and dozens of other variations.
```sql
WITH ranked AS (
    SELECT *,
        ROW_NUMBER() OVER (
            PARTITION BY region
            ORDER BY revenue DESC
        ) AS rn
    FROM customers
)
SELECT * FROM ranked WHERE rn <= 5;
```

Source tables often contain duplicate rows from retries, late-arriving data, or CDC replication. ROW_NUMBER partitioned by the natural key and ordered by a freshness column (updated_at DESC, event_timestamp DESC) assigns 1 to the most recent version. Filter to rn = 1 and you have a clean, deduplicated table. This is the standard ELT dedup pattern.
```sql
WITH deduped AS (
    SELECT *,
        ROW_NUMBER() OVER (
            PARTITION BY order_id
            ORDER BY ingested_at DESC
        ) AS rn
    FROM raw_orders
)
SELECT * FROM deduped WHERE rn = 1;
```

NTILE(N) splits the result set into N roughly equal groups. NTILE(4) creates quartiles, NTILE(100) creates percentiles. Rows are assigned bucket numbers from 1 to N based on the ORDER BY. If the row count is not evenly divisible, earlier buckets get one extra row. This is useful for customer segmentation, salary band analysis, and A/B test group assignment.
```sql
SELECT
    customer_id,
    lifetime_value,
    NTILE(4) OVER (ORDER BY lifetime_value DESC) AS quartile
FROM customers;
-- quartile 1 = top 25% spenders
```

Assign ROW_NUMBER ordered by date, then subtract it from the actual date or sequence value. Consecutive rows produce the same difference; gaps produce a different one. GROUP BY that difference to identify islands of consecutive data. This islands-and-gaps technique shows up in time-series interview questions: finding streaks, identifying missing dates, or detecting session boundaries.
```sql
WITH numbered AS (
    SELECT
        event_date,
        ROW_NUMBER() OVER (ORDER BY event_date) AS rn
    FROM daily_logins
    WHERE user_id = 42
)
SELECT
    MIN(event_date) AS streak_start,
    MAX(event_date) AS streak_end,
    COUNT(*) AS streak_length
FROM numbered
GROUP BY event_date - INTERVAL '1 day' * rn
ORDER BY streak_start;
```

PARTITION BY resets the counter for each group. Without it, RANK runs across the entire result set. With it, each partition gets its own independent ranking starting at 1. This is the mechanism behind every "per group" ranking query.
One global ranking across all departments. The highest salary overall gets rank 1.
```sql
SELECT
    department, employee_name, salary,
    RANK() OVER (ORDER BY salary DESC) AS global_rank
FROM employees;
-- Result: rank runs 1 through N across all rows
-- Engineering and Marketing employees share one sequence
```

Separate ranking per department. Each department's highest salary gets rank 1.
```sql
SELECT
    department, employee_name, salary,
    RANK() OVER (
        PARTITION BY department
        ORDER BY salary DESC
    ) AS dept_rank
FROM employees;
-- Result: Engineering has ranks 1, 2, 3...
-- Marketing has its own 1, 2, 3...
-- Each partition is independent
```

You can partition by more than one column. PARTITION BY department, fiscal_year creates a separate ranking for each department-year combination. 12 departments over 3 years produces 36 independent rankings.
```sql
SELECT
    department, fiscal_year, employee_name, sales_total,
    ROW_NUMBER() OVER (
        PARTITION BY department, fiscal_year
        ORDER BY sales_total DESC
    ) AS yearly_rank
FROM annual_sales;
```

These range from the standard top-N per group to advanced islands-and-gaps. Each includes what the interviewer is actually testing and the recommended approach.
What they test:
This is the canonical top-N per group problem. The interviewer wants a CTE with ROW_NUMBER (or RANK if they want ties included), PARTITION BY department, ORDER BY salary DESC. Then filter WHERE rn <= 3 in the outer query.
Approach:
Use ROW_NUMBER if the question says 'top 3 employees.' Use RANK or DENSE_RANK if the question says 'top 3 salaries' and ties should be included. Always clarify before writing.
What they test:
Conceptual understanding. The interviewer wants to hear about gaps vs no gaps and the practical implications. A weak answer just states the definition. A strong answer gives an example with tied values and explains when the gap matters (leaderboards) vs when it does not (finding the Nth value).
Approach:
Draw a quick example: scores 95, 90, 90, 85. RANK gives 1, 2, 2, 4. DENSE_RANK gives 1, 2, 2, 3. Use RANK for 'what position is this row in.' Use DENSE_RANK for 'how many distinct values are above this row.'
What they test:
Deduplication is one of the top real-world uses of ROW_NUMBER. The interviewer checks that you know to PARTITION BY the natural key, ORDER BY the freshness indicator DESC, and filter to rn = 1. Bonus points for wrapping it in a CTE and explaining idempotency.
Approach:
ROW_NUMBER() OVER (PARTITION BY entity_id ORDER BY updated_at DESC) AS rn, then WHERE rn = 1. If the table has no reliable freshness column, discuss alternatives: ingestion timestamp, file name, or a composite tiebreaker.
What they test:
This is the DENSE_RANK variant of top-N. If you use ROW_NUMBER, you get exactly one row and might miss ties. DENSE_RANK ensures that if two people share the highest salary, the second-highest is still correctly identified as rank 2, not rank 3.
Approach:
DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dr, then WHERE dr = 2. Explain why RANK would also give rank 2 here but DENSE_RANK is cleaner because the numbering stays sequential.
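Here's that approach as a runnable sketch (SQLite via Python's sqlite3; departments and salaries are made up, with a deliberate tie for the top salary to show why DENSE_RANK still lands on the true second-highest):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", [
    ("a", "Eng", 120), ("b", "Eng", 120),   # tie for the top Eng salary
    ("c", "Eng", 110),
    ("d", "Mkt", 100), ("e", "Mkt", 90),
])

rows = conn.execute("""
    SELECT department, name, salary
    FROM (
        SELECT *,
               DENSE_RANK() OVER (
                   PARTITION BY department
                   ORDER BY salary DESC
               ) AS dr
        FROM employees
    )
    WHERE dr = 2
    ORDER BY department
""").fetchall()

print(rows)
# [('Eng', 'c', 110), ('Mkt', 'e', 90)]
```

In Eng, the two 120s share rank 1, so 110 is rank 2 under DENSE_RANK; with RANK it would be rank 3 and the dr = 2 filter would miss it.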
What they test:
Islands-and-gaps with ranking. The interviewer wants to see ROW_NUMBER combined with LAG or date arithmetic to identify consecutive sequences. This combines ranking with analytical thinking.
Approach:
First use LAG to flag days where close > previous close. Then use ROW_NUMBER to identify streaks of consecutive TRUE flags. The islands-and-gaps technique: subtract the row number from the date to create a group identifier for each streak.
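The subtraction trick itself is easier to see on a bare consecutive-dates example than on stock prices. A sketch in SQLite via Python's sqlite3 (the daily_logins data is invented; SQLite uses date() modifiers where Postgres would use INTERVAL arithmetic):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_logins (user_id INTEGER, event_date TEXT)")
conn.executemany("INSERT INTO daily_logins VALUES (42, ?)",
                 [("2024-01-01",), ("2024-01-02",), ("2024-01-03",),
                  ("2024-01-05",), ("2024-01-06",)])   # gap on the 4th

rows = conn.execute("""
    WITH numbered AS (
        SELECT event_date,
               ROW_NUMBER() OVER (ORDER BY event_date) AS rn
        FROM daily_logins
        WHERE user_id = 42
    )
    SELECT MIN(event_date) AS streak_start,
           MAX(event_date) AS streak_end,
           COUNT(*) AS streak_length
    FROM numbered
    -- date minus rn days: constant within a streak, changes at every gap
    GROUP BY date(event_date, '-' || rn || ' days')
    ORDER BY streak_start
""").fetchall()

print(rows)
# [('2024-01-01', '2024-01-03', 3), ('2024-01-05', '2024-01-06', 2)]
```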
What they test:
Advanced ranking application. This requires sorting by start_date, using LAG to detect overlaps, then creating groups with a cumulative SUM of non-overlapping flags. It tests whether you can chain window functions to solve a multi-step problem.
Approach:
ORDER BY user_id, start_date. Use LAG(end_date) OVER (PARTITION BY user_id ORDER BY start_date) to get the previous end. Flag new groups where start_date > previous end. Cumulative SUM of the flag creates group IDs. Then GROUP BY the group ID and take MIN(start_date), MAX(end_date).
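The full chain described above, sketched against SQLite via Python's sqlite3 (the subscriptions table and dates are hypothetical; note that LAG only looks one row back, so if an interval can be fully contained inside an earlier one, a running MAX(end_date) is the safer flag):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscriptions (user_id INTEGER, start_date TEXT, end_date TEXT)")
conn.executemany("INSERT INTO subscriptions VALUES (?, ?, ?)", [
    (1, "2024-01-01", "2024-01-10"),
    (1, "2024-01-05", "2024-01-15"),   # overlaps the first interval
    (1, "2024-02-01", "2024-02-05"),   # standalone interval
])

rows = conn.execute("""
    WITH ordered AS (
        SELECT user_id, start_date, end_date,
               LAG(end_date) OVER (
                   PARTITION BY user_id ORDER BY start_date
               ) AS prev_end
        FROM subscriptions
    ),
    flagged AS (
        -- 1 marks the start of a new non-overlapping group
        SELECT *,
               CASE WHEN prev_end IS NULL OR start_date > prev_end
                    THEN 1 ELSE 0 END AS new_group
        FROM ordered
    ),
    grouped AS (
        -- cumulative sum of the flag turns it into a group ID
        SELECT *,
               SUM(new_group) OVER (
                   PARTITION BY user_id ORDER BY start_date
                   ROWS UNBOUNDED PRECEDING
               ) AS grp
        FROM flagged
    )
    SELECT user_id,
           MIN(start_date) AS merged_start,
           MAX(end_date) AS merged_end
    FROM grouped
    GROUP BY user_id, grp
    ORDER BY user_id, merged_start
""").fetchall()

print(rows)
# [(1, '2024-01-01', '2024-01-15'), (1, '2024-02-01', '2024-02-05')]
```

The first two intervals collapse into one merged range; the February interval starts after the previous end, raises the flag, and becomes its own group.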
Three mistakes account for most ranking-related bugs in interviews. Each one produces a query that runs without errors but returns wrong results.
RANK, DENSE_RANK, and ROW_NUMBER all need ORDER BY inside the OVER clause to be meaningful. Without it, the ranking is undefined. Some engines throw an error; others silently assign arbitrary numbers. SQL Server requires ORDER BY for ranking functions and rejects the query without it. PostgreSQL accepts an empty OVER (), but then every row is a peer: RANK assigns 1 to all rows and ROW_NUMBER assigns arbitrary numbers. In an interview, omitting ORDER BY signals that you do not understand how ranking works.
Fix: Always include ORDER BY inside the OVER clause. If the ranking truly should be arbitrary, say so explicitly and use ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) to document the intent.
RANK assigns the same number to tied rows. WHERE rank = 1 returns multiple rows if there is a tie. If the question asks for 'one row per group,' RANK gives incorrect results. ROW_NUMBER guarantees uniqueness because it breaks ties arbitrarily. This is the most common ranking function mix-up in interviews, and interviewers set this trap deliberately.
Fix: Use ROW_NUMBER for exactly-one-per-group. Use RANK or DENSE_RANK only when ties should produce multiple rows. If you need deterministic tiebreaking with ROW_NUMBER, add a secondary ORDER BY column.
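A small sketch of the deterministic tiebreaker (SQLite via Python's sqlite3; using event_id as the secondary key is an assumption, any unique column works):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_id INTEGER, user_id INTEGER, updated_at TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    (10, 1, "2024-01-01"),
    (11, 1, "2024-01-01"),   # identical timestamp: a tie ROW_NUMBER must break
])

rows = conn.execute("""
    WITH ranked AS (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY user_id
                   -- secondary key makes the winner stable across runs and engines
                   ORDER BY updated_at DESC, event_id DESC
               ) AS rn
        FROM events
    )
    SELECT event_id FROM ranked WHERE rn = 1
""").fetchall()

print(rows)
# [(11,)] -- always the higher event_id, never engine-dependent
```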
NULLs sort to the top or bottom depending on the engine. PostgreSQL puts NULLs last by default for ASC, first for DESC. SQL Server puts NULLs first for ASC. If your ranking column has NULLs, rows with NULL get a rank assignment that may surprise you. In a dedup query, a NULL updated_at could surface as rn = 1, giving you the wrong 'most recent' row.
Fix: Use NULLS LAST or NULLS FIRST explicitly: ORDER BY updated_at DESC NULLS LAST. Or filter NULLs before ranking with a CTE. In engines that do not support NULLS LAST, use COALESCE(updated_at, '1900-01-01').
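The COALESCE fix is easy to verify: pin NULL timestamps to a sentinel date so they can never win rn = 1, whatever the engine's NULL sort order. A sketch in SQLite via Python's sqlite3 with invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_profiles (user_id INTEGER, email TEXT, updated_at TEXT)")
conn.executemany("INSERT INTO user_profiles VALUES (?, ?, ?)", [
    (1, "null-ts@example.com", None),          # missing freshness value
    (1, "real@example.com", "2024-01-01"),
])

rows = conn.execute("""
    WITH deduped AS (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY user_id
                   -- NULL becomes the oldest possible date, so it sorts last
                   ORDER BY COALESCE(updated_at, '1900-01-01') DESC
               ) AS rn
        FROM user_profiles
    )
    SELECT user_id, email FROM deduped WHERE rn = 1
""").fetchall()

print(rows)
# [(1, 'real@example.com')]
```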
The difference between knowing these functions and using them under a ticking clock is just practice reps. Grab a problem, write the query, watch it run. That's it.