SQL Practice

SQL RANK vs DENSE_RANK vs ROW_NUMBER

You're going to reach for a ranking function on at least one question per interview loop. In our corpus ROW_NUMBER shows up in 15% of rounds and RANK or DENSE_RANK in another 11%, which means roughly one in four SQL questions leans on these three functions.

The good news is they're cousins. Same OVER clause, same PARTITION BY shape, one small difference at the tie boundary. Once you see the difference, you'll pick the right one every time. Take this slowly the first pass. The second pass you'll wonder why you ever found it tricky.

15%

Rounds with ROW_NUMBER

11%

RANK / DENSE_RANK

21%

PARTITION BY frequency

1 in 4

SQL questions lean on ranking

Source: DataDriven analysis of 1,042 verified data engineering interview rounds.

ROW_NUMBER vs RANK vs DENSE_RANK

Here's the thing to hold onto. All three number the rows. They only disagree when two rows tie on the ORDER BY. Watch what the two rows scoring 90 do below and you'll have the whole mental model in one glance. That's the trick you'll carry into your interview.

Same data, different functions

ScoreROW_NUMBERRANKDENSE_RANK
95111
90222
90322
85443
80554

Key insight: ROW_NUMBER always increments. RANK skips from 2 to 4 because two rows consumed positions 2 and 3. DENSE_RANK goes from 2 to 3 because it only tracks distinct positions, not consumed slots.

SELECT
  score,
  ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num,
  RANK()       OVER (ORDER BY score DESC) AS rnk,
  DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk
FROM exam_results;

When to Use Each Function

Picking the wrong one is a silent bug. Your query runs, returns results, and the interviewer waits to see if you notice the difference.

ROW_NUMBER()

Assigns a unique integer to every row. No ties, no gaps. When two rows have the same ORDER BY value, the assignment is nondeterministic unless you add a tiebreaker column. This is the most tested ranking function in DE interviews at 6.2% frequency.

Primary use case: Deduplication. Keep only the latest record per entity.

WITH deduped AS (
  SELECT *,
    ROW_NUMBER() OVER (
      PARTITION BY user_id
      ORDER BY updated_at DESC
    ) AS rn
  FROM user_profiles
)
SELECT * FROM deduped WHERE rn = 1;

Interview note: If the interviewer says 'get one row per group,' reach for ROW_NUMBER first. It guarantees exactly one row even with ties. RANK would return multiple rows for ties, which is usually not what the question wants.

RANK()

Assigns the same number to tied rows, then skips ahead. Two rows tied at position 2 means the next row gets position 4, not 3. The gap signals that positions were 'consumed' by the tie. Think of it like a race: two runners cross the finish line together in 2nd place, so there is no 3rd place.

Primary use case: Competition-style leaderboards where ties share a position.

SELECT
  employee_id,
  department,
  salary,
  RANK() OVER (
    PARTITION BY department
    ORDER BY salary DESC
  ) AS salary_rank
FROM employees;

Interview note: RANK is the right choice when the question explicitly mentions ties or uses phrases like 'ranked by salary.' The gaps after ties can affect downstream filtering: WHERE salary_rank <= 3 might return 4 rows if two people tie for 2nd.

DENSE_RANK()

Same as RANK for ties, but no gaps. Two rows tied at 2 means the next row gets 3. The sequence stays compact: 1, 2, 2, 3, 4. Use DENSE_RANK when you need exactly N distinct rank values regardless of how many rows share each rank.

Primary use case: Top-N distinct values. 'Find the 3 highest salaries' where multiple people can share a salary.

SELECT DISTINCT salary
FROM (
  SELECT
    salary,
    DENSE_RANK() OVER (ORDER BY salary DESC) AS dr
  FROM employees
) ranked
WHERE dr <= 3;

Interview note: The classic interview phrasing is 'find the Nth highest salary.' DENSE_RANK is the textbook answer because it handles duplicates correctly. ROW_NUMBER would skip duplicate salary values. RANK would work but the gap makes the WHERE clause unintuitive.

Ranking Patterns for Interviews

Four patterns cover the vast majority of ranking questions. Learn the shape of each, and you can adapt them to whatever table the interviewer puts in front of you.

Top-N Per Group

The most frequently tested ranking pattern. Use ROW_NUMBER inside a CTE, then filter WHERE rn <= N. PARTITION BY defines the group; ORDER BY defines the ranking criteria. This handles deduplication, top customers per region, most recent orders per user, and dozens of other variations.

WITH ranked AS (
  SELECT *,
    ROW_NUMBER() OVER (
      PARTITION BY region
      ORDER BY revenue DESC
    ) AS rn
  FROM customers
)
SELECT * FROM ranked WHERE rn <= 5;

Deduplication with ROW_NUMBER

Source tables often contain duplicate rows from retries, late-arriving data, or CDC replication. ROW_NUMBER partitioned by the natural key and ordered by a freshness column (updated_at DESC, event_timestamp DESC) assigns 1 to the most recent version. Filter to rn = 1 and you have a clean, deduplicated table. This is the standard ELT dedup pattern.

WITH deduped AS (
  SELECT *,
    ROW_NUMBER() OVER (
      PARTITION BY order_id
      ORDER BY ingested_at DESC
    ) AS rn
  FROM raw_orders
)
SELECT * FROM deduped WHERE rn = 1;

Percentile Buckets with NTILE

NTILE(N) splits the result set into N roughly equal groups. NTILE(4) creates quartiles, NTILE(100) creates percentiles. Rows are assigned bucket numbers from 1 to N based on the ORDER BY. If the row count is not evenly divisible, earlier buckets get one extra row. This is useful for customer segmentation, salary band analysis, and A/B test group assignment.

SELECT
  customer_id,
  lifetime_value,
  NTILE(4) OVER (ORDER BY lifetime_value DESC) AS quartile
FROM customers;
-- quartile 1 = top 25% spenders

Gap Detection

Assign ROW_NUMBER ordered by date, then subtract it from the actual date or sequence value. Consecutive rows produce the same difference; gaps produce a different one. GROUP BY that difference to identify islands of consecutive data. This islands-and-gaps technique shows up in time-series interview questions: finding streaks, identifying missing dates, or detecting session boundaries.

WITH numbered AS (
  SELECT
    event_date,
    ROW_NUMBER() OVER (ORDER BY event_date) AS rn
  FROM daily_logins
  WHERE user_id = 42
)
SELECT
  MIN(event_date) AS streak_start,
  MAX(event_date) AS streak_end,
  COUNT(*) AS streak_length
FROM numbered
GROUP BY event_date - INTERVAL '1 day' * rn
ORDER BY streak_start;

PARTITION BY with Ranking Functions

PARTITION BY resets the counter for each group. Without it, RANK runs across the entire result set. With it, each partition gets its own independent ranking starting at 1. This is the mechanism behind every "per group" ranking query.

Without PARTITION BY

One global ranking across all departments. The highest salary overall gets rank 1.

SELECT
  department, employee_name, salary,
  RANK() OVER (ORDER BY salary DESC) AS global_rank
FROM employees;

-- Result: rank runs 1 through N across all rows
-- Engineering and Marketing employees share one sequence

With PARTITION BY department

Separate ranking per department. Each department's highest salary gets rank 1.

SELECT
  department, employee_name, salary,
  RANK() OVER (
    PARTITION BY department
    ORDER BY salary DESC
  ) AS dept_rank
FROM employees;

-- Result: Engineering has ranks 1, 2, 3...
--         Marketing has its own 1, 2, 3...
--         Each partition is independent

Multiple PARTITION BY columns

You can partition by more than one column. PARTITION BY department, fiscal_year creates a separate ranking for each department-year combination. 12 departments over 3 years produces 36 independent rankings.

SELECT
  department, fiscal_year, employee_name, sales_total,
  ROW_NUMBER() OVER (
    PARTITION BY department, fiscal_year
    ORDER BY sales_total DESC
  ) AS yearly_rank
FROM annual_sales;

6 Ranking Interview Questions

These range from the standard top-N per group to advanced islands-and-gaps. Each includes what the interviewer is actually testing and the recommended approach.

Q1: Given an employees table, find the top 3 highest-paid employees in each department.

What they test:

This is the canonical top-N per group problem. The interviewer wants a CTE with ROW_NUMBER (or RANK if they want ties included), PARTITION BY department, ORDER BY salary DESC. Then filter WHERE rn <= 3 in the outer query.

Approach:

Use ROW_NUMBER if the question says 'top 3 employees.' Use RANK or DENSE_RANK if the question says 'top 3 salaries' and ties should be included. Always clarify before writing.

Q2: What is the difference between RANK and DENSE_RANK? When would you use each?

What they test:

Conceptual understanding. The interviewer wants to hear about gaps vs no gaps and the practical implications. A weak answer just states the definition. A strong answer gives an example with tied values and explains when the gap matters (leaderboards) vs when it does not (finding the Nth value).

Approach:

Draw a quick example: scores 95, 90, 90, 85. RANK gives 1, 2, 2, 4. DENSE_RANK gives 1, 2, 2, 3. Use RANK for 'what position is this row in.' Use DENSE_RANK for 'how many distinct values are above this row.'

Q3: Remove duplicate records from a table, keeping only the most recent version of each entity.

What they test:

Deduplication is one of the top real-world uses of ROW_NUMBER. The interviewer checks that you know to PARTITION BY the natural key, ORDER BY the freshness indicator DESC, and filter to rn = 1. Bonus points for wrapping it in a CTE and explaining idempotency.

Approach:

ROW_NUMBER() OVER (PARTITION BY entity_id ORDER BY updated_at DESC) AS rn, then WHERE rn = 1. If the table has no reliable freshness column, discuss alternatives: ingestion timestamp, file name, or a composite tiebreaker.

Q4: Find the second highest salary in each department. Handle ties.

What they test:

This is the DENSE_RANK variant of top-N. If you use ROW_NUMBER, you get exactly one row and might miss ties. DENSE_RANK ensures that if two people share the highest salary, the second-highest is still correctly identified as rank 2, not rank 3.

Approach:

DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dr, then WHERE dr = 2. Explain why RANK would also give rank 2 here but DENSE_RANK is cleaner because the numbering stays sequential.

Q5: Given a table of daily stock prices, find the longest consecutive streak where the closing price increased day over day.

What they test:

Islands-and-gaps with ranking. The interviewer wants to see ROW_NUMBER combined with LAG or date arithmetic to identify consecutive sequences. This combines ranking with analytical thinking.

Approach:

First use LAG to flag days where close > previous close. Then use ROW_NUMBER to identify streaks of consecutive TRUE flags. The islands-and-gaps technique: subtract the row number from the date to create a group identifier for each streak.

Q6: You have a table with overlapping date ranges per user. Merge overlapping ranges into consolidated periods.

What they test:

Advanced ranking application. This requires sorting by start_date, using LAG to detect overlaps, then creating groups with a cumulative SUM of non-overlapping flags. It tests whether you can chain window functions to solve a multi-step problem.

Approach:

ORDER BY user_id, start_date. Use LAG(end_date) OVER (PARTITION BY user_id ORDER BY start_date) to get the previous end. Flag new groups where start_date > previous end. Cumulative SUM of the flag creates group IDs. Then GROUP BY the group ID and take MIN(start_date), MAX(end_date).

Common Ranking Mistakes

Three mistakes account for most ranking-related bugs in interviews. Each one produces a query that runs without errors but returns wrong results.

Missing ORDER BY in the OVER clause

RANK, DENSE_RANK, and ROW_NUMBER all require ORDER BY. Without it, the ranking is undefined. Some engines throw an error; others silently assign arbitrary numbers. PostgreSQL requires ORDER BY for ranking functions. SQL Server allows it but returns nondeterministic results. In an interview, omitting ORDER BY signals that you do not understand how ranking works.

Fix: Always include ORDER BY inside the OVER clause. If the ranking truly should be arbitrary, say so explicitly and use ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) to document the intent.

Using RANK when you need exactly one row per group

RANK assigns the same number to tied rows. WHERE rank = 1 returns multiple rows if there is a tie. If the question asks for 'one row per group,' RANK gives incorrect results. ROW_NUMBER guarantees uniqueness because it breaks ties arbitrarily. This is the most common ranking function mix-up in interviews, and interviewers set this trap deliberately.

Fix: Use ROW_NUMBER for exactly-one-per-group. Use RANK or DENSE_RANK only when ties should produce multiple rows. If you need deterministic tiebreaking with ROW_NUMBER, add a secondary ORDER BY column.

Not handling NULLs in the ORDER BY column

NULLs sort to the top or bottom depending on the engine. PostgreSQL puts NULLs last by default for ASC, first for DESC. SQL Server puts NULLs first for ASC. If your ranking column has NULLs, rows with NULL get a rank assignment that may surprise you. In a dedup query, a NULL updated_at could surface as rn = 1, giving you the wrong 'most recent' row.

Fix: Use NULLS LAST or NULLS FIRST explicitly: ORDER BY updated_at DESC NULLS LAST. Or filter NULLs before ranking with a CTE. In engines that do not support NULLS LAST, use COALESCE(updated_at, '1900-01-01').

SQL RANK FAQ

What is the RANK function in SQL?+
RANK() is a window function that assigns a positional number to each row based on the ORDER BY clause. Rows with identical values receive the same rank, and the next rank skips ahead by the number of tied rows. Two rows at rank 2 means the next row gets rank 4. It works in PostgreSQL, MySQL 8+, SQL Server, Snowflake, BigQuery, and every modern engine.
What is the difference between RANK, DENSE_RANK, and ROW_NUMBER?+
ROW_NUMBER gives every row a unique sequential integer with no ties and no gaps. RANK allows ties but leaves gaps after them (1, 2, 2, 4). DENSE_RANK allows ties with no gaps (1, 2, 2, 3). Use ROW_NUMBER for deduplication or exactly-one-per-group. Use RANK for competition-style rankings. Use DENSE_RANK to find the Nth distinct value.
How does PARTITION BY work with RANK?+
PARTITION BY divides the data into independent groups before ranking. RANK() OVER (PARTITION BY department ORDER BY salary DESC) restarts the ranking at 1 for each department. Without PARTITION BY, the ranking runs across the entire result set. Think of PARTITION BY as a GROUP BY for window functions: it defines the boundary within which the function operates.
Can I use RANK without PARTITION BY?+
Yes. Without PARTITION BY, the ranking applies to all rows as a single group. RANK() OVER (ORDER BY score DESC) ranks every row against every other row. PARTITION BY is optional and only needed when you want separate rankings per group.
Which ranking function should I use for deduplication?+
ROW_NUMBER. It guarantees exactly one row per partition gets rn = 1 because there are no ties. RANK and DENSE_RANK can assign rank 1 to multiple rows if their ORDER BY values match. For dedup, you want one winner per group, and ROW_NUMBER delivers that. Partition by the natural key, order by a freshness column (usually a timestamp DESC), and filter to rn = 1.

You'll Thank Yourself at Minute Four

The difference between knowing these functions and using them under a ticking clock is just practice reps. Grab a problem, write the query, watch it run. That's it.