SQL Window Functions

SQL RANK() OVER (PARTITION BY)

PARTITION BY is the primitive underneath every cohort dashboard, every leaderboard, every top-N-per-group report you've ever rendered. It appears in 21% of verified DE interview SQL rounds, and almost all of those pair it with a ranking function. The pattern is so common that most senior interview rubrics call it out as a pass/fail signal for the "can you read a fact table" dimension.

Architecturally, RANK() OVER (PARTITION BY ...) is the pipeline stage that slots between your raw fact grain and your cohort aggregate. Upstream, you have events. Downstream, you have ranked rows per partition key ready for a top-N filter or a deduplication step.

21%

Use PARTITION BY

~all

Pair with ranking

11%

Use RANK directly

L5+

Expected at senior

Source: DataDriven analysis of 1,042 verified data engineering interview rounds.

Syntax Breakdown

RANK() OVER (
  PARTITION BY department
  ORDER BY salary DESC
) AS salary_rank

RANK()

The ranking function. It assigns a rank to each row within its partition based on the ORDER BY clause. If two rows have the same value, they get the same rank, and the next rank is skipped. For example: 1, 2, 2, 4 (rank 3 is skipped because two rows tied at rank 2).

OVER(...)

The window specification. This tells SQL how to define the 'window' of rows for the ranking. Without OVER, RANK() would not know what to rank or how to partition the data.

PARTITION BY column

Divides the result set into independent groups. RANK() restarts at 1 for each partition. Think of it as GROUP BY for window functions, except it does not collapse rows. Every row keeps its identity.

ORDER BY column

Determines the ranking order within each partition. ORDER BY salary DESC means the highest salary gets rank 1. Without ORDER BY, the rank would be meaningless because there is no ordering to rank by.

Examples

The four examples below trace the same pipeline stage at increasing levels of sophistication: a basic rank per partition, a top-N filter sitting downstream of the rank, a deduplication pattern on a natural key, and the aggregate-then-rank shape that powers most cohort dashboards. Treat each one as a reusable component in your query architecture.

Basic: Rank employees by salary within each department

The most common use case. You want to know who earns the most in each department, without collapsing the data into a single row per department.

SELECT
  department,
  name,
  salary,
  RANK() OVER (
    PARTITION BY department
    ORDER BY salary DESC
  ) AS salary_rank
FROM employees;

Output:

department  | name    | salary  | salary_rank
------------|---------|---------|------------
Engineering | Alice   | 180000  | 1
Engineering | Bob     | 165000  | 2
Engineering | Carol   | 165000  | 2
Engineering | Dave    | 150000  | 4
Marketing   | Eve     | 140000  | 1
Marketing   | Frank   | 130000  | 2

Bob and Carol both earn $165K, so they both get rank 2. Dave gets rank 4, not rank 3. The gap in ranks tells you that ties exist above.

Top-N per group: Find the top 3 earners in each department

This is the most frequently tested RANK OVER PARTITION pattern in interviews. You rank within groups, then filter. It requires a subquery or CTE because you cannot use a window function in WHERE directly.

WITH ranked AS (
  SELECT
    department,
    name,
    salary,
    RANK() OVER (
      PARTITION BY department
      ORDER BY salary DESC
    ) AS rn
  FROM employees
)
SELECT department, name, salary
FROM ranked
WHERE rn <= 3;

If there are ties at rank 3, RANK() returns all tied rows. That means you might get more than 3 rows per department. If you want exactly 3, use ROW_NUMBER() instead. Interviewers specifically test whether you understand this distinction.

Rank customers by total spend

Combine aggregation with ranking. First compute the total spend per customer, then rank them within their region.

WITH customer_spend AS (
  SELECT
    customer_id,
    region,
    SUM(amount) AS total_spend
  FROM orders
  GROUP BY customer_id, region
)
SELECT
  region,
  customer_id,
  total_spend,
  RANK() OVER (
    PARTITION BY region
    ORDER BY total_spend DESC
  ) AS spend_rank
FROM customer_spend;

The CTE computes total_spend with GROUP BY. The outer query applies RANK() to the aggregated result. This two-step pattern (aggregate, then rank) is extremely common.

Find the most recent order per customer

Another top-N pattern, but with N=1 and ordered by date instead of a numeric value. Useful for deduplication and 'most recent' queries.

WITH latest AS (
  SELECT
    customer_id,
    order_id,
    order_date,
    amount,
    RANK() OVER (
      PARTITION BY customer_id
      ORDER BY order_date DESC
    ) AS rn
  FROM orders
)
SELECT customer_id, order_id, order_date, amount
FROM latest
WHERE rn = 1;

If a customer has two orders on the same date, RANK() returns both. Use ROW_NUMBER() if you want exactly one row per customer regardless of ties.

RANK vs ROW_NUMBER vs DENSE_RANK

The three ranking functions differ only in how they handle ties. Knowing which one to use is a common interview differentiator.

ROW_NUMBER()

Always assigns unique sequential numbers. No ties. 1, 2, 3, 4, 5.

Best for: Deduplication, pagination, exactly-N-per-group queries. When you need exactly one row or exactly N rows, no exceptions.

Example: 1, 2, 3, 4, 5 (even if rows 2 and 3 have the same value)

RANK()

Assigns the same rank to tied rows. Skips subsequent ranks. 1, 2, 2, 4.

Best for: Competition-style ranking where ties should be visible and the gap communicates how many items tied. Sports rankings, leaderboards.

Example: 1, 2, 2, 4 (rank 3 is skipped)

DENSE_RANK()

Assigns the same rank to tied rows. Does NOT skip subsequent ranks. 1, 2, 2, 3.

Best for: When you need consecutive rank numbers without gaps. Useful for bucketing or when the rank number itself is used in downstream logic.

Example: 1, 2, 2, 3 (no gaps)

Interview Questions

Three questions that test RANK OVER PARTITION understanding. Try solving each one before reading the approach.

For each department, find the employee with the second-highest salary. If there is a tie for the highest salary, the second-highest is the next distinct salary below the tie.+

DENSE_RANK is the right choice here, not RANK. With RANK, if two employees tie for first place, rank 2 is skipped. With DENSE_RANK, the next distinct salary gets rank 2.

WITH ranked AS (
  SELECT
    department,
    name,
    salary,
    DENSE_RANK() OVER (
      PARTITION BY department
      ORDER BY salary DESC
    ) AS dr
  FROM employees
)
SELECT department, name, salary
FROM ranked
WHERE dr = 2;
Write a query that returns the top 2 products by revenue in each category. If there are ties, include all tied products.+

RANK() handles this naturally because it assigns the same rank to tied rows. Filter WHERE rank <= 2 and you get all products at rank 1 and rank 2, including ties.

WITH product_ranks AS (
  SELECT
    category,
    product_name,
    revenue,
    RANK() OVER (
      PARTITION BY category
      ORDER BY revenue DESC
    ) AS rn
  FROM products
)
SELECT category, product_name, revenue
FROM product_ranks
WHERE rn <= 2;
For each customer, flag their orders as 'first', 'second', or 'other' based on order date. If two orders share the same date, both should be flagged with the same label.+

Use DENSE_RANK for consecutive numbering, then CASE WHEN to map ranks to labels. PARTITION BY customer, ORDER BY date.

SELECT
  customer_id,
  order_id,
  order_date,
  CASE
    WHEN DENSE_RANK() OVER (
      PARTITION BY customer_id ORDER BY order_date
    ) = 1 THEN 'first'
    WHEN DENSE_RANK() OVER (
      PARTITION BY customer_id ORDER BY order_date
    ) = 2 THEN 'second'
    ELSE 'other'
  END AS order_label
FROM orders;

FAQ

Can I use RANK() without PARTITION BY?+
Yes. Without PARTITION BY, RANK() treats the entire result set as one partition. It ranks all rows against each other instead of ranking within groups. This is valid but less common in interviews. Most interview questions specifically test the PARTITION BY component because it requires understanding grouped ranking.
Why can I not use RANK() in a WHERE clause directly?+
Window functions are evaluated after WHERE, HAVING, and GROUP BY. The rank does not exist yet when the WHERE clause runs. You must compute the rank in a subquery or CTE first, then filter on it in the outer query. This is why the CTE pattern (WITH ranked AS ... SELECT ... WHERE rn <= N) is so common.
When should I use RANK() vs ROW_NUMBER() in an interview?+
Use ROW_NUMBER() when you need exactly N rows per group, with no exceptions for ties. Use RANK() when ties should be preserved and the rank gap should communicate the tie. If the interviewer says 'top 3' and two items tie for third place, ask whether they want exactly 3 rows or all items ranked 3 or above. That clarifying question itself demonstrates strong SQL understanding.

Build it once, reuse it everywhere

PARTITION BY shows up in 1 out of every 5 senior SQL rounds and in every cohort dashboard you'll ever ship. Treat it as a primitive in your query architecture.