PARTITION BY is the primitive underneath every cohort dashboard, every leaderboard, every top-N-per-group report you've ever rendered. It appears in 21% of verified DE interview SQL rounds, and almost all of those pair it with a ranking function. The pattern is so common that most senior interview rubrics call it out as a pass/fail signal for the "can you read a fact table" dimension.
Architecturally, RANK() OVER (PARTITION BY ...) is the pipeline stage that slots between your raw fact grain and your cohort aggregate. Upstream, you have events. Downstream, you have ranked rows per partition key ready for a top-N filter or a deduplication step.
Use PARTITION BY
Pair with ranking
Use RANK directly
Expected at senior
Source: DataDriven analysis of 1,042 verified data engineering interview rounds.
RANK() OVER ( PARTITION BY department ORDER BY salary DESC ) AS salary_rank
RANK()
The ranking function. It assigns a rank to each row within its partition based on the ORDER BY clause. If two rows have the same value, they get the same rank, and the next rank is skipped. For example: 1, 2, 2, 4 (rank 3 is skipped because two rows tied at rank 2).
OVER(...)
The window specification. This tells SQL how to define the 'window' of rows for the ranking. Without OVER, RANK() would not know what to rank or how to partition the data.
PARTITION BY column
Divides the result set into independent groups. RANK() restarts at 1 for each partition. Think of it as GROUP BY for window functions, except it does not collapse rows. Every row keeps its identity.
ORDER BY column
Determines the ranking order within each partition. ORDER BY salary DESC means the highest salary gets rank 1. Without ORDER BY, the rank would be meaningless because there is no ordering to rank by.
The four examples below trace the same pipeline stage at increasing levels of sophistication: a basic rank per partition, a top-N filter sitting downstream of the rank, a deduplication pattern on a natural key, and the aggregate-then-rank shape that powers most cohort dashboards. Treat each one as a reusable component in your query architecture.
The most common use case. You want to know who earns the most in each department, without collapsing the data into a single row per department.
SELECT
department,
name,
salary,
RANK() OVER (
PARTITION BY department
ORDER BY salary DESC
) AS salary_rank
FROM employees;Output:
department | name | salary | salary_rank ------------|---------|---------|------------ Engineering | Alice | 180000 | 1 Engineering | Bob | 165000 | 2 Engineering | Carol | 165000 | 2 Engineering | Dave | 150000 | 4 Marketing | Eve | 140000 | 1 Marketing | Frank | 130000 | 2
Bob and Carol both earn $165K, so they both get rank 2. Dave gets rank 4, not rank 3. The gap in ranks tells you that ties exist above.
This is the most frequently tested RANK OVER PARTITION pattern in interviews. You rank within groups, then filter. It requires a subquery or CTE because you cannot use a window function in WHERE directly.
WITH ranked AS (
SELECT
department,
name,
salary,
RANK() OVER (
PARTITION BY department
ORDER BY salary DESC
) AS rn
FROM employees
)
SELECT department, name, salary
FROM ranked
WHERE rn <= 3;If there are ties at rank 3, RANK() returns all tied rows. That means you might get more than 3 rows per department. If you want exactly 3, use ROW_NUMBER() instead. Interviewers specifically test whether you understand this distinction.
Combine aggregation with ranking. First compute the total spend per customer, then rank them within their region.
WITH customer_spend AS (
SELECT
customer_id,
region,
SUM(amount) AS total_spend
FROM orders
GROUP BY customer_id, region
)
SELECT
region,
customer_id,
total_spend,
RANK() OVER (
PARTITION BY region
ORDER BY total_spend DESC
) AS spend_rank
FROM customer_spend;The CTE computes total_spend with GROUP BY. The outer query applies RANK() to the aggregated result. This two-step pattern (aggregate, then rank) is extremely common.
Another top-N pattern, but with N=1 and ordered by date instead of a numeric value. Useful for deduplication and 'most recent' queries.
WITH latest AS (
SELECT
customer_id,
order_id,
order_date,
amount,
RANK() OVER (
PARTITION BY customer_id
ORDER BY order_date DESC
) AS rn
FROM orders
)
SELECT customer_id, order_id, order_date, amount
FROM latest
WHERE rn = 1;If a customer has two orders on the same date, RANK() returns both. Use ROW_NUMBER() if you want exactly one row per customer regardless of ties.
The three ranking functions differ only in how they handle ties. Knowing which one to use is a common interview differentiator.
Always assigns unique sequential numbers. No ties. 1, 2, 3, 4, 5.
Best for: Deduplication, pagination, exactly-N-per-group queries. When you need exactly one row or exactly N rows, no exceptions.
Assigns the same rank to tied rows. Skips subsequent ranks. 1, 2, 2, 4.
Best for: Competition-style ranking where ties should be visible and the gap communicates how many items tied. Sports rankings, leaderboards.
Assigns the same rank to tied rows. Does NOT skip subsequent ranks. 1, 2, 2, 3.
Best for: When you need consecutive rank numbers without gaps. Useful for bucketing or when the rank number itself is used in downstream logic.
Three questions that test RANK OVER PARTITION understanding. Try solving each one before reading the approach.
DENSE_RANK is the right choice here, not RANK. With RANK, if two employees tie for first place, rank 2 is skipped. With DENSE_RANK, the next distinct salary gets rank 2.
WITH ranked AS (
SELECT
department,
name,
salary,
DENSE_RANK() OVER (
PARTITION BY department
ORDER BY salary DESC
) AS dr
FROM employees
)
SELECT department, name, salary
FROM ranked
WHERE dr = 2;RANK() handles this naturally because it assigns the same rank to tied rows. Filter WHERE rank <= 2 and you get all products at rank 1 and rank 2, including ties.
WITH product_ranks AS (
SELECT
category,
product_name,
revenue,
RANK() OVER (
PARTITION BY category
ORDER BY revenue DESC
) AS rn
FROM products
)
SELECT category, product_name, revenue
FROM product_ranks
WHERE rn <= 2;Use DENSE_RANK for consecutive numbering, then CASE WHEN to map ranks to labels. PARTITION BY customer, ORDER BY date.
SELECT
customer_id,
order_id,
order_date,
CASE
WHEN DENSE_RANK() OVER (
PARTITION BY customer_id ORDER BY order_date
) = 1 THEN 'first'
WHEN DENSE_RANK() OVER (
PARTITION BY customer_id ORDER BY order_date
) = 2 THEN 'second'
ELSE 'other'
END AS order_label
FROM orders;PARTITION BY shows up in 1 out of every 5 senior SQL rounds and in every cohort dashboard you'll ever ship. Treat it as a primitive in your query architecture.