Here's the thing about PARTITION BY. You're going to meet it in 21% of verified DE SQL interview questions, and in almost every senior-level loop we've scraped. The good news is the mental model is tiny. Split the rows into bins, compute a window function inside each bin, keep every original row in the output. That's it. That's the whole clause.
Take this slowly on the first read. Where GROUP BY collapses rows, PARTITION BY annotates them. You'll notice the pattern repeating across ranking, running totals, and cohort-style comparisons. Once you've seen it, you'll see it everywhere, and interviewers love candidates who can name the pattern instead of freshly deriving it each time.
Source: DataDriven analysis of 1,042 verified data engineering interview rounds.
You'll always find PARTITION BY tucked inside the OVER clause, never floating on its own. Pass it one or more columns and it carves the result set into bins along those values. Every row sharing the same partition key gets computed together, and the computation resets at each boundary. That reset is the thing to hold in your head.
```sql
-- General syntax
function_name() OVER (
    PARTITION BY column1, column2, ...
    ORDER BY column3
    ROWS BETWEEN ... AND ...
)
```
```sql
-- Example: running total per customer
SELECT
    customer_id,
    order_date,
    amount,
    SUM(amount) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
    ) AS running_total
FROM orders;
```

The PARTITION BY clause is optional. Without it, the entire result set is treated as one partition. ORDER BY inside OVER is also optional, but its presence changes the default frame for aggregate window functions. Both of these defaults trip up candidates in interviews regularly.
Interview note: If an interviewer asks you to compute a department-level average next to each employee row, they want PARTITION BY. If they ask for one row per department with the average, they want GROUP BY. Recognizing which one the question calls for is half the battle.
This is one of the most frequently asked conceptual questions in SQL interviews. Both create groups. The difference is in what happens to the rows after grouping.
| Feature | PARTITION BY | GROUP BY |
|---|---|---|
| Output rows | Keeps all original rows | Collapses to one row per group |
| Aggregation | Adds computed column alongside detail | Returns only aggregated values |
| Window functions | Scopes window functions to each group | Not a window construct; its aggregates summarize whole groups |
| WHERE filtering | Applied before the window calculation | HAVING filters after aggregation |
| Multiple functions | Each function can use a different PARTITION BY | One GROUP BY applies to all aggregates |
Here is the same question answered both ways. The goal: show each employee alongside their department average salary.
```sql
-- PARTITION BY version
SELECT
    department,
    employee_name,
    salary,
    AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees;
-- Returns every employee row
-- dept_avg is the same value for all rows in the same department
```

```sql
-- GROUP BY version: derived table plus a join
SELECT
    e.department,
    e.employee_name,
    e.salary,
    d.dept_avg
FROM employees e
JOIN (
    SELECT department, AVG(salary) AS dept_avg
    FROM employees
    GROUP BY department
) d ON e.department = d.department;
-- Same result, more code, extra join
-- PARTITION BY does this in a single pass
```

These five patterns cover the vast majority of PARTITION BY usage in interviews and production SQL. Each shows a different window function type and explains how PARTITION BY scopes it.
PARTITION BY defines where the ranking restarts. Without it, ROW_NUMBER counts across the entire result set. With PARTITION BY department, each department gets its own sequence starting at 1. This is the foundation of every top-N per group query, which accounts for roughly 30% of SQL interview questions in data engineering rounds.
```sql
SELECT
    department, employee_name, salary,
    ROW_NUMBER() OVER (
        PARTITION BY department
        ORDER BY salary DESC
    ) AS dept_rank
FROM employees;
-- Engineering: 1, 2, 3, 4...
-- Marketing: 1, 2, 3, 4...
-- Each partition is independent
```

SUM with PARTITION BY and an ORDER BY clause produces a running total that resets for each group. The frame defaults to ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW when ORDER BY is present. This pattern powers cumulative revenue tracking, balance calculations, and progressive aggregation in pipeline output tables.
```sql
SELECT
    customer_id,
    order_date,
    amount,
    SUM(amount) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
    ) AS running_total
FROM orders;
```

PARTITION BY without ORDER BY computes a single aggregate value for the entire group and attaches it to every row. This lets you compare each row to its group average, max, or total without a self-join or subquery. The query stays flat and readable.
```sql
SELECT
    department,
    employee_name,
    salary,
    AVG(salary) OVER (PARTITION BY department) AS dept_avg,
    salary - AVG(salary) OVER (PARTITION BY department) AS diff_from_avg
FROM employees;
```

LAG and LEAD look at adjacent rows, but only within the partition. PARTITION BY customer_id with LAG means you compare each order to that customer's previous order, not the previous row in the table. Without the partition, LAG would cross customer boundaries and return meaningless comparisons.
```sql
SELECT
    customer_id,
    order_date,
    amount,
    LAG(amount) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
    ) AS prev_amount,
    amount - LAG(amount) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
    ) AS change
FROM orders;
```

You can partition by more than one column. PARTITION BY region, product_category creates a separate window for each region-category combination. Ten regions and five categories produce fifty independent windows. Each window function call can use a different PARTITION BY clause in the same SELECT statement.
```sql
SELECT
    region,
    product_category,
    month,
    revenue,
    SUM(revenue) OVER (
        PARTITION BY region, product_category
        ORDER BY month
    ) AS cumulative_rev,
    RANK() OVER (
        PARTITION BY region
        ORDER BY revenue DESC
    ) AS region_rank
FROM monthly_sales;
```

The frame clause (ROWS BETWEEN or RANGE BETWEEN) controls which rows within a partition contribute to the calculation. PARTITION BY defines the outer boundary; the frame defines the inner boundary. Understanding their interaction is critical for moving averages, cumulative sums, and bounded lookbacks.
```sql
-- 3-row moving average per store
SELECT
    store_id,
    sale_date,
    daily_revenue,
    AVG(daily_revenue) OVER (
        PARTITION BY store_id
        ORDER BY sale_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS moving_avg_3
FROM daily_sales;
-- Without ORDER BY: frame is the entire partition (grand avg)
-- With ORDER BY, no frame: default is UNBOUNDED PRECEDING to CURRENT ROW
-- With explicit ROWS: exactly the rows you specify
```

Interview note: A common trick question: "What is the difference between SUM(x) OVER (PARTITION BY y) and SUM(x) OVER (PARTITION BY y ORDER BY z)?" The first gives the group total on every row. The second gives a running total. The ORDER BY implicitly changes the frame. If you can explain this, you demonstrate real window function depth.
ROWS counts physical rows. RANGE counts logical value ranges. With dates, ROWS BETWEEN 6 PRECEDING AND CURRENT ROW grabs the previous 6 rows by position, which might not be 6 calendar days if data has gaps. RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW grabs all rows within 6 calendar days regardless of how many rows that represents. PostgreSQL (11+) supports both forms; support for RANGE with interval offsets varies across other engines, so check your dialect before relying on it.
```sql
-- ROWS vs RANGE with date gaps
-- Day 1, Day 2, Day 5 (days 3 and 4 missing)
-- ROWS BETWEEN 2 PRECEDING: Day 5 sees Day 2 and Day 1
-- RANGE BETWEEN INTERVAL '2 days' PRECEDING: Day 5 sees only itself
-- (Day 3 and Day 4 do not exist)
```

These questions test different aspects of PARTITION BY: basic scoping, frame interaction, multi-column partitions, and the GROUP BY comparison. Each includes the interviewer's intent and a recommended approach.
What they test:
This tests whether you can use PARTITION BY to compute a group-level aggregate (AVG) alongside row-level data, then use that computed column for filtering. The candidate needs to combine a window function with ABS() and another ranking pass to pick the closest row.
Approach:
First CTE: compute AVG(salary) OVER (PARTITION BY department) as dept_avg and ABS(salary - dept_avg) as distance. Second CTE: ROW_NUMBER() OVER (PARTITION BY department ORDER BY distance ASC) as rn. Outer query: WHERE rn = 1.
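A sketch of that two-CTE shape, assuming an `employees(department, employee_name, salary)` table. Note that the distance is computed directly against the window expression, since an alias defined in the same SELECT cannot be referenced there:

```sql
WITH with_distance AS (
    SELECT
        department,
        employee_name,
        salary,
        -- distance from the department average, computed inline
        ABS(salary - AVG(salary) OVER (PARTITION BY department)) AS distance
    FROM employees
),
ranked AS (
    SELECT
        department,
        employee_name,
        salary,
        -- rank restarts in each department, closest row first
        ROW_NUMBER() OVER (
            PARTITION BY department
            ORDER BY distance ASC
        ) AS rn
    FROM with_distance
)
SELECT department, employee_name, salary
FROM ranked
WHERE rn = 1;
```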
What they test:
Moving averages combine PARTITION BY with an explicit frame clause. The interviewer checks that you know ROWS BETWEEN 6 PRECEDING AND CURRENT ROW (not RANGE, which behaves differently with date gaps). Partitioning by location ensures each store has its own independent average.
Approach:
AVG(revenue) OVER (PARTITION BY store_id ORDER BY sale_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW). Emphasize ROWS vs RANGE: ROWS counts physical rows regardless of date gaps; RANGE would include all rows with dates in the range, which can produce unexpected results if days are missing.
What they test:
LAG with PARTITION BY. The interviewer wants to verify that you partition by customer so LAG does not cross customer boundaries. The percentage change calculation tests basic arithmetic: (current - previous) / previous * 100. NULL handling for the first order matters too.
Approach:
LAG(amount) OVER (PARTITION BY customer_id ORDER BY order_date) AS prev_amount. Then (amount - prev_amount) / NULLIF(prev_amount, 0) * 100 AS pct_change. NULLIF guards against division by zero. The first order per customer will have NULL for prev_amount, which propagates correctly.
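Put together as a full query, with the LAG result computed once in a derived table so the outer arithmetic stays readable (the `orders` schema is the one assumed throughout this article):

```sql
SELECT
    customer_id,
    order_date,
    amount,
    prev_amount,
    -- NULLIF guards division by zero; a NULL prev_amount (first order)
    -- propagates to a NULL pct_change, which is the correct answer
    (amount - prev_amount) * 100.0 / NULLIF(prev_amount, 0) AS pct_change
FROM (
    SELECT
        customer_id,
        order_date,
        amount,
        LAG(amount) OVER (
            PARTITION BY customer_id
            ORDER BY order_date
        ) AS prev_amount
    FROM orders
) t;
```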
What they test:
Conceptual understanding. A weak answer says 'they both group data.' A strong answer explains that GROUP BY collapses rows while PARTITION BY preserves them. GROUP BY returns one row per group; PARTITION BY adds a computed column to every original row. You cannot reference non-aggregated columns in a GROUP BY select list, but PARTITION BY has no such restriction because it does not collapse anything.
Approach:
Give a concrete example: 'If I need total department salary, GROUP BY returns one row per department. If I need each employee row with the department total next to it, PARTITION BY keeps all rows.' Mention that GROUP BY is for final output aggregation; PARTITION BY is for per-row enrichment.
What they test:
MAX as a window function with PARTITION BY and ORDER BY. This produces a running max that only increases or stays flat. The interviewer checks that you know aggregate functions can be used as window functions, and that PARTITION BY ticker ensures each stock has its own independent running max.
Approach:
MAX(close_price) OVER (PARTITION BY ticker ORDER BY trade_date) AS cumulative_max. The default frame with ORDER BY is UNBOUNDED PRECEDING to CURRENT ROW, which gives the running max. No explicit frame clause needed, but stating it shows awareness.
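As a full query sketch, assuming a `daily_prices(ticker, trade_date, close_price)` table:

```sql
SELECT
    ticker,
    trade_date,
    close_price,
    -- running max per ticker; only ever increases or stays flat
    MAX(close_price) OVER (
        PARTITION BY ticker
        ORDER BY trade_date
    ) AS cumulative_max
FROM daily_prices;
```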
Three mistakes account for most PARTITION BY bugs in interviews. Each produces a query that compiles and runs but returns wrong results.
GROUP BY collapses rows. PARTITION BY does not. If you write GROUP BY department and SELECT employee_name, you get an error because the query does not know which employee name to return. PARTITION BY department in a window function keeps every row and just scopes the calculation to each department independently. Mixing up these two concepts in an interview signals a fundamental gap in SQL understanding.
Fix: Use GROUP BY when your final output needs one row per group. Use PARTITION BY when you need per-row results with group-level context.
Without ORDER BY, the default frame for SUM/AVG/etc. is the entire partition. With ORDER BY, the default shrinks to UNBOUNDED PRECEDING to CURRENT ROW. This means SUM(amount) OVER (PARTITION BY customer_id) gives the total, but SUM(amount) OVER (PARTITION BY customer_id ORDER BY order_date) gives a running total. Many candidates add ORDER BY for 'cleanliness' without realizing it changes the computation entirely.
Fix: Only add ORDER BY inside the window when you want a running or cumulative calculation. For a static group-level aggregate attached to every row, omit ORDER BY.
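The two behaviors side by side, in one SELECT against the `orders` table used above:

```sql
SELECT
    customer_id,
    order_date,
    amount,
    -- no ORDER BY: frame is the whole partition, so every row
    -- carries the customer's grand total
    SUM(amount) OVER (PARTITION BY customer_id) AS customer_total,
    -- ORDER BY shrinks the default frame to UNBOUNDED PRECEDING
    -- .. CURRENT ROW, turning the same SUM into a running total
    SUM(amount) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
    ) AS running_total
FROM orders;
```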
Partitioning by a high-cardinality combination (like user_id, timestamp) can create partitions with a single row each, making the window function meaningless. RANK over a single-row partition always returns 1. SUM over a single-row partition returns the row's own value. The query runs without errors but the results carry no analytical value.
Fix: Before writing the query, ask: how many rows do I expect per partition? If the answer is consistently one, the partition key is too specific. Remove columns from the PARTITION BY until each group has enough rows for the window function to be meaningful.
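A contrast sketch of the failure mode, assuming a hypothetical `events(user_id, event_ts, score)` table where each user has many events but `(user_id, event_ts)` is effectively unique:

```sql
SELECT
    user_id,
    event_ts,
    -- key is too specific: each partition holds one row,
    -- so this rank is always 1
    RANK() OVER (
        PARTITION BY user_id, event_ts
        ORDER BY score DESC
    ) AS rank_always_one,
    -- coarser key: events rank within each user's full history
    RANK() OVER (
        PARTITION BY user_id
        ORDER BY score DESC
    ) AS rank_within_user
FROM events;
```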
Reading the mental model is half the battle. The other half is typing the OVER clause without looking it up. Grab a problem and write it all the way through.