SQL PARTITION BY Explained
Here's the thing about PARTITION BY. You're going to meet it in 21% of verified DE SQL interview questions, and in almost every senior-level loop we've scraped. The good news is the mental model is tiny. Split the rows into bins, compute a window function inside each bin, keep every original row in the output. That's it. That's the whole clause.
PARTITION BY Syntax
Know Window Functions the way the interviewer who asks it knows it.
-- General syntax
function_name() OVER (
PARTITION BY column1, column2, ...
ORDER BY column3
ROWS BETWEEN ... AND ...
)
-- Example: running total per customer
SELECT
customer_id,
order_date,
amount,
SUM(amount) OVER (
PARTITION BY customer_id
ORDER BY order_date
) AS running_total
FROM orders;| Feature | PARTITION BY | GROUP BY |
|---|---|---|
| Output rows | Keeps all original rows | Collapses to one row per group |
| Aggregation | Adds computed column alongside detail | Returns only aggregated values |
| Window functions | Required for per-group window scoping | Cannot be used with window functions |
| WHERE filtering | Applied before the window calculation | HAVING filters after aggregation |
| Multiple functions | Each function can use a different PARTITION BY | One GROUP BY applies to all aggregates |
PARTITION BY vs GROUP BY
-- With PARTITION BY (preserves rows)
SELECT
department,
employee_name,
salary,
AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees;
-- Returns every employee row
-- dept_avg is the same value for all rows in the same departmentTop Selling Items
Revenue crowns the winners. Who sold the most?
Pulled from debriefs where SQL was the gate.
-- With GROUP BY + JOIN (collapses, then rejoins)
SELECT
e.department,
e.employee_name,
e.salary,
d.dept_avg
FROM employees e
JOIN (
SELECT department, AVG(salary) AS dept_avg
FROM employees
GROUP BY department
) d ON e.department = d.department;
-- Same result, more code, extra join
-- PARTITION BY does this in a single pass5 PARTITION BY Patterns
Ranking per Group
PARTITION BY defines where the ranking restarts. Without it, ROW_NUMBER counts across the entire result set. With PARTITION BY department, each department gets its own sequence starting at 1. This is the foundation of every top-N per group query, which accounts for roughly 30% of SQL interview questions in data engineering rounds.
SELECT
department, employee_name, salary,
ROW_NUMBER() OVER (
PARTITION BY department
ORDER BY salary DESC
) AS dept_rank
FROM employees;
-- Engineering: 1, 2, 3, 4...
-- Marketing: 1, 2, 3, 4...
-- Each partition is independentRunning Totals per Group
SUM with PARTITION BY and an ORDER BY clause produces a running total that resets for each group. The frame defaults to ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW when ORDER BY is present. This pattern powers cumulative revenue tracking, balance calculations, and progressive aggregation in pipeline output tables.
SELECT
customer_id,
order_date,
amount,
SUM(amount) OVER (
PARTITION BY customer_id
ORDER BY order_date
) AS running_total
FROM orders;Comparing to Group Aggregates
PARTITION BY without ORDER BY computes a single aggregate value for the entire group and attaches it to every row. This lets you compare each row to its group average, max, or total without a self-join or subquery. The query stays flat and readable.
SELECT
department,
employee_name,
salary,
AVG(salary) OVER (PARTITION BY department) AS dept_avg,
salary - AVG(salary) OVER (PARTITION BY department) AS diff_from_avg
FROM employees;LAG and LEAD per Group
LAG and LEAD look at adjacent rows, but only within the partition. PARTITION BY customer_id with LAG means you compare each order to that customer's previous order, not the previous row in the table. Without the partition, LAG would cross customer boundaries and return meaningless comparisons.
SELECT
customer_id,
order_date,
amount,
LAG(amount) OVER (
PARTITION BY customer_id
ORDER BY order_date
) AS prev_amount,
amount - LAG(amount) OVER (
PARTITION BY customer_id
ORDER BY order_date
) AS change
FROM orders;Multiple Partition Keys
You can partition by more than one column. PARTITION BY region, product_category creates a separate window for each region-category combination. Ten regions and five categories produce fifty independent windows. Each window function call can use a different PARTITION BY clause in the same SELECT statement.
SELECT
region,
product_category,
month,
revenue,
SUM(revenue) OVER (
PARTITION BY region, product_category
ORDER BY month
) AS cumulative_rev,
RANK() OVER (
PARTITION BY region
ORDER BY revenue DESC
) AS region_rank
FROM monthly_sales;PARTITION BY and Frame Clauses
-- 3-row moving average per store
SELECT
store_id,
sale_date,
daily_revenue,
AVG(daily_revenue) OVER (
PARTITION BY store_id
ORDER BY sale_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS moving_avg_3
FROM daily_sales;
-- Without ORDER BY: frame is the entire partition (grand avg)
-- With ORDER BY, no frame: default is UNBOUNDED PRECEDING to CURRENT ROW
-- With explicit ROWS: exactly the rows you specify-- ROWS vs RANGE with date gaps
-- Day 1, Day 2, Day 5 (days 3 and 4 missing)
-- ROWS BETWEEN 2 PRECEDING: Day 5 sees Day 2 and Day 1
-- RANGE BETWEEN INTERVAL '2 days' PRECEDING: Day 5 sees only itself
-- (Day 3 and Day 4 do not exist)Common PARTITION BY Mistakes
5 PARTITION BY Interview Questions
Q1: For each department, find the employee whose salary is closest to the department average.
What they test: This tests whether you can use PARTITION BY to compute a group-level aggregate (AVG) alongside row-level data, then use that computed column for filtering. The candidate needs to combine a window function with ABS() and another ranking pass to pick the closest row. Approach: First CTE: compute AVG(salary) OVER (PARTITION BY department) as dept_avg and ABS(salary - dept_avg) as distance. Second CTE: ROW_NUMBER() OVER (PARTITION BY department ORDER BY distance ASC) as rn. Outer query: WHERE rn = 1.
Q2: Calculate a 7-day moving average of daily revenue, partitioned by store location.
What they test: Moving averages combine PARTITION BY with an explicit frame clause. The interviewer checks that you know ROWS BETWEEN 6 PRECEDING AND CURRENT ROW (not RANGE, which behaves differently with date gaps). Partitioning by location ensures each store has its own independent average. Approach: AVG(revenue) OVER (PARTITION BY store_id ORDER BY sale_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW). Emphasize ROWS vs RANGE: ROWS counts physical rows regardless of date gaps; RANGE would include all rows with dates in the range, which can produce unexpected results if days are missing.
Q3: Write a query that shows each order alongside the previous order amount for the same customer and the percentage change.
What they test: LAG with PARTITION BY. The interviewer wants to verify that you partition by customer so LAG does not cross customer boundaries. The percentage change calculation tests basic arithmetic: (current - previous) / previous * 100. NULL handling for the first order matters too. Approach: LAG(amount) OVER (PARTITION BY customer_id ORDER BY order_date) AS prev_amount. Then (amount - prev_amount) / NULLIF(prev_amount, 0) * 100 AS pct_change. NULLIF guards against division by zero. The first order per customer will have NULL for prev_amount, which propagates correctly.
Q4: Explain the difference between PARTITION BY and GROUP BY. When would you use one over the other?
What they test: Conceptual understanding. A weak answer says 'they both group data.' A strong answer explains that GROUP BY collapses rows while PARTITION BY preserves them. GROUP BY returns one row per group; PARTITION BY adds a computed column to every original row. You cannot reference non-aggregated columns in a GROUP BY select list, but PARTITION BY has no such restriction because it does not collapse anything. Approach: Give a concrete example: 'If I need total department salary, GROUP BY returns one row per department. If I need each employee row with the department total next to it, PARTITION BY keeps all rows.' Mention that GROUP BY is for final output aggregation; PARTITION BY is for per-row enrichment.
Q5: Given a table of daily stock prices, compute the cumulative maximum closing price per ticker symbol.
What they test: MAX as a window function with PARTITION BY and ORDER BY. This produces a running max that only increases or stays flat. The interviewer checks that you know aggregate functions can be used as window functions, and that PARTITION BY ticker ensures each stock has its own independent running max. Approach: MAX(close_price) OVER (PARTITION BY ticker ORDER BY trade_date) AS cumulative_max. The default frame with ORDER BY is UNBOUNDED PRECEDING to CURRENT ROW, which gives the running max. No explicit frame clause needed, but stating it shows awareness.
Confusing PARTITION BY with GROUP BY
GROUP BY collapses rows. PARTITION BY does not. If you write GROUP BY department and SELECT employee_name, you get an error because the query does not know which employee name to return. PARTITION BY department in a window function keeps every row and just scopes the calculation to each department independently. Mixing up these two concepts in an interview signals a fundamental gap in SQL understanding.
Fix: Use GROUP BY when your final output needs one row per group. Use PARTITION BY when you need per-row results with group-level context.
Forgetting ORDER BY changes the frame default
Without ORDER BY, the default frame for SUM/AVG/etc. is the entire partition. With ORDER BY, the default shrinks to UNBOUNDED PRECEDING to CURRENT ROW. This means SUM(amount) OVER (PARTITION BY customer_id) gives the total, but SUM(amount) OVER (PARTITION BY customer_id ORDER BY order_date) gives a running total. Many candidates add ORDER BY for 'cleanliness' without realizing it changes the computation entirely.
Fix: Only add ORDER BY inside the window when you want a running or cumulative calculation. For a static group-level aggregate attached to every row, omit ORDER BY.
Using PARTITION BY with too many columns
Partitioning by a high-cardinality combination (like user_id, timestamp) can create partitions with a single row each, making the window function meaningless. RANK over a single-row partition always returns 1. SUM over a single-row partition returns the row's own value. The query runs without errors but the results carry no analytical value.
Fix: Before writing the query, ask: how many rows do I expect per partition? If the answer is consistently one, the partition key is too specific. Remove columns from the PARTITION BY until each group has enough rows for the window function to be meaningful.
Frequently asked questions
What does PARTITION BY do in SQL?+
What is the difference between PARTITION BY and GROUP BY?+
Can I use multiple columns in PARTITION BY?+
Is PARTITION BY required for window functions?+
You've Got This. Now Put It in Your Fingers.
- 01
Active recall beats re-reading by 50%
Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom
- 02
76% of hiring managers reject on the coding task, not the resume
From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice
- 03
Five problem shapes cover 80% of data engineer loops
Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition