SQL PARTITION BY Explained

Here's the thing about PARTITION BY. You're going to meet it in 21% of verified DE SQL interview questions, and in almost every senior-level loop we've scraped. The good news is the mental model is tiny. Split the rows into bins, compute a window function inside each bin, keep every original row in the output. That's it. That's the whole clause.

21%

Questions using PARTITION BY

61%

L5 senior loops in corpus

15%

ROW_NUMBER companion

Patterns you'll reuse forever

Source: DataDriven analysis of 1,042 verified data engineering interview rounds.

Prepare for the interview

01 / Open invite

02min.

Know Window Functions the way the interviewer who asks it knows it.

a Window Functions query, the same shape a screen would give you.

The diff against expected. Where ties broke. What you missed.

sandbox

1SELECT user_id,

2 COUNT(*) AS sessions

3FROM events

4WHERE ts >= NOW() - INTERVAL '7 day'

Execute your solution0.4s avg.

LyftInterview question

Solve a Window Functions problem

PARTITION BY Syntax

You'll always find PARTITION BY tucked inside the OVER clause, never floating on its own. Pass it one or more columns and it carves the result set into bins along those values. Every row sharing the same partition key gets computed together, and the computation resets at each boundary. That reset is the thing to hold in your head.

-- General syntax
function_name() OVER (
  PARTITION BY column1, column2, ...
  ORDER BY column3
  ROWS BETWEEN ... AND ...
)

-- Example: running total per customer
SELECT
  user_id,
  transaction_date,
  total_amount,
  SUM(total_amount) OVER (
    PARTITION BY user_id
    ORDER BY transaction_date
  ) AS running_total
FROM transactions;

The PARTITION BY clause is optional. Without it, the entire result set is treated as one partition. ORDER BY inside OVER is also optional, but its presence changes the default frame for aggregate window functions. Both of these defaults trip up candidates in interviews regularly. Interview note: If an interviewer asks you to compute a department-level average next to each employee row, they want PARTITION BY. If they ask for one row per department with the average, they want GROUP BY. Recognizing which one the question calls for is half the battle.

Platform Speed

> The product team is investigating whether device type affects how long users stay engaged. For each device type, show the typical session length, the longest session on record, and how many sessions were logged. Only include device types that have at least five sessions in the data , anything with fewer isn't statistically meaningful. List from the device type with the longest average session down to the shortest.

PARTITION BY vs GROUP BY

This is one of the most frequently asked conceptual questions in SQL interviews. Both create groups. The difference is in what happens to the rows after grouping.

Feature	PARTITION BY	GROUP BY
Output rows	Keeps all original rows	Collapses to one row per group
Aggregation	Adds computed column alongside detail	Returns only aggregated values
Window functions	Required for per-group window scoping	Cannot be used with window functions
WHERE filtering	Applied before the window calculation	HAVING filters after aggregation
Multiple functions	Each function can use a different PARTITION BY	One GROUP BY applies to all aggregates

Here is the same question answered both ways. The goal: show each employee alongside their department average salary.

-- With PARTITION BY (preserves rows)
SELECT
  department,
  emp_name,
  salary,
  AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees;

-- Returns every employee row
-- dept_avg is the same value for all rows in the same department

-- With GROUP BY + JOIN (collapses, then rejoins)
SELECT
  e.department,
  e.emp_name,
  e.salary,
  d.dept_avg
FROM employees e
JOIN (
  SELECT department, AVG(salary) AS dept_avg
  FROM employees
  GROUP BY department
) d ON e.department = d.department;

-- Same result, more code, extra join
-- PARTITION BY does this in a single pass

5 PARTITION BY Patterns

These five patterns cover the vast majority of PARTITION BY usage in interviews and production SQL. Each shows a different window function type and explains how PARTITION BY scopes it.

Ranking per Group

PARTITION BY defines where the ranking restarts. Without it, ROW_NUMBER counts across the entire result set. With PARTITION BY department, each department gets its own sequence starting at 1. This is the foundation of every top-N per group query, which accounts for roughly 30% of SQL interview questions in data engineering rounds.

SELECT
  department, emp_name, salary,
  ROW_NUMBER() OVER (
    PARTITION BY department
    ORDER BY salary DESC
  ) AS dept_rank
FROM employees;

-- Engineering: 1, 2, 3, 4...
-- Marketing:   1, 2, 3, 4...
-- Each partition is independent

Running Totals per Group

SUM with PARTITION BY and an ORDER BY clause produces a running total that resets for each group. The frame defaults to ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW when ORDER BY is present. This pattern powers cumulative revenue tracking, balance calculations, and progressive aggregation in pipeline output tables.

SELECT
  user_id,
  transaction_date,
  total_amount,
  SUM(total_amount) OVER (
    PARTITION BY user_id
    ORDER BY transaction_date
  ) AS running_total
FROM transactions;

Comparing to Group Aggregates

PARTITION BY without ORDER BY computes a single aggregate value for the entire group and attaches it to every row. This lets you compare each row to its group average, max, or total without a self-join or subquery. The query stays flat and readable.

SELECT
  department,
  emp_name,
  salary,
  AVG(salary) OVER (PARTITION BY department) AS dept_avg,
  salary - AVG(salary) OVER (PARTITION BY department) AS diff_from_avg
FROM employees;

LAG and LEAD per Group

LAG and LEAD look at adjacent rows, but only within the partition. PARTITION BY customer_id with LAG means you compare each order to that customer's previous order, not the previous row in the table. Without the partition, LAG would cross customer boundaries and return meaningless comparisons.

SELECT
  user_id,
  transaction_date,
  total_amount,
  LAG(total_amount) OVER (
    PARTITION BY user_id
    ORDER BY transaction_date
  ) AS prev_amount,
  total_amount - LAG(total_amount) OVER (
    PARTITION BY user_id
    ORDER BY transaction_date
  ) AS change
FROM transactions;

Multiple Partition Keys

You can partition by more than one column. PARTITION BY region, product_category creates a separate window for each region-category combination. Ten regions and five categories produce fifty independent windows. Each window function call can use a different PARTITION BY clause in the same SELECT statement.

SELECT
  p.category,
  substr(t.transaction_date, 1, 7) AS month,
  t.total_amount AS revenue,
  SUM(t.total_amount) OVER (
    PARTITION BY p.category, substr(t.transaction_date, 1, 7)
    ORDER BY t.transaction_date
  ) AS cumulative_rev,
  RANK() OVER (
    PARTITION BY p.category
    ORDER BY t.total_amount DESC
  ) AS category_rank
FROM transactions t
JOIN products p ON p.product_id = t.product_id;

PARTITION BY and Frame Clauses

The frame clause (ROWS BETWEEN or RANGE BETWEEN) controls which rows within a partition contribute to the calculation. PARTITION BY defines the outer boundary; the frame defines the inner boundary. Understanding their interaction is critical for moving averages, cumulative sums, and bounded lookbacks.

-- 3-row moving average per product
SELECT
  product_id,
  transaction_date,
  total_amount,
  AVG(total_amount) OVER (
    PARTITION BY product_id
    ORDER BY transaction_date
    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
  ) AS moving_avg_3
FROM transactions;

-- Without ORDER BY: frame is the entire partition (grand avg)
-- With ORDER BY, no frame: default is UNBOUNDED PRECEDING to CURRENT ROW
-- With explicit ROWS: exactly the rows you specify

Interview note: A common trick question: 'What is the difference between SUM(x) OVER (PARTITION BY y) and SUM(x) OVER (PARTITION BY y ORDER BY z)?' The first gives the group total on every row. The second gives a running total. The ORDER BY implicitly changes the frame. If you can explain this, you demonstrate real window function depth. ROWS counts physical rows. RANGE counts logical value ranges. With dates, ROWS BETWEEN 6 PRECEDING AND CURRENT ROW grabs the previous 6 rows by position, which might not be 6 calendar days if data has gaps. RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW grabs all rows within 6 calendar days regardless of how many rows that represents. PostgreSQL supports both; MySQL 8+ supports ROWS but not RANGE with intervals.

-- ROWS frame: each row plus the 2 physical rows before it,
-- regardless of date gaps. Compare the windowed sum to the
-- running total to see how the frame boundary changes the result.
SELECT
  user_id,
  transaction_date,
  total_amount,
  SUM(total_amount) OVER (
    PARTITION BY user_id
    ORDER BY transaction_date
    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
  ) AS sum_last_3_rows
FROM transactions
ORDER BY user_id, transaction_date
LIMIT 20;

5 PARTITION BY Interview Questions

Q1: For each department, find the employee whose salary is closest to the department average.

What they test: This tests whether you can use PARTITION BY to compute a group-level aggregate (AVG) alongside row-level data, then use that computed column for filtering. The candidate needs to combine a window function with ABS() and another ranking pass to pick the closest row. Approach: First CTE: compute AVG(salary) OVER (PARTITION BY department) as dept_avg and ABS(salary - dept_avg) as distance. Second CTE: ROW_NUMBER() OVER (PARTITION BY department ORDER BY distance ASC) as rn. Outer query: WHERE rn = 1.

Q2: Calculate a 7-day moving average of daily revenue, partitioned by store location.

What they test: Moving averages combine PARTITION BY with an explicit frame clause. The interviewer checks that you know ROWS BETWEEN 6 PRECEDING AND CURRENT ROW (not RANGE, which behaves differently with date gaps). Partitioning by location ensures each store has its own independent average. Approach: AVG(revenue) OVER (PARTITION BY store_id ORDER BY sale_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW). Emphasize ROWS vs RANGE: ROWS counts physical rows regardless of date gaps; RANGE would include all rows with dates in the range, which can produce unexpected results if days are missing.

Q3: Write a query that shows each order alongside the previous order amount for the same customer and the percentage change.

What they test: LAG with PARTITION BY. The interviewer wants to verify that you partition by customer so LAG does not cross customer boundaries. The percentage change calculation tests basic arithmetic: (current - previous) / previous * 100. NULL handling for the first order matters too. Approach: LAG(amount) OVER (PARTITION BY customer_id ORDER BY order_date) AS prev_amount. Then (amount - prev_amount) / NULLIF(prev_amount, 0) * 100 AS pct_change. NULLIF guards against division by zero. The first order per customer will have NULL for prev_amount, which propagates correctly.

Q4: Explain the difference between PARTITION BY and GROUP BY. When would you use one over the other?

What they test: Conceptual understanding. A weak answer says 'they both group data.' A strong answer explains that GROUP BY collapses rows while PARTITION BY preserves them. GROUP BY returns one row per group; PARTITION BY adds a computed column to every original row. You cannot reference non-aggregated columns in a GROUP BY select list, but PARTITION BY has no such restriction because it does not collapse anything. Approach: Give a concrete example: 'If I need total department salary, GROUP BY returns one row per department. If I need each employee row with the department total next to it, PARTITION BY keeps all rows.' Mention that GROUP BY is for final output aggregation; PARTITION BY is for per-row enrichment.

Q5: Given a table of daily stock prices, compute the cumulative maximum closing price per ticker symbol.

What they test: MAX as a window function with PARTITION BY and ORDER BY. This produces a running max that only increases or stays flat. The interviewer checks that you know aggregate functions can be used as window functions, and that PARTITION BY ticker ensures each stock has its own independent running max. Approach: MAX(close_price) OVER (PARTITION BY ticker ORDER BY trade_date) AS cumulative_max. The default frame with ORDER BY is UNBOUNDED PRECEDING to CURRENT ROW, which gives the running max. No explicit frame clause needed, but stating it shows awareness.

Common PARTITION BY Mistakes

Three mistakes account for most PARTITION BY bugs in interviews. Each produces a query that compiles and runs but returns wrong results.

Confusing PARTITION BY with GROUP BY

GROUP BY collapses rows. PARTITION BY does not. If you write GROUP BY department and SELECT employee_name, you get an error because the query does not know which employee name to return. PARTITION BY department in a window function keeps every row and just scopes the calculation to each department independently. Mixing up these two concepts in an interview signals a fundamental gap in SQL understanding.

Fix: Use GROUP BY when your final output needs one row per group. Use PARTITION BY when you need per-row results with group-level context.

Forgetting ORDER BY changes the frame default

Without ORDER BY, the default frame for SUM/AVG/etc. is the entire partition. With ORDER BY, the default shrinks to UNBOUNDED PRECEDING to CURRENT ROW. This means SUM(amount) OVER (PARTITION BY customer_id) gives the total, but SUM(amount) OVER (PARTITION BY customer_id ORDER BY order_date) gives a running total. Many candidates add ORDER BY for 'cleanliness' without realizing it changes the computation entirely.

Fix: Only add ORDER BY inside the window when you want a running or cumulative calculation. For a static group-level aggregate attached to every row, omit ORDER BY.

Using PARTITION BY with too many columns

Partitioning by a high-cardinality combination (like user_id, timestamp) can create partitions with a single row each, making the window function meaningless. RANK over a single-row partition always returns 1. SUM over a single-row partition returns the row's own value. The query runs without errors but the results carry no analytical value.

Fix: Before writing the query, ask: how many rows do I expect per partition? If the answer is consistently one, the partition key is too specific. Remove columns from the PARTITION BY until each group has enough rows for the window function to be meaningful.

Frequently asked questions

What does PARTITION BY do in SQL?+

PARTITION BY divides a result set into groups (partitions) for window function calculations. Each partition is processed independently. Unlike GROUP BY, PARTITION BY does not collapse rows. Every original row stays in the output, with the window function result added as an extra column. You can think of it as GROUP BY that preserves detail.

What is the difference between PARTITION BY and GROUP BY?+

GROUP BY reduces multiple rows into one row per group and requires all non-aggregated columns in the GROUP BY list. PARTITION BY keeps all rows and scopes a window function to each group independently. Use GROUP BY for summary reports with one row per category. Use PARTITION BY when you need each detail row enriched with group-level calculations like running totals, rankings, or averages.

Can I use multiple columns in PARTITION BY?+

Yes. PARTITION BY region, product_type creates one partition for each unique combination of region and product type. The window function resets for each combination. You can also use different PARTITION BY clauses in different window functions within the same SELECT statement. For example, one column could rank within department while another computes a company-wide running total.

Is PARTITION BY required for window functions?+

No. If you omit PARTITION BY, the window function treats the entire result set as one partition. RANK() OVER (ORDER BY salary DESC) ranks all rows together. SUM(amount) OVER () returns the grand total on every row. PARTITION BY is only needed when you want the function to reset per group.

02 / Why practice

You've Got This. Now Put It in Your Fingers.

01
Active recall beats re-reading by 50%
Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom
02
76% of hiring managers reject on the coding task, not the resume
From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice
03
Five problem shapes cover 80% of data engineer loops
Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition

Try a Window Function Problem