SQL Reference

SQL Window Functions

Most candidates treat window functions as a ranking trick. The interviewer is checking whether you understand that PARTITION BY doesn't reduce rows. That single confusion tanks more window function questions than syntax errors do. The data backs it up: PARTITION BY shows up in 21% of verified data engineering SQL rounds, ROW_NUMBER in 15%, RANK and DENSE_RANK in 11%, and aggregate windows in 9%.

Most study guides dump a function list and call it a day. We're going to flip the order. Start with what the OVER clause actually does to row cardinality, then the frame semantics candidates always skip, then the six functions worth memorizing. Every example runs on PostgreSQL and ports with little or no change to Snowflake, BigQuery, Redshift, and Databricks.

Source: DataDriven analysis of 1,042 verified data engineering interview rounds.

Window Function Anatomy

Here's where candidates trip. The OVER clause doesn't aggregate rows, it annotates them. Every input row comes out intact with extra columns attached. If you think PARTITION BY acts like GROUP BY, you'll write a query that returns the wrong cardinality and the interviewer will flag it in 30 seconds.

SELECT
  employee_id,
  department,
  salary,
  hire_date,

  -- Full anatomy: function + OVER(PARTITION BY + ORDER BY + frame)
  SUM(salary) OVER (
    PARTITION BY department          -- 1. Split rows into groups
    ORDER BY hire_date               -- 2. Sort within each group
    ROWS BETWEEN UNBOUNDED PRECEDING -- 3. Define which rows
         AND CURRENT ROW             --    the function sees
  ) AS running_dept_salary

FROM employees;

PARTITION BY

Divides the result set into independent groups. The window function resets for each partition. Think of it as GROUP BY without collapsing rows. If you omit PARTITION BY, the entire result set is one partition.

Interview tip: "PARTITION BY is like GROUP BY that keeps every row." That one sentence shows you understand the core concept.

ORDER BY

Determines the sequence of rows within each partition. Required for ranking functions (ROW_NUMBER, RANK, DENSE_RANK) and offset functions (LAG, LEAD). For aggregate windows (SUM, AVG, COUNT), ORDER BY turns the aggregate into a running calculation instead of a partition-wide total.

Without ORDER BY, SUM() OVER (PARTITION BY dept) gives the same total on every row. With ORDER BY, it gives a running total. That distinction is a frequent interview question.

Frame Clause

Controls exactly which rows within the partition the function considers. Specified as ROWS or RANGE followed by a boundary (UNBOUNDED PRECEDING, N PRECEDING, CURRENT ROW, N FOLLOWING, UNBOUNDED FOLLOWING). The default frame when ORDER BY is present is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which is not always what you want.

The default frame trips people up constantly. If you specify ORDER BY but no frame, you get a running total by default, not a partition-wide total. Spell out the frame explicitly to avoid surprises.

The Six Window Functions You Need

These six cover 90%+ of interview questions. ROW_NUMBER, RANK, and DENSE_RANK handle ranking. LAG and LEAD handle row comparisons. NTILE handles bucketing. Master these and you can derive everything else.

ROW_NUMBER()

Assigns a unique sequential integer to each row within a partition. No ties. If two rows have the same ORDER BY value, the assignment is nondeterministic (the database picks one arbitrarily). This is the most-used window function in data engineering because it powers deduplication: partition by the natural key, order by updated_at DESC, keep row_number = 1.

SELECT
  employee_id,
  department,
  salary,
  ROW_NUMBER() OVER (
    PARTITION BY department
    ORDER BY salary DESC
  ) AS rn
FROM employees;

Interview note: ROW_NUMBER is the standard dedup tool. Interviewers expect you to write a CTE with ROW_NUMBER and filter WHERE rn = 1 in the outer query. Know that ties are broken arbitrarily unless you add a tiebreaker column.

RANK()

Like ROW_NUMBER but handles ties. When two rows share the same ORDER BY value, they get the same rank. The next rank skips. If two employees tie at rank 2, the next employee gets rank 4 (not 3). This creates gaps in the sequence. Interviewers test whether you understand the gap behavior and when it matters.

SELECT
  employee_id,
  department,
  salary,
  RANK() OVER (
    PARTITION BY department
    ORDER BY salary DESC
  ) AS rnk
FROM employees;
-- Salaries: 90k, 90k, 80k -> Ranks: 1, 1, 3

Interview note: RANK gaps can surprise downstream consumers. If a report says 'show top 3 by rank,' ties at rank 1 mean rank 2 might not exist. Clarify with the interviewer whether gaps are acceptable.

DENSE_RANK()

Ranks with ties but no gaps. Two employees at rank 2 are followed by rank 3, not rank 4. Use DENSE_RANK when you need N distinct rank levels regardless of ties. A 'top 3 departments by revenue' query using DENSE_RANK returns at most 3 distinct rank values in the output, though more than 3 rows may qualify when revenues tie.

SELECT
  employee_id,
  department,
  salary,
  DENSE_RANK() OVER (
    PARTITION BY department
    ORDER BY salary DESC
  ) AS drnk
FROM employees;
-- Salaries: 90k, 90k, 80k -> Dense ranks: 1, 1, 2

Interview note: The classic interview question: 'What is the difference between ROW_NUMBER, RANK, and DENSE_RANK?' You need a crisp one-sentence answer for each, plus an example showing where they diverge.

LAG()

Accesses a value from a previous row within the partition, based on ORDER BY. LAG(column, 1) returns the value from one row back. LAG(column, 2) returns two rows back. If there is no previous row, the result is NULL (or a default you specify as the third argument). This is the go-to function for calculating row-over-row changes: revenue growth, day-over-day deltas, time between events.

SELECT
  order_date,
  daily_revenue,
  LAG(daily_revenue, 1) OVER (
    ORDER BY order_date
  ) AS prev_day_revenue,
  daily_revenue - LAG(daily_revenue, 1) OVER (
    ORDER BY order_date
  ) AS day_over_day_change
FROM daily_sales;

Interview note: LAG without a default returns NULL for the first row. If that NULL breaks a downstream calculation (like a percentage change), wrap it in COALESCE or handle it in the WHERE clause.

LEAD()

The mirror of LAG. Accesses a value from a subsequent row within the partition. LEAD(column, 1) returns the next row's value. Useful for calculating time-to-next-event, gap analysis between consecutive records, and session timeout detection. In clickstream data, LEAD(timestamp) minus the current timestamp gives you the idle time between clicks.

SELECT
  user_id,
  event_timestamp,
  LEAD(event_timestamp, 1) OVER (
    PARTITION BY user_id
    ORDER BY event_timestamp
  ) AS next_event_time,
  LEAD(event_timestamp, 1) OVER (
    PARTITION BY user_id
    ORDER BY event_timestamp
  ) - event_timestamp AS idle_time
FROM clickstream;

Interview note: LEAD returns NULL for the last row in each partition. For session detection, that NULL typically means 'session ended.' Do not COALESCE it to NOW() unless the business logic requires it.

NTILE(n)

Divides the partition into n roughly equal buckets and assigns a bucket number (1 through n) to each row. NTILE(4) creates quartiles. NTILE(100) creates percentiles. When the rows do not divide evenly, earlier buckets get one extra row. This is how you build percentile distributions without needing PERCENT_RANK or CUME_DIST.

SELECT
  employee_id,
  salary,
  NTILE(4) OVER (ORDER BY salary) AS salary_quartile
FROM employees;
-- 100 employees -> 25 per quartile
-- 101 employees -> bucket 1 gets 26, others get 25

Interview note: NTILE does not handle ties intelligently. Two identical salaries can end up in different quartiles if they fall on a bucket boundary. If that matters, use PERCENT_RANK or CUME_DIST instead.

Aggregate Window Functions

Any aggregate function (SUM, AVG, COUNT, MIN, MAX) can become a window function by adding OVER(). Without ORDER BY, you get a partition-wide total repeated on every row. With ORDER BY, you get a running calculation. Both patterns are common in interviews.

Running Total

The most common aggregate window pattern. SUM with ORDER BY accumulates values row by row. This is how you compute cumulative revenue, running balances, or progressive totals without a self-join or correlated subquery.

SELECT
  order_date,
  daily_revenue,
  SUM(daily_revenue) OVER (
    ORDER BY order_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS cumulative_revenue
FROM daily_sales;

-- order_date  | daily_revenue | cumulative_revenue
-- 2024-01-01  |         1200  |              1200
-- 2024-01-02  |          800  |              2000
-- 2024-01-03  |         1500  |              3500

Spelling out ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is nearly redundant here: the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which differs from ROWS only when order_date has duplicates. Writing the frame explicitly signals to the interviewer that you know the default and are being intentional.

Moving Average

A fixed-width sliding window. Instead of accumulating from the beginning, you average over the last N rows. This smooths out daily noise in metrics like DAU, conversion rate, or latency. The frame clause does the work.

SELECT
  report_date,
  daily_active_users,
  AVG(daily_active_users) OVER (
    ORDER BY report_date
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS seven_day_avg
FROM app_metrics;

-- First 6 rows have fewer than 7 data points.
-- Day 1: avg of 1 value. Day 2: avg of 2 values. Etc.

Use ROWS, not RANGE, for moving averages. RANGE would group rows with the same date, which is rarely what you want for a 7-day window.

Partition-Wide Aggregate

When you omit ORDER BY, the aggregate computes across the entire partition and repeats the same value on every row. This lets you compare each row to its group total without a self-join or subquery.

SELECT
  employee_id,
  department,
  salary,
  SUM(salary) OVER (PARTITION BY department) AS dept_total,
  ROUND(
    salary * 100.0 / SUM(salary) OVER (PARTITION BY department),
    1
  ) AS pct_of_dept
FROM employees;

-- Each row shows the individual salary AND what
-- percentage of the department total it represents.

This pattern replaces the old-school approach of joining a table to a subquery that groups and aggregates. It is cleaner, usually faster, and easier to read. Interviewers favor it.

Frame Clause Deep Dive

The frame clause is the part most people skip and the part that causes the most bugs. It controls exactly which rows within the ordered partition the window function sees. Two keywords, five boundaries, and one critical default you need to know.

ROWS vs RANGE

ROWS counts physical rows. RANGE groups rows by value. The difference only matters when duplicate ORDER BY values exist.

-- Data: dates with duplicates
-- 2024-01-01: $100
-- 2024-01-01: $200  (same date)
-- 2024-01-02: $150

-- ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
-- Row 1: 100
-- Row 2: 100 + 200 = 300
-- Row 3: 100 + 200 + 150 = 450

-- RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
-- Row 1: 100 + 200 = 300  (includes all rows with same date)
-- Row 2: 100 + 200 = 300  (same result, same date group)
-- Row 3: 100 + 200 + 150 = 450

RANGE treats rows with the same ORDER BY value as peers and includes all peers in the frame. ROWS processes them one at a time. For running totals, ROWS is almost always what you want.

The Five Boundaries

Every frame has a start and an end. The start must come before (or equal) the end in the ordering. Here are the five boundary options, from earliest to latest.

UNBOUNDED PRECEDING   -- first row of the partition
N PRECEDING           -- N rows before the current row
CURRENT ROW           -- the current row itself
N FOLLOWING           -- N rows after the current row
UNBOUNDED FOLLOWING   -- last row of the partition

-- Examples:
-- Running total from start to here:
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

-- 3-row centered moving average:
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING

-- Everything from here to the end:
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING

-- Full partition (same as no frame at all):
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

The Default Frame Trap

When you write ORDER BY without an explicit frame, the SQL standard says the default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. This creates a running calculation, not a partition-wide total. When you omit ORDER BY entirely, there is no frame and the function sees the whole partition. Mixing these up is one of the most common window function bugs.

-- Partition-wide total (no ORDER BY, no frame)
SUM(salary) OVER (PARTITION BY dept)
-- Every row shows the same total: 500000

-- Running total (ORDER BY triggers default frame)
SUM(salary) OVER (PARTITION BY dept ORDER BY hire_date)
-- Row 1: 60000, Row 2: 125000, Row 3: 195000 ...

-- Explicit partition-wide total WITH ordering
SUM(salary) OVER (
  PARTITION BY dept
  ORDER BY hire_date
  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
-- Every row shows the same total: 500000
-- but rows are still ordered by hire_date

If you want ORDER BY for deterministic output but a partition-wide aggregate, you must explicitly set the frame to UNBOUNDED PRECEDING to UNBOUNDED FOLLOWING. The interviewer will notice whether you know this.

7 Window Function Interview Questions

These are the patterns that come up repeatedly at companies like Meta, Amazon, Airbnb, and Stripe. The approach notes show what the interviewer expects, not just the final query.

Q1: Given a table of user logins (user_id, login_date), find users who logged in on 3 or more consecutive days.

Approach:

Use ROW_NUMBER() partitioned by user_id, ordered by login_date. Subtract the row number (as a day interval) from login_date to create a grouping key: consecutive dates produce the same key. Then GROUP BY user_id and the grouping key, and filter HAVING COUNT(*) >= 3. This is the islands-and-gaps pattern. Select DISTINCT user_id, login_date first to handle multiple logins per day.
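The approach above can be run end to end with SQLite (3.25+, which supports window functions) through Python's sqlite3; the logins table and its rows are invented for illustration, and SQLite's date() modifier stands in for PostgreSQL's date-minus-interval arithmetic:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE logins (user_id INTEGER, login_date TEXT);
INSERT INTO logins VALUES
  (1, '2024-01-01'), (1, '2024-01-02'), (1, '2024-01-03'),  -- 3-day streak
  (1, '2024-01-03'),                                        -- duplicate login
  (2, '2024-01-01'), (2, '2024-01-03');                     -- gap, no streak
""")

rows = con.execute("""
WITH distinct_logins AS (
  SELECT DISTINCT user_id, login_date FROM logins
),
numbered AS (
  SELECT user_id, login_date,
         ROW_NUMBER() OVER (
           PARTITION BY user_id ORDER BY login_date
         ) AS rn
  FROM distinct_logins
),
grouped AS (
  -- Consecutive dates minus a consecutive row number = constant key
  SELECT user_id, date(login_date, '-' || rn || ' days') AS grp
  FROM numbered
)
SELECT user_id
FROM grouped
GROUP BY user_id, grp
HAVING COUNT(*) >= 3;
""").fetchall()
print(rows)  # [(1,)] -- user 1 qualifies, user 2 does not
```

The DISTINCT step matters: without it, two logins on the same day would break the row-number arithmetic.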

Q2: Write a query to compute month-over-month revenue growth as a percentage.

Approach:

Use LAG(revenue, 1) OVER (ORDER BY month) to get the previous month's revenue. Then calculate (revenue - prev_revenue) / prev_revenue * 100.0. The first month returns NULL because LAG has no prior row. Do not COALESCE it to zero because that would imply 0% growth, which is wrong. Use NULLIF in the denominator to guard against division by zero if a prior month had zero revenue.
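A runnable sketch of this approach via SQLite and Python's sqlite3; the monthly_revenue table and its rows are invented for illustration, and the SQL runs unchanged on PostgreSQL:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE monthly_revenue (month TEXT, revenue REAL);
INSERT INTO monthly_revenue VALUES
  ('2024-01', 1000), ('2024-02', 1200), ('2024-03', 900);
""")

rows = con.execute("""
SELECT
  month,
  revenue,
  ROUND(
    (revenue - LAG(revenue) OVER (ORDER BY month)) * 100.0
    / NULLIF(LAG(revenue) OVER (ORDER BY month), 0),  -- guard div-by-zero
    1
  ) AS mom_growth_pct
FROM monthly_revenue;
""").fetchall()
print(rows)
# [('2024-01', 1000.0, None), ('2024-02', 1200.0, 20.0), ('2024-03', 900.0, -25.0)]
```

Note the first month stays NULL rather than being coerced to 0% growth.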

Q3: Deduplicate a table that receives late-arriving data. Keep only the most recent record per entity.

Approach:

CTE with ROW_NUMBER() OVER (PARTITION BY entity_id ORDER BY updated_at DESC) AS rn. Outer query filters WHERE rn = 1. If updated_at can have ties, add a tiebreaker: ORDER BY updated_at DESC, id DESC. This is the single most common window function pattern in production data pipelines, and interviewers expect you to write it in under two minutes.
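The dedup CTE, runnable via SQLite and Python's sqlite3; the events table, id tiebreaker, and payload column are invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE events (entity_id INTEGER, updated_at TEXT, id INTEGER, payload TEXT);
INSERT INTO events VALUES
  (1, '2024-01-01 10:00', 1, 'stale'),
  (1, '2024-01-02 09:00', 2, 'fresh'),   -- most recent for entity 1
  (2, '2024-01-01 08:00', 3, 'only');
""")

rows = con.execute("""
WITH ranked AS (
  SELECT *,
         ROW_NUMBER() OVER (
           PARTITION BY entity_id
           ORDER BY updated_at DESC, id DESC  -- id breaks updated_at ties
         ) AS rn
  FROM events
)
SELECT entity_id, payload FROM ranked
WHERE rn = 1
ORDER BY entity_id;
""").fetchall()
print(rows)  # [(1, 'fresh'), (2, 'only')]
```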

Q4: Rank products by total sales within each category, without skipping ranks when there are ties.

Approach:

DENSE_RANK() OVER (PARTITION BY category ORDER BY total_sales DESC). The interviewer is testing whether you pick DENSE_RANK over RANK. If they say 'show the top 3,' DENSE_RANK guarantees exactly 3 distinct rank levels. Mention that more than 3 rows might appear if there are ties at rank 3.
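A runnable sketch showing the tie behavior, via SQLite and Python's sqlite3; the product_sales table and its rows are invented so the rank-2 tie is visible:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE product_sales (category TEXT, product TEXT, total_sales INTEGER);
INSERT INTO product_sales VALUES
  ('toys', 'a', 500), ('toys', 'b', 400), ('toys', 'c', 400),
  ('toys', 'd', 300), ('toys', 'e', 200);
""")

rows = con.execute("""
WITH ranked AS (
  SELECT category, product, total_sales,
         DENSE_RANK() OVER (
           PARTITION BY category ORDER BY total_sales DESC
         ) AS drnk
  FROM product_sales
)
SELECT product, total_sales, drnk
FROM ranked
WHERE drnk <= 3          -- top 3 rank levels, not top 3 rows
ORDER BY drnk, product;
""").fetchall()
print(rows)
# [('a', 500, 1), ('b', 400, 2), ('c', 400, 2), ('d', 300, 3)]
# 4 rows survive: the tie at rank 2 means 'top 3 ranks' > 3 products
```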

Q5: Calculate a 7-day moving average of daily active users.

Approach:

AVG(dau) OVER (ORDER BY report_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW). The frame includes the current row plus the 6 before it, for 7 rows total. Stress the difference between ROWS and RANGE: ROWS counts physical rows, RANGE would group rows with the same date (unlikely here, but the interviewer might ask). For the first 6 days, the average uses fewer than 7 data points. Decide whether to show those partial averages or filter them out.

Q6: For each employee, show their salary and the gap between their salary and the highest salary in their department.

Approach:

MAX(salary) OVER (PARTITION BY department) AS dept_max, then compute dept_max - salary AS gap. No ORDER BY needed in the window because you want a single aggregate across the entire partition. An alternative: FIRST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary DESC) gives the same result but requires ORDER BY.
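The partition-wide MAX approach, runnable via SQLite and Python's sqlite3; the employees rows are invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE employees (employee_id INTEGER, department TEXT, salary INTEGER);
INSERT INTO employees VALUES
  (1, 'eng', 90000), (2, 'eng', 75000), (3, 'sales', 60000);
""")

rows = con.execute("""
SELECT employee_id, salary,
       -- no ORDER BY: the window sees the whole partition
       MAX(salary) OVER (PARTITION BY department) - salary AS gap_to_top
FROM employees
ORDER BY employee_id;
""").fetchall()
print(rows)  # [(1, 90000, 0), (2, 75000, 15000), (3, 60000, 0)]
```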

Q7: Sessionize clickstream data: assign a session_id to each event, where a new session starts after 30 minutes of inactivity.

Approach:

Step 1: LAG(event_timestamp) OVER (PARTITION BY user_id ORDER BY event_timestamp) to get the previous event time. Step 2: CASE WHEN the gap exceeds 30 minutes, flag it as a new session (1), else 0. Step 3: SUM() that flag as a window function (running sum) partitioned by user_id, ordered by event_timestamp. The running sum increments at each session boundary, producing a session_id. This three-step pattern is standard for clickstream sessionization and shows up in interviews at Airbnb, Spotify, and similar product-analytics companies.
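The three steps above can be run end to end with SQLite and Python's sqlite3; the clickstream rows are invented, and julianday() arithmetic stands in for PostgreSQL interval math:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE clickstream (user_id INTEGER, event_timestamp TEXT);
INSERT INTO clickstream VALUES
  (1, '2024-01-01 10:00:00'),
  (1, '2024-01-01 10:10:00'),   -- 10 min gap: same session
  (1, '2024-01-01 11:00:00'),   -- 50 min gap: new session
  (1, '2024-01-01 11:05:00');
""")

rows = con.execute("""
WITH gaps AS (
  -- Step 1: previous event time per user
  SELECT user_id, event_timestamp,
         (julianday(event_timestamp)
          - julianday(LAG(event_timestamp) OVER (
              PARTITION BY user_id ORDER BY event_timestamp
            ))) * 24 * 60 AS gap_minutes
  FROM clickstream
),
flagged AS (
  -- Step 2: flag session boundaries (first row's NULL gap falls to ELSE 0)
  SELECT user_id, event_timestamp,
         CASE WHEN gap_minutes > 30 THEN 1 ELSE 0 END AS new_session
  FROM gaps
)
-- Step 3: running sum of flags = session_id
SELECT user_id, event_timestamp,
       1 + SUM(new_session) OVER (
         PARTITION BY user_id ORDER BY event_timestamp
       ) AS session_id
FROM flagged
ORDER BY user_id, event_timestamp;
""").fetchall()
print([r[2] for r in rows])  # [1, 1, 2, 2]
```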

Common Window Function Mistakes

These four errors account for most window function bugs in interviews and production code.

Forgetting ORDER BY inside RANK or DENSE_RANK

RANK() OVER (PARTITION BY department) without ORDER BY is accepted by some engines (PostgreSQL allows it; SQL Server rejects it), but where it runs, every row gets rank 1 because the database has no criterion to differentiate them. This is never what you want. Always include ORDER BY when using ranking functions. ROW_NUMBER without ORDER BY is similarly nondeterministic: the numbering can change between executions.

Confusing ROWS and RANGE in frame clauses

ROWS BETWEEN 2 PRECEDING AND CURRENT ROW counts exactly 2 physical rows back. RANGE BETWEEN 2 PRECEDING AND CURRENT ROW includes all rows whose ORDER BY value is within 2 of the current row's value. With dates, RANGE can pull in multiple rows if several share the same date. Most engines default to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW when you specify ORDER BY without an explicit frame. This default catches people because it is not the same as ROWS UNBOUNDED PRECEDING.

Treating PARTITION BY like GROUP BY

GROUP BY collapses rows: 1000 rows become 10 groups with 10 output rows. PARTITION BY does not collapse anything: 1000 rows in, 1000 rows out. The window function computes a value for each row based on its partition, but every original row survives. Candidates who mix these up try to SELECT non-aggregated columns alongside GROUP BY, get errors, and waste interview time debugging.

Using a window function in WHERE

You cannot write WHERE ROW_NUMBER() OVER (...) = 1. Window functions execute after WHERE. To filter on a window function result, wrap the query in a CTE or subquery and filter in the outer query. This is such a common slip that some interviewers set it as a deliberate trap: they give you a requirement that requires filtering on a rank, and watch whether you reach for a CTE.
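Both the trap and the fix can be demonstrated with SQLite through Python's sqlite3; the employees rows are invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE employees (employee_id INTEGER, department TEXT, salary INTEGER);
INSERT INTO employees VALUES
  (1, 'eng', 90000), (2, 'eng', 80000), (3, 'sales', 70000);
""")

# The trap: a window function directly in WHERE is rejected,
# because window functions evaluate after WHERE.
try:
    con.execute("""
      SELECT employee_id FROM employees
      WHERE ROW_NUMBER() OVER (
        PARTITION BY department ORDER BY salary DESC) = 1;
    """)
except sqlite3.Error as e:
    print("rejected:", e)

# The fix: compute the rank in a CTE, filter in the outer query.
rows = con.execute("""
WITH ranked AS (
  SELECT employee_id,
         ROW_NUMBER() OVER (
           PARTITION BY department ORDER BY salary DESC
         ) AS rn
  FROM employees
)
SELECT employee_id FROM ranked WHERE rn = 1 ORDER BY employee_id;
""").fetchall()
print(rows)  # [(1,), (3,)] -- top earner per department
```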

SQL Window Functions FAQ

What is a window function in SQL?
A window function performs a calculation across a set of rows that are related to the current row, without collapsing those rows into a single output. Unlike GROUP BY, which reduces rows, a window function keeps every row and adds a computed column. The 'window' is defined by the OVER() clause, which specifies partitioning, ordering, and an optional frame.
What is the difference between PARTITION BY and GROUP BY?
GROUP BY reduces rows: 1000 input rows might become 10 output rows (one per group). PARTITION BY keeps all 1000 rows but divides them into partitions for the window calculation. Each row gets its own computed value based on its partition. You use GROUP BY for aggregate reports and PARTITION BY when you need per-row calculations that reference other rows in the same group.
Can I use multiple window functions in the same query?
Yes. You can have as many window functions as you need in a single SELECT. Each can have its own OVER() clause with different PARTITION BY, ORDER BY, and frame specifications. If multiple window functions share the same OVER() definition, use a WINDOW clause (supported in PostgreSQL, MySQL 8+, and BigQuery) to define it once and reference it by name.
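The WINDOW clause looks like this in practice; SQLite also supports it, so this sketch runs via Python's sqlite3 (the employees rows are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE employees (employee_id INTEGER, department TEXT, salary INTEGER);
INSERT INTO employees VALUES
  (1, 'eng', 90000), (2, 'eng', 80000), (3, 'sales', 70000);
""")

rows = con.execute("""
SELECT employee_id,
       ROW_NUMBER() OVER w AS rn,
       SUM(salary)  OVER w AS running_total   -- same named window, reused
FROM employees
WINDOW w AS (PARTITION BY department ORDER BY salary DESC)
ORDER BY employee_id;
""").fetchall()
print(rows)  # [(1, 1, 90000), (2, 2, 170000), (3, 1, 70000)]
```

Defining the window once keeps the two OVER clauses guaranteed-identical, which matters when you later edit the partitioning.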
Do all databases support window functions?
All major analytical databases support them: PostgreSQL, MySQL 8+, SQL Server, Oracle, BigQuery, Snowflake, Redshift, Databricks, and DuckDB. MySQL 5.x and SQLite before 3.25 do not. If you are targeting MySQL, confirm the version. In interviews, you can safely assume window function support unless told otherwise.
When should I use a window function instead of a self-join?
Prefer window functions when you need to compare a row to its neighbors (LAG/LEAD), compute running totals (SUM OVER), or rank rows within groups (ROW_NUMBER/RANK). Self-joins can accomplish the same results but are harder to read, harder to optimize, and more error-prone. The one case where a self-join wins: when the database engine is old enough to lack window function support.

Stop Memorizing. Start Partitioning.

Reading OVER clauses won't save you when the interviewer asks for a dedup CTE. Run the queries yourself, break them, fix them, and see what the planner actually does with your frame spec.