Most candidates memorize ROW_NUMBER and call themselves comfortable with window functions. Interviewers don't buy it. Window functions appear in roughly 35% of senior-level DE interviews we've analyzed, and the ones that actually separate strong candidates from weak ones are never the simple ranking questions. They're the frame clause questions, the LAG with a default value questions, and the "what does PARTITION BY without ORDER BY even do" questions.
This isn't a reference card for the things you already know. It's a reference card for the things you think you know, and a quiet check on the ones you don't.
Senior rounds use them
Use PARTITION BY
Use ROW_NUMBER
Aggregate windows
Source: DataDriven analysis of 1,042 verified data engineering interview rounds.
function_name(args) OVER (
[PARTITION BY partition_columns]
[ORDER BY sort_columns [ASC|DESC]]
[frame_clause]
)PARTITION BY: Divides rows into groups. The function resets for each group. Omitting it treats the entire result set as one partition.
ORDER BY: Defines the row order within each partition. Required for ranking and offset functions. Optional for aggregate functions (but changes their behavior when present).
Frame clause: Defines which rows within the partition are included in the calculation. Only applies to aggregate functions and FIRST_VALUE / LAST_VALUE.
Don't default to ROW_NUMBER just because it's the one you remember. The interviewer asking for "the top three by salary per department" is usually checking whether you reach for DENSE_RANK or RANK when ties matter, and whether you know the difference. ROW_NUMBER with a tied ORDER BY produces nondeterministic output. That's a bug, not a feature, and it's the first thing a sharp interviewer probes.
Assigns a unique sequential integer to each row. Ties get different numbers (arbitrary tiebreak).
-- Number each employee within their department by salary
ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC)
-- Result: 1, 2, 3, 4 (no gaps, no ties)Like ROW_NUMBER but ties get the same rank. Leaves gaps after ties.
-- Rank employees by salary (gaps after ties)
RANK() OVER (PARTITION BY dept ORDER BY salary DESC)
-- Result: 1, 2, 2, 4 (gap: 3 is skipped)Like RANK but no gaps. Ties get the same rank. The next distinct value gets the next integer.
-- Dense rank: no gaps after ties
DENSE_RANK() OVER (PARTITION BY dept ORDER BY salary DESC)
-- Result: 1, 2, 2, 3 (no gap)Divides the partition into n approximately equal groups and assigns a group number (1 to n) to each row.
-- Divide employees into 4 salary quartiles
NTILE(4) OVER (ORDER BY salary)
-- Result: 1, 1, 2, 2, 3, 3, 4, 4 (for 8 rows)Standard aggregate functions work as window functions when used with OVER(). Without ORDER BY, they compute over the entire partition. With ORDER BY, they compute a running aggregate (using the default frame).
-- Running total of revenue by date
SUM(revenue) OVER (ORDER BY date ROWS UNBOUNDED PRECEDING)
-- Total revenue per department (on every row)
SUM(revenue) OVER (PARTITION BY dept)
-- 7-day rolling sum
SUM(revenue) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)-- Running average
AVG(salary) OVER (ORDER BY hire_date ROWS UNBOUNDED PRECEDING)
-- 30-day moving average
AVG(revenue) OVER (ORDER BY date ROWS BETWEEN 29 PRECEDING AND CURRENT ROW)
-- Department average (compare each row to its group)
AVG(salary) OVER (PARTITION BY dept)-- Running count
COUNT(*) OVER (ORDER BY date ROWS UNBOUNDED PRECEDING)
-- Total rows in partition
COUNT(*) OVER (PARTITION BY dept)
-- Count of non-null values in a window
COUNT(email) OVER (PARTITION BY dept)-- Running minimum
MIN(price) OVER (ORDER BY date ROWS UNBOUNDED PRECEDING)
-- Max salary in each department
MAX(salary) OVER (PARTITION BY dept)
-- Sliding window min (3-row window)
MIN(temperature) OVER (ORDER BY timestamp ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)Offset functions access a value from a different row relative to the current row. They require ORDER BY and do not use a frame clause (except FIRST_VALUE and LAST_VALUE).
Returns the value from a previous row. Offset defaults to 1. Default value used when there is no previous row.
-- Previous day's revenue
LAG(revenue, 1, 0) OVER (ORDER BY date)
-- Revenue from 7 days ago
LAG(revenue, 7) OVER (ORDER BY date)
-- Day-over-day change
revenue - LAG(revenue, 1) OVER (ORDER BY date) AS daily_changeReturns the value from a subsequent row. Mirror of LAG.
-- Next day's revenue
LEAD(revenue, 1) OVER (ORDER BY date)
-- Time to next event (session analysis)
LEAD(event_time) OVER (PARTITION BY user_id ORDER BY event_time)
- event_time AS time_to_nextReturns the first value in the window frame. Respects the frame clause.
-- First order date per customer
FIRST_VALUE(order_date) OVER (
PARTITION BY customer_id ORDER BY order_date
ROWS UNBOUNDED PRECEDING
)
-- Earliest hire in each department
FIRST_VALUE(name) OVER (
PARTITION BY dept ORDER BY hire_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)Returns the last value in the window frame. Important: the default frame ends at CURRENT ROW, so you usually need to specify ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.
-- Most recent order per customer (needs full frame)
LAST_VALUE(order_date) OVER (
PARTITION BY customer_id ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
-- WRONG: default frame stops at current row
LAST_VALUE(order_date) OVER (
PARTITION BY customer_id ORDER BY order_date
) -- this gives the CURRENT row, not the lastReturns the nth value in the window frame. Useful for getting the second-highest, third-most-recent, etc.
-- Second highest salary per department
NTH_VALUE(salary, 2) OVER (
PARTITION BY dept ORDER BY salary DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)Relative rank as a percentage: (rank - 1) / (total_rows - 1). Returns 0.0 for the first row, 1.0 for the last.
PERCENT_RANK() OVER (ORDER BY salary)
-- Salary at 50th percentile gets ~0.5Cumulative distribution: fraction of rows with values less than or equal to the current row. Always returns a value between 0 (exclusive) and 1 (inclusive).
CUME_DIST() OVER (ORDER BY score)
-- 0.25 means 25% of rows have a score <= this row's scoreThe frame clause defines which rows are included in the window calculation. It only applies to aggregate functions and FIRST_VALUE / LAST_VALUE / NTH_VALUE.
{ROWS | RANGE | GROUPS} BETWEEN
{UNBOUNDED PRECEDING | n PRECEDING | CURRENT ROW}
AND
{CURRENT ROW | n FOLLOWING | UNBOUNDED FOLLOWING}| Frame | Meaning | Use Case |
|---|---|---|
| ROWS UNBOUNDED PRECEDING | From partition start to current row | Running total, running count |
| ROWS BETWEEN 6 PRECEDING AND CURRENT ROW | Current row + 6 rows before | 7-day rolling average |
| ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING | Previous row, current, next row | Smoothing, 3-point average |
| ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING | Entire partition | Partition total on every row |
| ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING | Current row to partition end | Reverse running total |
Interview note: The most common interview mistake with window functions is forgetting the default frame. When ORDER BY is present, the default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. This means LAST_VALUE returns the current row, not the actual last row. Always specify the frame explicitly when using FIRST_VALUE, LAST_VALUE, or NTH_VALUE.
| Pattern | OVER Clause | Result |
|---|---|---|
| Global total | OVER () | Same value on every row |
| Group total | OVER (PARTITION BY dept) | Same value within each dept |
| Global running total | OVER (ORDER BY date) | Cumulative across all rows |
| Group running total | OVER (PARTITION BY dept ORDER BY date) | Cumulative within each dept |
| Group ranking | OVER (PARTITION BY dept ORDER BY salary DESC) | Rank resets per dept |
WITH ranked AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY dept ORDER BY salary DESC
) AS rn
FROM employees
)
SELECT * FROM ranked WHERE rn <= 3;SELECT
month,
revenue,
LAG(revenue, 12) OVER (ORDER BY month) AS prev_year,
revenue - LAG(revenue, 12) OVER (ORDER BY month) AS yoy_change
FROM monthly_revenue;SELECT
dept,
name,
salary,
ROUND(
100.0 * salary / SUM(salary) OVER (PARTITION BY dept),
1
) AS pct_of_dept
FROM employees;-- Find consecutive days of positive revenue
WITH flagged AS (
SELECT
date,
revenue,
date - ROW_NUMBER() OVER (ORDER BY date)::INT AS grp
FROM daily_metrics
WHERE revenue > 0
)
SELECT MIN(date) AS streak_start, MAX(date) AS streak_end,
COUNT(*) AS streak_length
FROM flagged
GROUP BY grp
ORDER BY streak_length DESC;Candidates who memorize the syntax still get caught on frame defaults and tied orderings. The fix is reps against a real engine where the wrong answer actually looks wrong.