SQL Window Functions Practice
124 window function problems with worked solutions, graded against a real Postgres warehouse. ROW_NUMBER, RANK and DENSE_RANK, LAG and LEAD, running totals, rolling averages, PARTITION BY. The single most-tested SQL topic in data engineering interview loops.
124 SQL window function practice problems with worked solutions. ROW_NUMBER, RANK and DENSE_RANK, LAG and LEAD, running totals, rolling averages, PARTITION BY. Solutions run against a real Postgres warehouse. Window functions are the single most-tested SQL topic in data engineering interview loops, and the topic where most candidates lose time.
All window functions, in one table
The 8 window functions that cover 95% of interview questions. The trap column is the follow-up the interviewer probably asks next.
| Function | Syntax | When to use | Trap |
|---|---|---|---|
| ROW_NUMBER() | ROW_NUMBER() OVER (PARTITION BY x ORDER BY y) | Unique sequential rank, no ties. Dedup, top-N per group, pagination. | Arbitrarily picks one row when the ORDER BY has ties. |
| RANK() | RANK() OVER (ORDER BY x DESC) | Same rank for ties, skips after. 1, 2, 2, 4. | Skipping makes 'top 3 with ties' return nothing when two share top. |
| DENSE_RANK() | DENSE_RANK() OVER (ORDER BY x DESC) | Same rank for ties, no skip. 1, 2, 2, 3. Usually the right call. | Verbalizing why you chose DENSE_RANK over RANK is the senior signal. |
| LAG() / LEAD() | LAG(col, 1, default) OVER (ORDER BY ts) | Previous or next row inline. Day-over-day deltas, session detection, sequence pattern matching. | First row's LAG is NULL; NULLIF or COALESCE the divisor when computing percent changes. |
| SUM() OVER | SUM(amount) OVER (ORDER BY date ROWS UNBOUNDED PRECEDING) | Running total. Cumulative balance, year-to-date. | Forgetting the frame clause defaults to RANGE UNBOUNDED PRECEDING which behaves differently on ties. |
| AVG() OVER (ROWS BETWEEN) | AVG(x) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) | Rolling average. 7-day moving averages, smoothed time series. | Partial windows on the first six rows; interviewer always asks about them. |
| NTILE(n) | NTILE(4) OVER (ORDER BY revenue DESC) | Quartiles, deciles, equal-sized buckets. | Buckets aren't perfectly equal when row count isn't divisible by n. |
| FIRST_VALUE() / LAST_VALUE() | FIRST_VALUE(x) OVER (PARTITION BY user_id ORDER BY ts) | First or last value in window. Session start time, user's first event. | LAST_VALUE defaults to the current row without ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. Classic interview trap. |
ROW_NUMBER practice: deduplication pattern
WITH ranked AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY user_id
ORDER BY event_timestamp DESC, event_id DESC
) AS rn
FROM event_data
)
SELECT user_id, event_timestamp, event_id, payload
FROM ranked
WHERE rn = 1;The single most useful window function pattern in data engineering. ROW_NUMBER assigns a unique rank inside each partition; filtering rn = 1 keeps exactly one row per key. Postgres has DISTINCT ON for this; BigQuery and Snowflake have QUALIFY. The CTE form works everywhere. Common mistakes: using DISTINCT (no control over which duplicate survives), GROUP BY with MAX (loses the other columns), ASC instead of DESC.
RANK vs DENSE_RANK practice: top-N per group with ties
WITH ranked AS (
SELECT
department,
emp_name,
salary,
DENSE_RANK() OVER (
PARTITION BY department
ORDER BY salary DESC
) AS rk
FROM employees
)
SELECT department, emp_name, salary
FROM ranked
WHERE rk <= 3
ORDER BY department, salary DESC;Top three salaries per department, handling ties correctly. DENSE_RANK keeps tied employees and continues counting. RANK skips numbers after ties; if two share top, your WHERE rk <= 3 misses the next tier. ROW_NUMBER arbitrarily breaks ties and drops a real top-3 finisher. The interviewer's follow-up is always 'what changes if two employees share the highest salary'.
Running total practice: SUM OVER with frame clauses
SELECT
transaction_date,
total_amount,
SUM(total_amount) OVER (
ORDER BY transaction_date
ROWS UNBOUNDED PRECEDING
) AS running_total
FROM transactions
ORDER BY transaction_date;Cumulative sum that grows with each row. ROWS UNBOUNDED PRECEDING means 'all rows up to and including the current one'. The frame clause is what makes this powerful. The trap: omitting the frame clause changes the default to RANGE UNBOUNDED PRECEDING, which behaves differently when the ORDER BY column has ties.
Rolling average practice: 7-day window with ROWS BETWEEN
WITH daily_revenue_summary AS (
SELECT
date(transaction_date) AS order_date,
SUM(total_amount) AS daily_revenue
FROM transactions
GROUP BY date(transaction_date)
)
SELECT
order_date,
daily_revenue,
ROUND(AVG(daily_revenue) OVER (
ORDER BY order_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
), 2) AS rolling_7d_avg
FROM daily_revenue_summary
ORDER BY order_date;7-day rolling average. ROWS BETWEEN 6 PRECEDING AND CURRENT ROW gives a frame of exactly 7 physical rows. The interviewer's follow-up is always about the first 6 rows where the frame is shorter than 7. Mentioning partial windows before being asked is the senior signal. Use ROWS over RANGE because RANGE collapses ties on the ORDER BY column, which on intraday or duplicated dates silently changes the answer.
LAG and LEAD practice: month-over-month change
WITH monthly AS (
SELECT
strftime('%Y-%m', transaction_date) AS month,
SUM(total_amount) AS revenue
FROM transactions
GROUP BY 1
)
SELECT
month,
revenue,
revenue - LAG(revenue) OVER (ORDER BY month) AS mom_delta,
ROUND(100.0 * (revenue - LAG(revenue) OVER (ORDER BY month))
/ NULLIF(LAG(revenue) OVER (ORDER BY month), 0), 1) AS mom_pct
FROM monthly
ORDER BY month;LAG accesses the previous row inline without a self-join. The CTE chain makes the three steps obvious: aggregate to month, LAG one row back, compute percent change. NULLIF on the LAG result is what interviewers want you to volunteer; the first month has no predecessor and naive division crashes or silently produces infinity.
PARTITION BY practice: sessionization with a 30-minute gap
-- Sessionize events with a 30-minute inactivity gap.
WITH with_gap AS (
SELECT
user_id,
event_timestamp AS event_ts,
LAG(event_timestamp) OVER (PARTITION BY user_id ORDER BY event_timestamp)
AS prev_ts
FROM event_data
),
flagged AS (
SELECT
user_id,
event_ts,
CASE
WHEN prev_ts IS NULL
OR (julianday(event_ts) - julianday(prev_ts)) * 24 * 60 > 30
THEN 1 ELSE 0
END AS new_session
FROM with_gap
)
SELECT
user_id,
event_ts,
SUM(new_session) OVER (
PARTITION BY user_id ORDER BY event_ts
) AS session_id
FROM flagged;Sessionize events using LAG to compare each event to the previous one within the user partition, then a running SUM over a binary flag to assign session IDs. The pattern combines three window-function ideas: PARTITION BY for per-user state, LAG for previous-row comparison, SUM OVER for cumulative ID assignment. Shows up at L5 and above when interviewers want to see whether you can compose window functions.
ROW_NUMBER vs RANK vs DENSE_RANK on a tied dataset
Same five rows, three different ranking choices. The difference between them is the most common follow-up in any window-function round. Reaching for ROW_NUMBER on a top-N problem with ties is the most common mistake and costs the round even when the query runs.
| Employee | Salary | ROW_NUMBER | RANK | DENSE_RANK |
|---|---|---|---|---|
| Alice | $180,000 | 1 | 1 | 1 |
| Bob | $180,000 | 2 | 1 | 1 |
| Carol | $175,000 | 3 | 3 | 2 |
| Dan | $160,000 | 4 | 4 | 3 |
| Eve | $160,000 | 5 | 4 | 3 |
ROWS vs RANGE frame clauses
ROWS counts physical rows. RANGE collapses ties on the ORDER BY column into a single logical group. They produce the same output when the ORDER BY column has no duplicates and different output the moment ties appear.
Default to ROWS unless you know you want RANGE. Most rolling-window questions use ROWS BETWEEN N PRECEDING AND CURRENT ROW. The frame clause is also what fixes the LAST_VALUE trap: without ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING, LAST_VALUE defaults to the current row, which is almost never what you want.
Where window functions show up on the job
The reason interviewers test them so heavily: they're load-bearing in real warehouses. Five production patterns the bank reproduces in practice form.
| Production use case | Window-function pattern | Where it lives |
|---|---|---|
| Dedup in CDC pipelines | ROW_NUMBER PARTITION BY natural_key ORDER BY ts DESC, filter rn = 1 | Every CDC pipeline that lands updates from a source |
| Sessionization | LAG for gap detection, SUM OVER on a new-session flag | Every analytics warehouse with event data |
| Retention cohorts | LAG and LEAD comparing event timing inside a user partition | Growth and product analytics teams |
| Pricing, surge, and dynamic limits | Window aggregates over time-bounded frames (ROWS BETWEEN N PRECEDING) | Ride-share, ads, real-time pricing |
| Anomaly detection | AVG OVER and STDDEV_SAMP OVER vs the current row, flag outside N sigma | Data quality monitoring, ML feature drift |
Common questions about window functions
Why are window functions so important in DE interviews?+
What's the difference between RANK and ROW_NUMBER in plain English?+
When do I use ROWS vs RANGE in a frame clause?+
How do I practice LAG and LEAD effectively?+
What window function questions come up at FAANG?+
Are QUALIFY-style filters dialect-portable?+
Drill the topic that decides the round
- 01
Active recall beats re-reading by 50%
Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom
- 02
76% of hiring managers reject on the coding task, not the resume
From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice
- 03
Five problem shapes cover 80% of data engineer loops
Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition