SQL Window Functions Practice

Q: Why are window functions so important in DE interviews?

They're the most-tested SQL topic at L4 and above, and the topic that separates candidates who write production SQL from candidates who only write reporting queries. Dedup, sessionization, retention cohorts, and rolling metrics all use them. If window functions aren't reflex, the SQL round becomes a syntax-recall test instead of a problem-solving test.

Q: What's the difference between RANK and ROW_NUMBER in plain English?

ROW_NUMBER gives every row a unique number; ties are broken arbitrarily. RANK gives the same number to ties and skips the next numbers (1, 2, 2, 4). DENSE_RANK gives the same number to ties without skipping (1, 2, 2, 3). For 'top N' problems where ties should be kept, DENSE_RANK is almost always the right call.

Q: When do I use ROWS vs RANGE in a frame clause?

ROWS for almost everything. ROWS counts physical preceding rows; RANGE collapses rows that tie on the ORDER BY column into one logical position. On unique date columns the two produce the same output. On timestamps that can repeat or on intraday data, RANGE silently changes the answer.

Q: How do I practice LAG and LEAD effectively?

Find problems that compute period-over-period changes: month-over-month revenue, week-over-week retention, order-to-order amount delta. Use LAG to pull the previous value, then compute the difference or percent change. Always wrap the divisor in NULLIF to handle the first row of each partition where LAG returns NULL.

Q: What window function questions come up at FAANG?

ROW_NUMBER deduplication, RANK with ties for top-N problems, LAG for sequence-based detection, SUM OVER for running balances, and the gap-and-island pattern (ROW_NUMBER subtracted from a date column to find runs of consecutive days). The frame clause comes up as a follow-up almost every time.

Q: Are QUALIFY-style filters dialect-portable?

No. QUALIFY filters on window function results without a CTE, but only Snowflake, BigQuery, Teradata, and a handful of others support it. Postgres, MySQL, and Redshift require the CTE form. The CTE form is the safe default; mention QUALIFY as the dialect-aware alternative when the interviewer asks about Snowflake or BigQuery specifically.

124 window function problems with worked solutions, graded against a real Postgres warehouse. ROW_NUMBER, RANK and DENSE_RANK, LAG and LEAD, running totals, rolling averages, PARTITION BY. The single most-tested SQL topic in data engineering interview loops.

124 SQL window function practice problems with worked solutions. ROW_NUMBER, RANK and DENSE_RANK, LAG and LEAD, running totals, rolling averages, PARTITION BY. Solutions run against a real Postgres warehouse. Window functions are the single most-tested SQL topic in data engineering interview loops, and the topic where most candidates lose time.

Open the window functions bank Start a random window function problem

All window functions, in one table

The 8 window functions that cover 95% of interview questions. The trap column is the follow-up the interviewer probably asks next.

Function	Syntax	When to use	Trap
ROW_NUMBER()	ROW_NUMBER() OVER (PARTITION BY x ORDER BY y)	Unique sequential rank, no ties. Dedup, top-N per group, pagination.	Arbitrarily picks one row when the ORDER BY has ties.
RANK()	RANK() OVER (ORDER BY x DESC)	Same rank for ties, skips after. 1, 2, 2, 4.	Skipping makes 'top 3 with ties' return nothing when two share top.
DENSE_RANK()	DENSE_RANK() OVER (ORDER BY x DESC)	Same rank for ties, no skip. 1, 2, 2, 3. Usually the right call.	Verbalizing why you chose DENSE_RANK over RANK is the senior signal.
LAG() / LEAD()	LAG(col, 1, default) OVER (ORDER BY ts)	Previous or next row inline. Day-over-day deltas, session detection, sequence pattern matching.	First row's LAG is NULL; NULLIF or COALESCE the divisor when computing percent changes.
SUM() OVER	SUM(amount) OVER (ORDER BY date ROWS UNBOUNDED PRECEDING)	Running total. Cumulative balance, year-to-date.	Forgetting the frame clause defaults to RANGE UNBOUNDED PRECEDING which behaves differently on ties.
AVG() OVER (ROWS BETWEEN)	AVG(x) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)	Rolling average. 7-day moving averages, smoothed time series.	Partial windows on the first six rows; interviewer always asks about them.
NTILE(n)	NTILE(4) OVER (ORDER BY revenue DESC)	Quartiles, deciles, equal-sized buckets.	Buckets aren't perfectly equal when row count isn't divisible by n.
FIRST_VALUE() / LAST_VALUE()	FIRST_VALUE(x) OVER (PARTITION BY user_id ORDER BY ts)	First or last value in window. Session start time, user's first event.	LAST_VALUE defaults to the current row without ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. Classic interview trap.

ROW_NUMBER practice: deduplication pattern

WITH ranked AS (
  SELECT *,
    ROW_NUMBER() OVER (
      PARTITION BY user_id
      ORDER BY event_timestamp DESC, event_id DESC
    ) AS rn
  FROM event_data
)
SELECT user_id, event_timestamp, event_id, payload
FROM ranked
WHERE rn = 1;

The single most useful window function pattern in data engineering. ROW_NUMBER assigns a unique rank inside each partition; filtering rn = 1 keeps exactly one row per key. Postgres has DISTINCT ON for this; BigQuery and Snowflake have QUALIFY. The CTE form works everywhere. Common mistakes: using DISTINCT (no control over which duplicate survives), GROUP BY with MAX (loses the other columns), ASC instead of DESC.

RANK vs DENSE_RANK practice: top-N per group with ties

WITH ranked AS (
  SELECT
    department,
    emp_name,
    salary,
    DENSE_RANK() OVER (
      PARTITION BY department
      ORDER BY salary DESC
    ) AS rk
  FROM employees
)
SELECT department, emp_name, salary
FROM ranked
WHERE rk <= 3
ORDER BY department, salary DESC;

Top three salaries per department, handling ties correctly. DENSE_RANK keeps tied employees and continues counting. RANK skips numbers after ties; if two share top, your WHERE rk <= 3 misses the next tier. ROW_NUMBER arbitrarily breaks ties and drops a real top-3 finisher. The interviewer's follow-up is always 'what changes if two employees share the highest salary'.

Running total practice: SUM OVER with frame clauses

SELECT
  transaction_date,
  total_amount,
  SUM(total_amount) OVER (
    ORDER BY transaction_date
    ROWS UNBOUNDED PRECEDING
  ) AS running_total
FROM transactions
ORDER BY transaction_date;

Cumulative sum that grows with each row. ROWS UNBOUNDED PRECEDING means 'all rows up to and including the current one'. The frame clause is what makes this powerful. The trap: omitting the frame clause changes the default to RANGE UNBOUNDED PRECEDING, which behaves differently when the ORDER BY column has ties.

Rolling average practice: 7-day window with ROWS BETWEEN

WITH daily_revenue_summary AS (
  SELECT
    date(transaction_date) AS order_date,
    SUM(total_amount) AS daily_revenue
  FROM transactions
  GROUP BY date(transaction_date)
)
SELECT
  order_date,
  daily_revenue,
  ROUND(AVG(daily_revenue) OVER (
    ORDER BY order_date
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ), 2) AS rolling_7d_avg
FROM daily_revenue_summary
ORDER BY order_date;

7-day rolling average. ROWS BETWEEN 6 PRECEDING AND CURRENT ROW gives a frame of exactly 7 physical rows. The interviewer's follow-up is always about the first 6 rows where the frame is shorter than 7. Mentioning partial windows before being asked is the senior signal. Use ROWS over RANGE because RANGE collapses ties on the ORDER BY column, which on intraday or duplicated dates silently changes the answer.

LAG and LEAD practice: month-over-month change

WITH monthly AS (
  SELECT
    strftime('%Y-%m', transaction_date) AS month,
    SUM(total_amount) AS revenue
  FROM transactions
  GROUP BY 1
)
SELECT
  month,
  revenue,
  revenue - LAG(revenue) OVER (ORDER BY month) AS mom_delta,
  ROUND(100.0 * (revenue - LAG(revenue) OVER (ORDER BY month))
        / NULLIF(LAG(revenue) OVER (ORDER BY month), 0), 1) AS mom_pct
FROM monthly
ORDER BY month;

LAG accesses the previous row inline without a self-join. The CTE chain makes the three steps obvious: aggregate to month, LAG one row back, compute percent change. NULLIF on the LAG result is what interviewers want you to volunteer; the first month has no predecessor and naive division crashes or silently produces infinity.

PARTITION BY practice: sessionization with a 30-minute gap

-- Sessionize events with a 30-minute inactivity gap.
WITH with_gap AS (
  SELECT
    user_id,
    event_timestamp AS event_ts,
    LAG(event_timestamp) OVER (PARTITION BY user_id ORDER BY event_timestamp)
      AS prev_ts
  FROM event_data
),
flagged AS (
  SELECT
    user_id,
    event_ts,
    CASE
      WHEN prev_ts IS NULL
        OR (julianday(event_ts) - julianday(prev_ts)) * 24 * 60 > 30
      THEN 1 ELSE 0
    END AS new_session
  FROM with_gap
)
SELECT
  user_id,
  event_ts,
  SUM(new_session) OVER (
    PARTITION BY user_id ORDER BY event_ts
  ) AS session_id
FROM flagged;

Sessionize events using LAG to compare each event to the previous one within the user partition, then a running SUM over a binary flag to assign session IDs. The pattern combines three window-function ideas: PARTITION BY for per-user state, LAG for previous-row comparison, SUM OVER for cumulative ID assignment. Shows up at L5 and above when interviewers want to see whether you can compose window functions.

ROW_NUMBER vs RANK vs DENSE_RANK on a tied dataset

Same five rows, three different ranking choices. The difference between them is the most common follow-up in any window-function round. Reaching for ROW_NUMBER on a top-N problem with ties is the most common mistake and costs the round even when the query runs.

Employee	Salary	ROW_NUMBER	RANK	DENSE_RANK
Alice	$180,000	1	1	1
Bob	$180,000	2	1	1
Carol	$175,000	3	3	2
Dan	$160,000	4	4	3
Eve	$160,000	5	4	3

ROW_NUMBERPicks one of the tied rows arbitrarily. Right call for dedup and pagination, wrong for top-N with ties.

RANKSame rank for ties, gaps after. Carol jumps to 3 and Dan jumps to 4. Top-3 misses both Dan and Eve.

DENSE_RANKSame rank for ties, no gaps. Top-3 captures everyone tied at the cutoff. Usually the right call.

ROWS vs RANGE frame clauses

ROWS counts physical rows. RANGE collapses ties on the ORDER BY column into a single logical group. They produce the same output when the ORDER BY column has no duplicates and different output the moment ties appear.

Default to ROWS unless you know you want RANGE. Most rolling-window questions use ROWS BETWEEN N PRECEDING AND CURRENT ROW. The frame clause is also what fixes the LAST_VALUE trap: without ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING, LAST_VALUE defaults to the current row, which is almost never what you want.

Where window functions show up on the job

The reason interviewers test them so heavily: they're load-bearing in real warehouses. Five production patterns the bank reproduces in practice form.

Production use case	Window-function pattern	Where it lives
Dedup in CDC pipelines	ROW_NUMBER PARTITION BY natural_key ORDER BY ts DESC, filter rn = 1	Every CDC pipeline that lands updates from a source
Sessionization	LAG for gap detection, SUM OVER on a new-session flag	Every analytics warehouse with event data
Retention cohorts	LAG and LEAD comparing event timing inside a user partition	Growth and product analytics teams
Pricing, surge, and dynamic limits	Window aggregates over time-bounded frames (ROWS BETWEEN N PRECEDING)	Ride-share, ads, real-time pricing
Anomaly detection	AVG OVER and STDDEV_SAMP OVER vs the current row, flag outside N sigma	Data quality monitoring, ML feature drift

Common questions about window functions

Why are window functions so important in DE interviews?+

They're the most-tested SQL topic at L4 and above, and the topic that separates candidates who write production SQL from candidates who only write reporting queries. Dedup, sessionization, retention cohorts, and rolling metrics all use them. If window functions aren't reflex, the SQL round becomes a syntax-recall test instead of a problem-solving test.

What's the difference between RANK and ROW_NUMBER in plain English?+

ROW_NUMBER gives every row a unique number; ties are broken arbitrarily. RANK gives the same number to ties and skips the next numbers (1, 2, 2, 4). DENSE_RANK gives the same number to ties without skipping (1, 2, 2, 3). For 'top N' problems where ties should be kept, DENSE_RANK is almost always the right call.

When do I use ROWS vs RANGE in a frame clause?+

ROWS for almost everything. ROWS counts physical preceding rows; RANGE collapses rows that tie on the ORDER BY column into one logical position. On unique date columns the two produce the same output. On timestamps that can repeat or on intraday data, RANGE silently changes the answer.

How do I practice LAG and LEAD effectively?+

Find problems that compute period-over-period changes: month-over-month revenue, week-over-week retention, order-to-order amount delta. Use LAG to pull the previous value, then compute the difference or percent change. Always wrap the divisor in NULLIF to handle the first row of each partition where LAG returns NULL.

What window function questions come up at FAANG?+

ROW_NUMBER deduplication, RANK with ties for top-N problems, LAG for sequence-based detection, SUM OVER for running balances, and the gap-and-island pattern (ROW_NUMBER subtracted from a date column to find runs of consecutive days). The frame clause comes up as a follow-up almost every time.

Are QUALIFY-style filters dialect-portable?+

No. QUALIFY filters on window function results without a CTE, but only Snowflake, BigQuery, Teradata, and a handful of others support it. Postgres, MySQL, and Redshift require the CTE form. The CTE form is the safe default; mention QUALIFY as the dialect-aware alternative when the interviewer asks about Snowflake or BigQuery specifically.

02 / Why practice

Drill the topic that decides the round

01
Active recall beats re-reading by 50%
Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom
02
76% of hiring managers reject on the coding task, not the resume
From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice
03
Five problem shapes cover 80% of data engineer loops
Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition

Open the window functions bank

More SQL practice

SQL interview questions→

15 worked solutions across joins, CTEs, gaps-and-islands.

CTE practice→

Common Table Expressions and recursive queries in their own bank.

All SQL practice problems→

854 scored problems across every SQL topic.