124 window-function problems isolated from the data engineer SQL catalog. ROW_NUMBER for dedup, LAG and LEAD for deltas and sessionization, SUM OVER for running totals and account balances, ROWS versus RANGE frame clauses for rolling averages. Engineered ties and frame-clause traps appear in the multi-seed grader.

Window functions are roughly 15 percent of the SQL interview surface area for data engineer roles in 2026. They account for nearly 40 percent of failed submissions on multi-seed grading. The reason is that the failure modes are silent on a single fixture. The query produces "an" answer that looks correct, just not "the" answer the business wants. The 10-seed grader engineers the ties, the NULL distributions, and the timestamp duplicates that surface the silent bugs as actual failures.

Five window-function patterns compose this catalog. ROW_NUMBER for dedup uses PARTITION BY natural_key ORDER BY updated_at DESC, composite_tiebreaker, then filters rn equals 1. The composite tiebreaker after updated_at is what survives engineered ties; without it, two rows tied at the same updated_at produce nondeterministic results across the 10 seeds. Top-N per group uses PARTITION BY group ORDER BY metric DESC. The choice between RANK, DENSE_RANK, and ROW_NUMBER depends on whether the business wants ties skipped (RANK produces 1, 2, 2, 4), ties preserved without skipping (DENSE_RANK produces 1, 2, 2, 3), or arbitrary picks of exactly one per group (ROW_NUMBER). LAG and LEAD for deltas uses LAG(amount) OVER PARTITION BY user ORDER BY ts to compute transaction-over-transaction change, with NULLIF on the first row to avoid division by zero or NULL propagation. SUM OVER for running totals uses PARTITION BY account ORDER BY posted_at ROWS UNBOUNDED PRECEDING; defaulting to RANGE collapses ties at the same timestamp into one frame slot and produces silently wrong running totals on intraday data. AVG OVER with explicit ROWS BETWEEN frames for rolling averages handles the partial-window edge case for the first N-1 rows; data engineer interviewers explicitly ask about this.

The frame clause is the most-missed sub-topic. ROWS counts physical rows in the window. RANGE groups rows with equal ORDER BY values into a single frame slot. On daily revenue with one row per day, ROWS and RANGE behave identically. On intraday data with multiple events per timestamp, they diverge. The default in Postgres and most engines is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which is rarely what the question wants. A data engineer should always specify ROWS explicitly for rolling averages. The 10-seed grader engineers timestamp ties on roughly half the rolling-average problems specifically to surface the ROWS-versus-RANGE bug.

Companies that test window functions heavily based on tagged problems: Meta (ranking with tie handling, gap-and-island for engagement streaks), Stripe (running balances on financial accounts), Airbnb (sessionization with LAG), Spotify (gap-and-island for listening streaks), Snowflake (QUALIFY-flavored variants), Databricks (PARTITION BY with multiple keys for fact-dimension joins). The Postgres-specific DISTINCT ON, the Snowflake-and-BigQuery QUALIFY clause, and the LATERAL join alternatives are tagged on the problems where each applies. QUALIFY filters the result of a window function the way HAVING filters the result of a GROUP BY; on engines without QUALIFY, wrap the window in a CTE and filter in the outer SELECT.

SQL Window Function Interview Questions

Window function interview problems with engineered ties and frame-clause traps for data engineer practice.

905 practice problems matching this filter. Difficulty: medium (413), hard (147), easy (345).

SQL (905)

Common questions

What is the difference between ROW_NUMBER, RANK, and DENSE_RANK in SQL window functions?
ROW_NUMBER assigns a unique sequential integer to each row in the partition; it never produces ties. RANK skips numbers when there are ties (1, 2, 2, 4). DENSE_RANK does not skip (1, 2, 2, 3). For 'top N per group including all ties at N', DENSE_RANK is almost always the right choice. For dedup-latest, ROW_NUMBER plus a composite tiebreaker in the ORDER BY clause.
When should a data engineer use LAG versus LEAD?
LAG accesses the previous row in the partition; LEAD accesses the next. Use LAG to compute deltas from previous values (LAG(amount) for transaction-over-transaction change), session start detection (LAG(event_ts) greater than 30 minutes ago means new session), or comparison to baseline. Use LEAD when looking forward, typically for predicting next event or detecting churn (LEAD(event_ts) IS NULL within a window means this was the user's last event).
What is the ROWS versus RANGE distinction in SQL window frames?
ROWS counts physical preceding and following rows. RANGE groups rows with equal ORDER BY values into a single frame slot. On daily data with one row per day, they behave identically. On intraday data with multiple events sharing a timestamp, ROWS gives you the physical N preceding rows; RANGE gives you all rows tied at the same ts plus N preceding distinct ts groups. Always specify ROWS explicitly for rolling averages. Defaulting to RANGE produces silent bugs.
What is the QUALIFY clause and which engines support it?
QUALIFY filters the result of a window function the way HAVING filters the result of a GROUP BY. Native in Snowflake, BigQuery, and Teradata. Not in Postgres, MySQL, or SQL Server. On engines without QUALIFY, the equivalent is a CTE with the window function in the SELECT and a WHERE on the outer query. Mention QUALIFY in interviews if the company is Snowflake or BigQuery; it is a data engineer fluency signal.
How do you handle ties when picking 'top 1 per group' in SQL?
ROW_NUMBER with a composite ORDER BY: ROW_NUMBER() OVER (PARTITION BY group ORDER BY metric DESC, tiebreaker ASC), filter rn equals 1. The tiebreaker must be a stable secondary sort that produces a deterministic single row. Without it, the query returns nondeterministic results across runs and fails the multi-seed grader. If the business wants 'all rows tied at top', use DENSE_RANK and filter dr equals 1.
What is a partial window and why does it matter for rolling averages?
A 7-day rolling average computed with ROWS BETWEEN 6 PRECEDING AND CURRENT ROW averages over fewer than 7 rows for the first 6 days of data. Data engineer interviewers ask whether you want partial windows (return averages with caveats) or want to filter them out (WHERE rn greater-than-or-equal-to 7 in a wrapping CTE). State the choice. Picking one without acknowledging the other is the failure mode.
Can SQL window functions be nested?
No. You cannot compute a window function directly on top of another window function in the same SELECT. The workaround is a CTE: compute the first window in the CTE, then compute the second in the outer SELECT. Common pattern: rolling sum of a per-group ROW_NUMBER. Two CTEs, two windows, one outer query.