SQL Window Functions Practice Problems
10 window function problems, each with a stated pattern, a worked solution, and the specific seed designed to break the lazy answer. Window functions are the SQL surface most candidates wing into Medium-Hard interviews undertrained for; this page lets you fail safely on the patterns first.
10 window function problems, each with a stated pattern, a worked solution, and the specific seed designed to break the lazy answer. Window functions are the SQL surface most candidates wing into Medium-Hard interviews undertrained for; this page lets you fail safely on the patterns first.
Know window functions the way the interviewer who asks it knows it.
Ranking function divergence on ties
The output difference is the entire interview interest.
The frame clause that breaks running totals
Default ORDER BY without ROWS uses RANGE, which collapses ties. The number of candidates burned by this is not small.
The default window frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW when ORDER BY is present, and ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING when ORDER BY is absent. Almost no candidate states this aloud and almost every senior interviewer probes for it.
-- 2 orders on 2026-01-15
SELECT
ordered_at,
amount,
SUM(amount) OVER (
ORDER BY ordered_at
) AS running
FROM orders;
-- ordered_at amount running
-- 2026-01-15 50 150 <- wrong!
-- 2026-01-15 100 150 <- collapsed
-- 2026-01-16 30 180-- same 2 orders on 2026-01-15
SELECT
ordered_at,
amount,
SUM(amount) OVER (
ORDER BY ordered_at, order_id
ROWS UNBOUNDED PRECEDING
) AS running
FROM orders;
-- ordered_at amount running
-- 2026-01-15 50 50
-- 2026-01-15 100 150
-- 2026-01-16 30 1802 rows tied on ordered_at are collapsed into 1 frame slot under the default RANGE behavior. The running total computes against the whole group instead of incrementally. Switch to ROWS and add a tiebreaker; the bug disappears.
10 practice problems
Each lists the pattern, the function family, the schema, the prompt, the seed designed to break naive answers, and the skeleton solution.
Return the most recent order per customer. Keep only 1 row per customer even if 2 orders share the same ordered_at timestamp.
Seed 4 has 2 orders for customer 7 with identical ordered_at to the microsecond. Without an order_id tiebreaker, the row returned varies across seeds.
SELECT customer_id, order_id, ordered_at, total
FROM (
SELECT *, ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY ordered_at DESC, order_id DESC
) AS rn
FROM orders
) t
WHERE rn = 1;Return the top 3 distinct salary values per department, plus the employees who earn each. If 2 employees tie at the highest salary, both count toward the top 1.
Seed 2 has 3 engineers tied at $180k. Using ROW_NUMBER returns 1 row at rank 1; using RANK skips rank 2 and rank 3. DENSE_RANK is the correct choice.
SELECT department, emp_id, salary
FROM (
SELECT *, DENSE_RANK() OVER (
PARTITION BY department
ORDER BY salary DESC
) AS drnk
FROM employees
) t
WHERE drnk <= 3
ORDER BY department, salary DESC, emp_id;For each date, return revenue, prior-day revenue, and the day-over-day delta. The first date should return NULL for prior and delta.
Seed 7 has a gap (no row for 2026-01-19). LAG returns the prior existing row (2026-01-18), which is correct only if the question is 'previous record' rather than 'previous calendar day'. Read the prompt.
SELECT revenue_date, revenue, LAG(revenue) OVER (ORDER BY revenue_date) AS prior, revenue - LAG(revenue) OVER (ORDER BY revenue_date) AS delta FROM daily_revenue ORDER BY revenue_date;
For each transaction, return account_id, posted_at, amount, and the account's running balance up to and including this transaction.
Seed 1 has 2 transactions on the same account at the same posted_at. Default RANGE frame collapses them into 1 frame slot; the running balance reports the same value for both rows.
SELECT account_id, posted_at, amount,
SUM(amount) OVER (
PARTITION BY account_id
ORDER BY posted_at, txn_id
ROWS UNBOUNDED PRECEDING
) AS running
FROM transactions;Return a 7-day rolling average of signups for each date (including the date and the prior 6). For dates with fewer than 7 days of history, return the partial average.
Seed 5 has the table starting at 2026-01-01. Without explicit ROWS BETWEEN 6 PRECEDING AND CURRENT ROW, the default frame includes all prior rows and the early dates inflate.
SELECT signup_date, signups,
AVG(signups::numeric) OVER (
ORDER BY signup_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS rolling_7d
FROM daily_signups
ORDER BY signup_date;For each category, return the top 5 products by total revenue. Aggregate first, rank second.
Seed 8 has products that exist in multiple categories. Ranking on per-row revenue instead of per-product total revenue returns wildly wrong rows.
WITH product_totals AS (
SELECT category, product, SUM(revenue) AS total_rev
FROM sales
GROUP BY category, product
),
ranked AS (
SELECT *, ROW_NUMBER() OVER (
PARTITION BY category ORDER BY total_rev DESC, product
) AS rn
FROM product_totals
)
SELECT category, product, total_rev
FROM ranked WHERE rn <= 5
ORDER BY category, rn;Find the longest streak of consecutive active days per user. Return user_id, streak_start, streak_end, streak_length.
Seed 3 has a user with 2 streaks of equal max length. Returning only the first one fails the grader; the prompt says 'longest', not 'first longest'.
WITH grouped AS (
SELECT user_id, activity_date,
activity_date - (ROW_NUMBER() OVER (
PARTITION BY user_id ORDER BY activity_date)) * INTERVAL '1 day' AS streak_key
FROM user_activity
),
streaks AS (
SELECT user_id, streak_key,
MIN(activity_date) AS start_d,
MAX(activity_date) AS end_d,
COUNT(*) AS length
FROM grouped
GROUP BY user_id, streak_key
),
ranked AS (
SELECT *, RANK() OVER (
PARTITION BY user_id ORDER BY length DESC
) AS r
FROM streaks
)
SELECT user_id, start_d, end_d, length
FROM ranked WHERE r = 1
ORDER BY user_id, start_d;Group events into sessions where any gap longer than 30 minutes starts a new session. Return user_id, session_id, session_start, session_end, event_count.
Seed 0 has a user whose very first event has NULL LAG. Without COALESCE, the first event isn't marked as a new session and session_id stays 0.
WITH gapped AS (
SELECT user_id, event_at,
CASE WHEN event_at - LAG(event_at) OVER (
PARTITION BY user_id ORDER BY event_at) > INTERVAL '30 minutes'
OR LAG(event_at) OVER (PARTITION BY user_id ORDER BY event_at) IS NULL
THEN 1 ELSE 0 END AS new_session
FROM events
),
sessioned AS (
SELECT *, SUM(new_session) OVER (
PARTITION BY user_id ORDER BY event_at
) AS session_id
FROM gapped
)
SELECT user_id, session_id,
MIN(event_at) AS session_start,
MAX(event_at) AS session_end,
COUNT(*) AS event_count
FROM sessioned
GROUP BY user_id, session_id
ORDER BY user_id, session_start;For each order, return customer_id, ordered_at, total, the customer's first-ever order total, and the customer's last-ever order total. All on the same row.
Seed 6 punishes the bare LAST_VALUE. The default frame ends at CURRENT ROW, so LAST_VALUE returns the current order's total instead of the customer's final order. Needs ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.
SELECT customer_id, ordered_at, total, FIRST_VALUE(total) OVER w AS first_ever, LAST_VALUE(total) OVER w AS last_ever FROM orders WINDOW w AS ( PARTITION BY customer_id ORDER BY ordered_at ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) ORDER BY customer_id, ordered_at;
For each product, return its share of its category's revenue as a percentage rounded to 2 decimals.
Seed 9 has a category with 1 product whose revenue is zero. Naive division returns NULL or zero where the expected value is 0.00 (or NULL with an explicit NULLIF). Handle the divide-by-zero.
SELECT category, product, revenue,
ROUND(
100.0 * revenue::numeric / NULLIF(SUM(revenue) OVER (
PARTITION BY category
), 0),
2
) AS share_pct
FROM sales
ORDER BY category, share_pct DESC NULLS LAST;Window function frequency in DE interviews
Share of SQL interview questions per function, drawn from interview write-ups. Drill the high-frequency functions first.
The top 5 functions account for roughly 80% of window function interview surface area. Memorizing NTILE or LAST_VALUE before LAG and SUM OVER is a misallocation.
Window function FAQ
Why can't I use a window function inside WHERE?+
What's the difference between ROWS and RANGE in a frame clause?+
When does ROW_NUMBER vs RANK vs DENSE_RANK matter?+
How do I sessionize event data without joins?+
When should I use QUALIFY?+
Are window functions worth practicing if I'm interviewing for an analyst role?+
Open problem 1 (dedup latest)
- 01
Active recall beats re-reading by 50%
Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom
- 02
76% of hiring managers reject on the coding task, not the resume
From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice
- 03
Five problem shapes cover 80% of data engineer loops
Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition