SQL Window Functions Practice Problems

10 window function problems, each with a stated pattern, a worked solution, and the specific seed designed to break the lazy answer. Window functions are the SQL surface most candidates wing into Medium-Hard interviews undertrained for; this page lets you fail safely on the patterns first.

Open problem 1 Read the reference first

Practice problems

21%

PARTITION BY frequency in DE SQL

Patterns covered (dedup, top-N, running, etc.)

Postgres 16

Live execution engine

Ranking function divergence on ties

The output difference is the entire interview interest.

Same input. 3 ranking functions. Same ORDER BY score DESC.

score

ROW_NUMBER

RANK

DENSE_RANK

3 rows tied at 92. ROW_NUMBER picks an arbitrary order, RANK leaves a gap before 5, DENSE_RANK numbers tighten to 3. Pick the wrong one and your "top 3" might be 5 rows or 0 rows.

The frame clause that breaks running totals

Default ORDER BY without ROWS uses RANGE, which collapses ties. The number of candidates burned by this is not small.

The default window frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW when ORDER BY is present, and ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING when ORDER BY is absent. Almost no candidate states this aloud and almost every senior interviewer probes for it.

RANGE (default)

-- 2 orders on 2026-01-15
SELECT
  ordered_at,
  amount,
  SUM(amount) OVER (
    ORDER BY ordered_at
  ) AS running
FROM orders;

-- ordered_at   amount   running
-- 2026-01-15      50      150  <- wrong!
-- 2026-01-15     100      150  <- collapsed
-- 2026-01-16      30      180

ROWS (explicit)

-- same 2 orders on 2026-01-15
SELECT
  ordered_at,
  amount,
  SUM(amount) OVER (
    ORDER BY ordered_at, order_id
    ROWS UNBOUNDED PRECEDING
  ) AS running
FROM orders;

-- ordered_at   amount   running
-- 2026-01-15      50       50
-- 2026-01-15     100      150
-- 2026-01-16      30      180

2 rows tied on ordered_at are collapsed into 1 frame slot under the default RANGE behavior. The running total computes against the whole group instead of incrementally. Switch to ROWS and add a tiebreaker; the bug disappears.

10 practice problems

Each lists the pattern, the function family, the schema, the prompt, the seed designed to break naive answers, and the skeleton solution.

Dedup latest per key

ROW_NUMBER

Easy · ~8 min

Schema

orders(order_id, customer_id, ordered_at, total)

Prompt

Return the most recent order per customer. Keep only 1 row per customer even if 2 orders share the same ordered_at timestamp.

Seed that breaks naive answers

Seed 4 has 2 orders for customer 7 with identical ordered_at to the microsecond. Without an order_id tiebreaker, the row returned varies across seeds.

SELECT customer_id, order_id, ordered_at, total
FROM (
  SELECT *, ROW_NUMBER() OVER (
    PARTITION BY customer_id
    ORDER BY ordered_at DESC, order_id DESC
  ) AS rn
  FROM orders
) t
WHERE rn = 1;

Top-N per group with ties

DENSE_RANK

Easy · ~10 min

Schema

employees(emp_id, department, salary)

Prompt

Return the top 3 distinct salary values per department, plus the employees who earn each. If 2 employees tie at the highest salary, both count toward the top 1.

Seed that breaks naive answers

Seed 2 has 3 engineers tied at $180k. Using ROW_NUMBER returns 1 row at rank 1; using RANK skips rank 2 and rank 3. DENSE_RANK is the correct choice.

SELECT department, emp_id, salary
FROM (
  SELECT *, DENSE_RANK() OVER (
    PARTITION BY department
    ORDER BY salary DESC
  ) AS drnk
  FROM employees
) t
WHERE drnk <= 3
ORDER BY department, salary DESC, emp_id;

Period-over-period delta

LAG

Easy · ~10 min

Schema

daily_revenue(revenue_date, revenue)

Prompt

For each date, return revenue, prior-day revenue, and the day-over-day delta. The first date should return NULL for prior and delta.

Seed that breaks naive answers

Seed 7 has a gap (no row for 2026-01-19). LAG returns the prior existing row (2026-01-18), which is correct only if the question is 'previous record' rather than 'previous calendar day'. Read the prompt.

SELECT
  revenue_date,
  revenue,
  LAG(revenue) OVER (ORDER BY revenue_date) AS prior,
  revenue - LAG(revenue) OVER (ORDER BY revenue_date) AS delta
FROM daily_revenue
ORDER BY revenue_date;

Running total with frame

SUM OVER

Medium · ~12 min

Schema

transactions(account_id, posted_at, amount)

Prompt

For each transaction, return account_id, posted_at, amount, and the account's running balance up to and including this transaction.

Seed that breaks naive answers

Seed 1 has 2 transactions on the same account at the same posted_at. Default RANGE frame collapses them into 1 frame slot; the running balance reports the same value for both rows.

SELECT account_id, posted_at, amount,
  SUM(amount) OVER (
    PARTITION BY account_id
    ORDER BY posted_at, txn_id
    ROWS UNBOUNDED PRECEDING
  ) AS running
FROM transactions;

Rolling window

AVG OVER

Medium · ~12 min

Schema

daily_signups(signup_date, signups)

Prompt

Return a 7-day rolling average of signups for each date (including the date and the prior 6). For dates with fewer than 7 days of history, return the partial average.

Seed that breaks naive answers

Seed 5 has the table starting at 2026-01-01. Without explicit ROWS BETWEEN 6 PRECEDING AND CURRENT ROW, the default frame includes all prior rows and the early dates inflate.

SELECT signup_date, signups,
  AVG(signups::numeric) OVER (
    ORDER BY signup_date
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS rolling_7d
FROM daily_signups
ORDER BY signup_date;

Top-N per group composed

ROW_NUMBER + GROUP BY

Medium · ~15 min

Schema

sales(sale_id, category, product, revenue)

Prompt

For each category, return the top 5 products by total revenue. Aggregate first, rank second.

Seed that breaks naive answers

Seed 8 has products that exist in multiple categories. Ranking on per-row revenue instead of per-product total revenue returns wildly wrong rows.

WITH product_totals AS (
  SELECT category, product, SUM(revenue) AS total_rev
  FROM sales
  GROUP BY category, product
),
ranked AS (
  SELECT *, ROW_NUMBER() OVER (
    PARTITION BY category ORDER BY total_rev DESC, product
  ) AS rn
  FROM product_totals
)
SELECT category, product, total_rev
FROM ranked WHERE rn <= 5
ORDER BY category, rn;

Gap and island

ROW_NUMBER + GROUP BY derived key

Medium-Hard · ~20 min

Schema

user_activity(user_id, activity_date)

Prompt

Find the longest streak of consecutive active days per user. Return user_id, streak_start, streak_end, streak_length.

Seed that breaks naive answers

Seed 3 has a user with 2 streaks of equal max length. Returning only the first one fails the grader; the prompt says 'longest', not 'first longest'.

WITH grouped AS (
  SELECT user_id, activity_date,
    activity_date - (ROW_NUMBER() OVER (
      PARTITION BY user_id ORDER BY activity_date)) * INTERVAL '1 day' AS streak_key
  FROM user_activity
),
streaks AS (
  SELECT user_id, streak_key,
    MIN(activity_date) AS start_d,
    MAX(activity_date) AS end_d,
    COUNT(*) AS length
  FROM grouped
  GROUP BY user_id, streak_key
),
ranked AS (
  SELECT *, RANK() OVER (
    PARTITION BY user_id ORDER BY length DESC
  ) AS r
  FROM streaks
)
SELECT user_id, start_d, end_d, length
FROM ranked WHERE r = 1
ORDER BY user_id, start_d;

Sessionization

LAG + SUM accumulator + GROUP BY

Hard · ~25 min

Schema

events(user_id, event_at)

Prompt

Group events into sessions where any gap longer than 30 minutes starts a new session. Return user_id, session_id, session_start, session_end, event_count.

Seed that breaks naive answers

Seed 0 has a user whose very first event has NULL LAG. Without COALESCE, the first event isn't marked as a new session and session_id stays 0.

WITH gapped AS (
  SELECT user_id, event_at,
    CASE WHEN event_at - LAG(event_at) OVER (
      PARTITION BY user_id ORDER BY event_at) > INTERVAL '30 minutes'
      OR LAG(event_at) OVER (PARTITION BY user_id ORDER BY event_at) IS NULL
    THEN 1 ELSE 0 END AS new_session
  FROM events
),
sessioned AS (
  SELECT *, SUM(new_session) OVER (
    PARTITION BY user_id ORDER BY event_at
  ) AS session_id
  FROM gapped
)
SELECT user_id, session_id,
  MIN(event_at) AS session_start,
  MAX(event_at) AS session_end,
  COUNT(*) AS event_count
FROM sessioned
GROUP BY user_id, session_id
ORDER BY user_id, session_start;

LAST_VALUE frame trap

FIRST_VALUE / LAST_VALUE

Hard · ~15 min

Schema

orders(customer_id, order_id, ordered_at, total)

Prompt

For each order, return customer_id, ordered_at, total, the customer's first-ever order total, and the customer's last-ever order total. All on the same row.

Seed that breaks naive answers

Seed 6 punishes the bare LAST_VALUE. The default frame ends at CURRENT ROW, so LAST_VALUE returns the current order's total instead of the customer's final order. Needs ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.

SELECT customer_id, ordered_at, total,
  FIRST_VALUE(total) OVER w AS first_ever,
  LAST_VALUE(total) OVER w AS last_ever
FROM orders
WINDOW w AS (
  PARTITION BY customer_id
  ORDER BY ordered_at
  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
ORDER BY customer_id, ordered_at;

Share of partition

SUM OVER PARTITION BY

Hard · ~15 min

Schema

sales(category, product, revenue)

Prompt

For each product, return its share of its category's revenue as a percentage rounded to 2 decimals.

Seed that breaks naive answers

Seed 9 has a category with 1 product whose revenue is zero. Naive division returns NULL or zero where the expected value is 0.00 (or NULL with an explicit NULLIF). Handle the divide-by-zero.

SELECT category, product, revenue,
  ROUND(
    100.0 * revenue::numeric / NULLIF(SUM(revenue) OVER (
      PARTITION BY category
    ), 0),
    2
  ) AS share_pct
FROM sales
ORDER BY category, share_pct DESC NULLS LAST;

Window function frequency in DE interviews

Share of SQL interview questions per function, drawn from interview write-ups. Drill the high-frequency functions first.

ROW_NUMBER

6.2% · 38 probs

RANK

4.9% · 19 probs

DENSE_RANK

4.9% · 22 probs

LAG

5.1% · 28 probs

LEAD

3.8% · 11 probs

SUM OVER

4.4% · 24 probs

AVG OVER

2.6% · 14 probs

FIRST_VALUE

1.8% · 6 probs

LAST_VALUE

1.2% · 4 probs

NTILE

0.9% · 5 probs

The top 5 functions account for roughly 80% of window function interview surface area. Memorizing NTILE or LAST_VALUE before LAG and SUM OVER is a misallocation.

Window function FAQ

Why can't I use a window function inside WHERE?+

Window functions run after WHERE in the logical query order. Filter on a window function result by wrapping the SELECT in a CTE or subquery and applying the filter outside. Snowflake and BigQuery support QUALIFY as a 1-liner; Postgres doesn't.

What's the difference between ROWS and RANGE in a frame clause?+

ROWS counts physical rows in the partition. RANGE groups rows that tie on the ORDER BY expression into the same frame slot. When 2 rows share an ORDER BY value, RANGE collapses them and ROWS doesn't. The default frame is RANGE; the bug pattern is omitting ROWS and getting collapsed totals.

When does ROW_NUMBER vs RANK vs DENSE_RANK matter?+

Only when ties exist on the ORDER BY column. Use ROW_NUMBER when you want exactly 1 row per group. Use RANK when ties are meaningful and gaps after them are acceptable. Use DENSE_RANK when you want the Nth distinct value. Picking the wrong one is a silent bug; the query runs.

How do I sessionize event data without joins?+

3 steps. LAG the prior event timestamp inside the same user partition. CASE WHEN the gap exceeds the threshold to mark a new session as 1, else 0. SUM that boolean as a running sum partitioned by user, ordered by event time. The running sum is the session ID. GROUP BY user_id and session_id for per-session aggregates.

When should I use QUALIFY?+

On Snowflake or BigQuery, when you only need to filter on a window function result and the CTE wrapper feels like ceremony. SELECT * FROM t QUALIFY ROW_NUMBER() OVER (...) = 1 reads cleaner than the CTE pattern. Don't use it on Postgres (unsupported) and don't use it when you also want to filter on a column.

Are window functions worth practicing if I'm interviewing for an analyst role?+

Yes for senior analyst rounds, qualified for junior. Mid-tier analytics interviews lean on GROUP BY and JOIN; senior analytics and most data engineer rounds expect fluent window functions. The top-N per group pattern alone shows up in 18 percent of analytics interview SQL questions, and the answer almost always involves ROW_NUMBER.

02 / Why practice

Open problem 1 (dedup latest)

01
Active recall beats re-reading by 50%
Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom
02
76% of hiring managers reject on the coding task, not the resume
From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice
03
Five problem shapes cover 80% of data engineer loops
Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition

Open problem 1

Adjacent topics

Window Functions Reference→

Full reference: syntax, frame defaults, ranking semantics, QUALIFY.

Rank Practice→

5 problems focused on RANK, DENSE_RANK, and ROW_NUMBER divergence.

ORDER BY Practice→

NULL placement, deterministic tiebreakers, and stable pagination.