Practice SQL Queries (by Interview Pattern)

Practice organized by syntax wastes time. You drill JOINs, then you drill GROUP BY, then you drill window functions, and on interview day the question composes 3 of them in a way none of the drills did. The 854 queries here are filed by the shape of the answer, not the keyword used to write it. 8 shapes cover roughly 90 percent of interview SQL.

Practice organized by syntax wastes time. You drill JOINs, then you drill GROUP BY, then you drill window functions, and on interview day the question composes 3 of them in a way none of the drills did. The 854 queries here are filed by the shape of the answer, not the keyword used to write it. 8 shapes cover roughly 90 percent of interview SQL.

Prepare for the interview
01 / Open invite
02min.

Know the patterns before the interviewer asks them.

a SQL query, the same shape a screen would give you.
The diff against expected. Where ties broke. What you missed.
sandbox
1SELECT user_id,
2 COUNT(*) AS sessions
3FROM events
4WHERE ts >= NOW() - INTERVAL '7 day'
5
Execute your solution0.4s avg.
MicrosoftInterview question
Solve a problem
8
Interview-shape patterns
854
Graded queries
~90%
Of interview SQL covered by 8 patterns
10
Seeds per submission

Pattern frequency in interview SQL

Share of SQL interview questions per pattern, drawn from the catalog's source write-ups.

Share of interview SQL questions, by pattern
Top-N per group
18%
Dedup latest per key
14%
Conditional aggregation pivot
11%
Sessionization
10%
Running and rolling windows
9%
Gap and island
8%
Slowly changing dimensions
6%
Late-arriving reconciliation
5%
Everything else
19%

8 patterns with their skeleton query

Each pattern has a recognizable shape, a set of keywords that show up together, and a typical failure mode that costs candidates the round.

Top-N per group62 problems · 18% of interview SQL
ROW_NUMBERPARTITION BYORDER BYCTE
WITH ranked AS (
  SELECT *,
    ROW_NUMBER() OVER (
      PARTITION BY group_col
      ORDER BY rank_col DESC, tiebreak_col
    ) AS rn
  FROM source
)
SELECT * FROM ranked WHERE rn <= N;
Asked atMeta, Stripe, Airbnb, Uber
Failure modeForgetting the tiebreaker after rank_col. The query passes 1 seed and fails any with engineered ties.
Dedup latest per key48 problems · 14% of interview SQL
ROW_NUMBERPARTITION BY natural_keyORDER BY updated_at DESC
WITH ranked AS (
  SELECT *,
    ROW_NUMBER() OVER (
      PARTITION BY natural_key
      ORDER BY updated_at DESC, id DESC
    ) AS rn
  FROM source
)
SELECT * FROM ranked WHERE rn = 1;
Asked atStripe, Snowflake, Databricks, Netflix
Failure modeUsing RANK or DENSE_RANK instead of ROW_NUMBER. 2 rows tie at the top and you get 2 'latest' records.
Sessionization34 problems · 10% of interview SQL
LAGSUM OVERboolean accumulatorGROUP BY session_id
WITH gapped AS (
  SELECT user_id, event_at,
    CASE WHEN event_at - LAG(event_at) OVER (
      PARTITION BY user_id ORDER BY event_at) > INTERVAL '30 minutes'
    THEN 1 ELSE 0 END AS new_session
  FROM events
),
sessioned AS (
  SELECT *, SUM(new_session) OVER (
    PARTITION BY user_id ORDER BY event_at) AS session_id
  FROM gapped
)
SELECT user_id, session_id, MIN(event_at), MAX(event_at)
FROM sessioned
GROUP BY user_id, session_id;
Asked atMeta, Spotify, Airbnb, Pinterest
Failure modeTreating the first event's NULL LAG as 'new session = 0'. The first event always starts a new session; the COALESCE matters.
Gap and island28 problems · 8% of interview SQL
ROW_NUMBERdate arithmeticGROUP BY derived key
SELECT user_id,
       MIN(activity_date) AS streak_start,
       MAX(activity_date) AS streak_end,
       COUNT(*) AS streak_length
FROM (
  SELECT user_id, activity_date,
    activity_date - (ROW_NUMBER() OVER (
      PARTITION BY user_id ORDER BY activity_date
    )) * INTERVAL '1 day' AS streak_key
  FROM user_activity
) t
GROUP BY user_id, streak_key
ORDER BY user_id, streak_start;
Asked atMeta, Google, DoorDash
Failure modeSubtracting an INT instead of an INTERVAL. The streak_key collides across users when dates are close to ROW_NUMBER values.
Slowly changing dimensions22 problems · 6% of interview SQL
half-open intervaleffective_fromeffective_toJOIN ON date BETWEEN
SELECT f.fact_id, f.event_time, d.attribute_value
FROM fact_table f
LEFT JOIN dim_scd2 d
  ON d.entity_id = f.entity_id
 AND d.effective_from <= f.event_time
 AND (d.effective_to IS NULL OR f.event_time < d.effective_to);
Asked atSnowflake, Databricks, Stripe, Square
Failure modeClosed-interval JOIN (<=, <=) instead of half-open (<=, <). At the boundary microsecond, 2 dim rows match and the fact doubles.
Conditional aggregation pivot40 problems · 11% of interview SQL
SUM CASE WHENFILTER (WHERE)GROUP BY
SELECT
  signup_month,
  COUNT(*) FILTER (WHERE status = 'active') AS active,
  COUNT(*) FILTER (WHERE status = 'churned') AS churned,
  100.0 * COUNT(*) FILTER (WHERE status = 'active') / COUNT(*) AS active_pct
FROM users
GROUP BY signup_month;
Asked atStripe, Block, Shopify, Salesforce
Failure modeUsing AVG(CASE WHEN ...) for a percentage. AVG ignores nulls in the denominator; the percent gets wrong silently.
Running and rolling windows38 problems · 9% of interview SQL
SUM OVER ORDER BYROWS BETWEENframe clause
SELECT
  account_id, posted_at, amount,
  SUM(amount) OVER (
    PARTITION BY account_id
    ORDER BY posted_at
    ROWS UNBOUNDED PRECEDING
  ) AS running_balance,
  AVG(amount) OVER (
    PARTITION BY account_id
    ORDER BY posted_at
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS rolling_7
FROM transactions;
Asked atStripe, Block, Robinhood, Coinbase
Failure modeOmitting ROWS, defaulting to RANGE. 2 rows sharing the ORDER BY column collapse into 1 frame slot and the running sum becomes wrong.
Late-arriving and reconciliation18 problems · 5% of interview SQL
event_time vs processing_timeMERGEidempotent UPSERT
MERGE INTO daily_revenue d
USING (
  SELECT
    DATE(event_at) AS revenue_date,
    SUM(amount) AS revenue
  FROM events
  WHERE processed_at >= :since
  GROUP BY DATE(event_at)
) src
ON d.revenue_date = src.revenue_date
WHEN MATCHED THEN UPDATE SET revenue = d.revenue + src.revenue
WHEN NOT MATCHED THEN INSERT (revenue_date, revenue)
VALUES (src.revenue_date, src.revenue);
Asked atNetflix, Databricks, Uber, DoorDash
Failure modeReplacing the existing row instead of adding to it. Late events for an old day overwrite the existing day's total instead of correcting it.

How many queries to solve, by target level

Queries to solve and the patterns to focus on. Higher levels add patterns rather than increasing depth on existing ones.

TargetQueries to solvePatterns to coverSecondary focus
Phone screen (Easy/Medium)20-30Top-N, dedup, pivotGROUP BY + window function fluency
Mid-level onsite60-80Add running windows, gap-and-islandMulti-CTE composition, frame clauses
Senior onsite120-150Add SCD, sessionization, late-arrivingTradeoff articulation, NULL semantics
FAANG / staff180-220All 8 patterns, recursive CTEsPattern combination, optimization reasoning

3 practice plans that don't work

Common ways candidates burn time without building interview-relevant skill.

Drilling 50 GROUP BY problems first

Aggregation is the most common keyword in interview SQL but it's almost never the only one. Most aggregation questions are conditional aggregation pivots that also test CASE WHEN, NULL handling, and percent-of-total. Drill the pivot pattern, not GROUP BY.

Memorizing 'find the Nth highest salary' answers

There are at least 3 correct answers depending on what 'highest' means. The pattern is DENSE_RANK = N for the Nth distinct value, RANK = N for the Nth position with gaps, ROW_NUMBER = N if you want exactly 1 row. Picking which interpretation the question wants is the actual skill.

Studying by SQL keyword alphabetically

A practice plan that goes 'today: JOINs, tomorrow: GROUP BY, next week: window functions' wastes the highest-leverage time. The shape-based plan above hits the high-percentage patterns first and lets the keywords come along for the ride.

Pattern-based SQL practice FAQ

Why organize practice by pattern instead of by SQL keyword?+
Interview questions are formulated by the business problem, not the keywords the answer happens to use. 'Find the 3 highest-value orders per customer' is a problem. The keywords (ROW_NUMBER, PARTITION BY, ORDER BY, a CTE wrapper) are the means. Patterns map directly onto problem shapes; keywords don't.
How many SQL queries should I practice before an interview?+
Phone-screen ready is around 30 queries spread across the top 3 patterns (top-N per group, dedup, conditional pivot). Onsite-ready is 80 to 120 with running windows, sessionization, and gap-and-island added. FAANG-level is 150-plus with SCD lookups and late-arriving reconciliation. Pattern coverage drives the count, not raw volume.
What's the highest-frequency SQL interview question shape?+
Top-N per group, by a wide margin. About 18 percent of interview SQL questions in the catalog are some variant: top product per category, most recent order per customer, highest-paid employee per department. ROW_NUMBER inside a CTE is the canonical answer; the only variation is the tiebreaker.
How do I know which pattern a question is asking for?+
3 signals. First, the prompt's phrasing: 'most recent X per Y' means dedup, 'top N X per Y' means top-N, 'streak of consecutive X' means gap-and-island. Second, the schema: a fact table joined to a Type 2 dimension with effective_from / effective_to is SCD. Third, the expected output shape: 1 row per group is dedup or top-1, multiple rows per group is top-N or pivot.
Do these patterns transfer to BigQuery, Snowflake, and Redshift?+
Yes, with 1 or 2 syntax tweaks per pattern. The window function and CTE patterns transfer cleanly. The conditional aggregation pattern uses FILTER on Postgres and Snowflake; BigQuery uses COUNTIF or CASE WHEN. The MERGE pattern works on Snowflake, BigQuery, Postgres 15-plus, and Redshift; older Redshift needs an INSERT/UPDATE rewrite.
02 / Why practice

Pick a pattern and write a query

  1. 01

    Active recall beats re-reading by 50%

    Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom

  2. 02

    76% of hiring managers reject on the coding task, not the resume

    From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice

  3. 03

    Five problem shapes cover 80% of data engineer loops

    Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition

Drill the high-frequency patterns