Practice SQL Queries (by Interview Pattern)

Q: Why organize practice by pattern instead of by SQL keyword?

Interview questions are formulated by the business problem, not the keywords the answer happens to use. 'Find the 3 highest-value orders per customer' is a problem. The keywords (ROW_NUMBER, PARTITION BY, ORDER BY, a CTE wrapper) are the means. Patterns map directly onto problem shapes; keywords don't.

Q: How many SQL queries should I practice before an interview?

Phone-screen ready is around 30 queries spread across the top 3 patterns (top-N per group, dedup, conditional pivot). Onsite-ready is 80 to 120 with running windows, sessionization, and gap-and-island added. FAANG-level is 150-plus with SCD lookups and late-arriving reconciliation. Pattern coverage drives the count, not raw volume.

Q: What's the highest-frequency SQL interview question shape?

Top-N per group, by a wide margin. About 18 percent of interview SQL questions in the catalog are some variant: top product per category, most recent order per customer, highest-paid employee per department. ROW_NUMBER inside a CTE is the canonical answer; the only variation is the tiebreaker.

Q: How do I know which pattern a question is asking for?

3 signals. First, the prompt's phrasing: 'most recent X per Y' means dedup, 'top N X per Y' means top-N, 'streak of consecutive X' means gap-and-island. Second, the schema: a fact table joined to a Type 2 dimension with effective_from / effective_to is SCD. Third, the expected output shape: 1 row per group is dedup or top-1, multiple rows per group is top-N or pivot.

Q: Do these patterns transfer to BigQuery, Snowflake, and Redshift?

Yes, with 1 or 2 syntax tweaks per pattern. The window function and CTE patterns transfer cleanly. The conditional aggregation pattern uses FILTER on Postgres and Snowflake; BigQuery uses COUNTIF or CASE WHEN. The MERGE pattern works on Snowflake, BigQuery, Postgres 15-plus, and Redshift; older Redshift needs an INSERT/UPDATE rewrite.

Practice organized by syntax wastes time. You drill JOINs, then you drill GROUP BY, then you drill window functions, and on interview day the question composes 3 of them in a way none of the drills did. The 854 queries here are filed by the shape of the answer, not the keyword used to write it. 8 shapes cover roughly 90 percent of interview SQL.

Open the catalog Random Medium

Prepare for the interview

01 / Open invite

02min.

Know the patterns before the interviewer asks them.

a SQL query, the same shape a screen would give you.

The diff against expected. Where ties broke. What you missed.

sandbox

1SELECT user_id,

2 COUNT(*) AS sessions

3FROM events

4WHERE ts >= NOW() - INTERVAL '7 day'

Execute your solution0.4s avg.

MicrosoftInterview question

Solve a problem

Interview-shape patterns

854

Graded queries

~90%

Of interview SQL covered by 8 patterns

Seeds per submission

Pattern frequency in interview SQL

Share of SQL interview questions per pattern, drawn from the catalog's source write-ups.

Share of interview SQL questions, by pattern

Top-N per group

18%

Dedup latest per key

14%

Conditional aggregation pivot

11%

Sessionization

10%

Running and rolling windows

Gap and island

Slowly changing dimensions

Late-arriving reconciliation

Everything else

19%

8 patterns with their skeleton query

Each pattern has a recognizable shape, a set of keywords that show up together, and a typical failure mode that costs candidates the round.

Top-N per group62 problems · 18% of interview SQL

ROW_NUMBERPARTITION BYORDER BYCTE

WITH ranked AS (
  SELECT *,
    ROW_NUMBER() OVER (
      PARTITION BY group_col
      ORDER BY rank_col DESC, tiebreak_col
    ) AS rn
  FROM source
)
SELECT * FROM ranked WHERE rn <= N;

Asked atMeta, Stripe, Airbnb, Uber

Failure modeForgetting the tiebreaker after rank_col. The query passes 1 seed and fails any with engineered ties.

Dedup latest per key48 problems · 14% of interview SQL

ROW_NUMBERPARTITION BY natural_keyORDER BY updated_at DESC

WITH ranked AS (
  SELECT *,
    ROW_NUMBER() OVER (
      PARTITION BY natural_key
      ORDER BY updated_at DESC, id DESC
    ) AS rn
  FROM source
)
SELECT * FROM ranked WHERE rn = 1;

Asked atStripe, Snowflake, Databricks, Netflix

Failure modeUsing RANK or DENSE_RANK instead of ROW_NUMBER. 2 rows tie at the top and you get 2 'latest' records.

Sessionization34 problems · 10% of interview SQL

LAGSUM OVERboolean accumulatorGROUP BY session_id

WITH gapped AS (
  SELECT user_id, event_at,
    CASE WHEN event_at - LAG(event_at) OVER (
      PARTITION BY user_id ORDER BY event_at) > INTERVAL '30 minutes'
    THEN 1 ELSE 0 END AS new_session
  FROM events
),
sessioned AS (
  SELECT *, SUM(new_session) OVER (
    PARTITION BY user_id ORDER BY event_at) AS session_id
  FROM gapped
)
SELECT user_id, session_id, MIN(event_at), MAX(event_at)
FROM sessioned
GROUP BY user_id, session_id;

Asked atMeta, Spotify, Airbnb, Pinterest

Failure modeTreating the first event's NULL LAG as 'new session = 0'. The first event always starts a new session; the COALESCE matters.

Gap and island28 problems · 8% of interview SQL

ROW_NUMBERdate arithmeticGROUP BY derived key

SELECT user_id,
       MIN(activity_date) AS streak_start,
       MAX(activity_date) AS streak_end,
       COUNT(*) AS streak_length
FROM (
  SELECT user_id, activity_date,
    activity_date - (ROW_NUMBER() OVER (
      PARTITION BY user_id ORDER BY activity_date
    )) * INTERVAL '1 day' AS streak_key
  FROM user_activity
) t
GROUP BY user_id, streak_key
ORDER BY user_id, streak_start;

Asked atMeta, Google, DoorDash

Failure modeSubtracting an INT instead of an INTERVAL. The streak_key collides across users when dates are close to ROW_NUMBER values.

Slowly changing dimensions22 problems · 6% of interview SQL

half-open intervaleffective_fromeffective_toJOIN ON date BETWEEN

SELECT f.fact_id, f.event_time, d.attribute_value
FROM fact_table f
LEFT JOIN dim_scd2 d
  ON d.entity_id = f.entity_id
 AND d.effective_from <= f.event_time
 AND (d.effective_to IS NULL OR f.event_time < d.effective_to);

Asked atSnowflake, Databricks, Stripe, Square

Failure modeClosed-interval JOIN (<=, <=) instead of half-open (<=, <). At the boundary microsecond, 2 dim rows match and the fact doubles.

Conditional aggregation pivot40 problems · 11% of interview SQL

SUM CASE WHENFILTER (WHERE)GROUP BY

SELECT
  signup_month,
  COUNT(*) FILTER (WHERE status = 'active') AS active,
  COUNT(*) FILTER (WHERE status = 'churned') AS churned,
  100.0 * COUNT(*) FILTER (WHERE status = 'active') / COUNT(*) AS active_pct
FROM users
GROUP BY signup_month;

Asked atStripe, Block, Shopify, Salesforce

Failure modeUsing AVG(CASE WHEN ...) for a percentage. AVG ignores nulls in the denominator; the percent gets wrong silently.

Running and rolling windows38 problems · 9% of interview SQL

SUM OVER ORDER BYROWS BETWEENframe clause

SELECT
  account_id, posted_at, amount,
  SUM(amount) OVER (
    PARTITION BY account_id
    ORDER BY posted_at
    ROWS UNBOUNDED PRECEDING
  ) AS running_balance,
  AVG(amount) OVER (
    PARTITION BY account_id
    ORDER BY posted_at
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS rolling_7
FROM transactions;

Asked atStripe, Block, Robinhood, Coinbase

Failure modeOmitting ROWS, defaulting to RANGE. 2 rows sharing the ORDER BY column collapse into 1 frame slot and the running sum becomes wrong.

Late-arriving and reconciliation18 problems · 5% of interview SQL

event_time vs processing_timeMERGEidempotent UPSERT

MERGE INTO daily_revenue d
USING (
  SELECT
    DATE(event_at) AS revenue_date,
    SUM(amount) AS revenue
  FROM events
  WHERE processed_at >= :since
  GROUP BY DATE(event_at)
) src
ON d.revenue_date = src.revenue_date
WHEN MATCHED THEN UPDATE SET revenue = d.revenue + src.revenue
WHEN NOT MATCHED THEN INSERT (revenue_date, revenue)
VALUES (src.revenue_date, src.revenue);

Asked atNetflix, Databricks, Uber, DoorDash

Failure modeReplacing the existing row instead of adding to it. Late events for an old day overwrite the existing day's total instead of correcting it.

How many queries to solve, by target level

Queries to solve and the patterns to focus on. Higher levels add patterns rather than increasing depth on existing ones.

Target	Queries to solve	Patterns to cover	Secondary focus
Phone screen (Easy/Medium)	20-30	Top-N, dedup, pivot	GROUP BY + window function fluency
Mid-level onsite	60-80	Add running windows, gap-and-island	Multi-CTE composition, frame clauses
Senior onsite	120-150	Add SCD, sessionization, late-arriving	Tradeoff articulation, NULL semantics
FAANG / staff	180-220	All 8 patterns, recursive CTEs	Pattern combination, optimization reasoning

3 practice plans that don't work

Common ways candidates burn time without building interview-relevant skill.

Drilling 50 GROUP BY problems first

Aggregation is the most common keyword in interview SQL but it's almost never the only one. Most aggregation questions are conditional aggregation pivots that also test CASE WHEN, NULL handling, and percent-of-total. Drill the pivot pattern, not GROUP BY.

Memorizing 'find the Nth highest salary' answers

There are at least 3 correct answers depending on what 'highest' means. The pattern is DENSE_RANK = N for the Nth distinct value, RANK = N for the Nth position with gaps, ROW_NUMBER = N if you want exactly 1 row. Picking which interpretation the question wants is the actual skill.

Studying by SQL keyword alphabetically

A practice plan that goes 'today: JOINs, tomorrow: GROUP BY, next week: window functions' wastes the highest-leverage time. The shape-based plan above hits the high-percentage patterns first and lets the keywords come along for the ride.

Pattern-based SQL practice FAQ

Why organize practice by pattern instead of by SQL keyword?+

Interview questions are formulated by the business problem, not the keywords the answer happens to use. 'Find the 3 highest-value orders per customer' is a problem. The keywords (ROW_NUMBER, PARTITION BY, ORDER BY, a CTE wrapper) are the means. Patterns map directly onto problem shapes; keywords don't.

How many SQL queries should I practice before an interview?+

Phone-screen ready is around 30 queries spread across the top 3 patterns (top-N per group, dedup, conditional pivot). Onsite-ready is 80 to 120 with running windows, sessionization, and gap-and-island added. FAANG-level is 150-plus with SCD lookups and late-arriving reconciliation. Pattern coverage drives the count, not raw volume.

What's the highest-frequency SQL interview question shape?+

Top-N per group, by a wide margin. About 18 percent of interview SQL questions in the catalog are some variant: top product per category, most recent order per customer, highest-paid employee per department. ROW_NUMBER inside a CTE is the canonical answer; the only variation is the tiebreaker.

How do I know which pattern a question is asking for?+

3 signals. First, the prompt's phrasing: 'most recent X per Y' means dedup, 'top N X per Y' means top-N, 'streak of consecutive X' means gap-and-island. Second, the schema: a fact table joined to a Type 2 dimension with effective_from / effective_to is SCD. Third, the expected output shape: 1 row per group is dedup or top-1, multiple rows per group is top-N or pivot.

Do these patterns transfer to BigQuery, Snowflake, and Redshift?+

Yes, with 1 or 2 syntax tweaks per pattern. The window function and CTE patterns transfer cleanly. The conditional aggregation pattern uses FILTER on Postgres and Snowflake; BigQuery uses COUNTIF or CASE WHEN. The MERGE pattern works on Snowflake, BigQuery, Postgres 15-plus, and Redshift; older Redshift needs an INSERT/UPDATE rewrite.

02 / Why practice

Pick a pattern and write a query

01
Active recall beats re-reading by 50%
Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom
02
76% of hiring managers reject on the coding task, not the resume
From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice
03
Five problem shapes cover 80% of data engineer loops
Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition

Open a top-N problem

Drill the high-frequency patterns

Window Functions→

ROW_NUMBER, RANK, LAG, LEAD, frame clauses. Powers the top-N, dedup, and running-window patterns.

GROUP BY Practice→

Aggregation, HAVING, conditional aggregation with CASE WHEN and FILTER.

CTE Practice→

Multi-step transformations and recursive CTEs for hierarchies.