SQL Interview Questions

SQL Interview Questions

Live-executed SQL interview questions for data engineer and data scientist roles, with 10-seed grading and a Postgres editor.

922 SQL interview questions pulled from data engineer and data scientist interview reports. Every query submits to a live Postgres 16 process server-side. The grader replays your query against 10 randomized seeds, so a query that hard-codes IDs or omits an ORDER BY tiebreaker fails the same way a code reviewer would catch it.

SQL appears in 95 percent of data engineer interview loops, 85 percent of analytics engineer loops, and 80 percent of data scientist loops. The shape is consistent across companies. A small schema with 2 to 5 tables. A business question phrased ambiguously, the kind a product manager would ask in Slack. Fifteen to thirty minutes to write a working query. A follow-up about NULL handling, ties, or window-frame semantics. The 922-question catalog on this page is organized around the patterns that surface in those rounds.

Eight patterns cover 90 percent of data engineer interview SQL. Top-N per group with ROW_NUMBER, RANK, or DENSE_RANK. Dedup-latest by natural key with a composite ORDER BY tiebreaker. Sessionization with LAG and SUM OVER for boolean accumulator construction. Gap-and-island via date subtraction from ROW_NUMBER. Slowly changing dimension joins with half-open intervals where effective_from is less-than-or-equal to event_time and effective_to is greater-than event_time. Conditional aggregation with FILTER (WHERE) on engines that support it or SUM(CASE WHEN) for portability. Running and rolling windows with explicit ROWS BETWEEN frame clauses. Late-arriving reconciliation with MERGE on a composite natural key like (event_id, source).

Multi-seed grading is the part of this catalog that separates it from LeetCode SQL, HackerRank SQL, and sql-practice.com. Each seed engineers ties at the top of a ranking, NULL distributions across nullable columns, and cardinality skew on join keys. A query that selects the customer with the highest lifetime spend using ORDER BY SUM(amount) DESC LIMIT 1 passes one fixture and fails any seed where two customers tie. The submission log returns 7 out of 10 passing, names the missing tiebreaker on the failing seeds, and the data engineer rewrites with ROW_NUMBER plus a composite ORDER BY.

Postgres 16 is the practice dialect. About 85 percent of the catalog is portable across Snowflake, BigQuery, Redshift, and MySQL 8. The Postgres-specific syntax (DISTINCT ON, FILTER WHERE, RANGE BETWEEN INTERVAL, lateral joins) is tagged on each problem. The Snowflake-and-BigQuery-native QUALIFY clause and the Hive/Presto UNNEST patterns also appear as tagged variants. Companies represented through tagged questions include Meta, Amazon, Google, Netflix, Stripe, Airbnb, Uber, DoorDash, Snowflake, Databricks, Robinhood, Block, Spotify, and Coinbase. EXPLAIN ANALYZE is available on every passing submission for the optimization round that data engineer candidates see at L5 and above.

Twenty SQL interview questions, ranked by interview frequency and the seniority at which each lands, with the approach, the typical wrong answer, and the follow-up the interviewer asks next. L3 warm-ups: combine two tables and decide INNER versus LEFT JOIN when the right side can be missing; GROUP BY with HAVING to filter aggregates rather than rows. L4 core, the bulk of every loop: deduplicate to the latest row per natural key with ROW_NUMBER OVER PARTITION BY key ORDER BY updated_at DESC plus a composite tiebreaker (the wrong answer uses MAX(updated_at) in a self-join and double-counts ties); chain CTEs to stage a multi-step transformation; conditional aggregation with SUM(CASE WHEN ...) where the trap is AVG of a CASE silently dropping NULLs from the denominator; gaps-and-islands for consecutive-active-day streaks via date minus ROW_NUMBER; top-N per group where DENSE_RANK keeps ties and ROW_NUMBER forces exactly one. L5 senior, defended out loud: LAG and LEAD for transaction-over-transaction deltas and sessionization with a 30-minute inactivity gap; retention cohorts that join a signup cohort to later activity on a half-open date window; NULL traps where NOT IN against a nullable subquery silently returns empty and NOT EXISTS is the safe rewrite; late-arriving and out-of-order event handling with event_time versus ingest_time. L6 staff, not always asked but rewarding: read an EXPLAIN plan and name whether the full table scan is a missing index, a function call in WHERE blocking pushdown, or a poor join order; surface silent data-quality regressions across a fact table with a two-week rolling baseline on row count, per-column null rate, and key cardinality, flagging drift beyond three standard deviations.

How the round actually flows. A 45-minute SQL round opens with one Easy or Medium warm-up to get the candidate typing, then moves into one Medium-to-Hard problem where the follow-ups push on edge cases, dialect choices, and runtime cost. The follow-ups carry as much weight as the initial query: volunteering that NULLIF prevents the first-row divide-by-zero before being asked separates senior from mid, while writing a query that runs but stalling on why ROW_NUMBER over RANK lands the other way. With one week before a loop, the five highest-leverage patterns to drill are dedup-latest, conditional aggregation, gaps-and-islands, sessionization, and top-N per group. A four-week ramp that mirrors what passing candidates actually do: week one is SELECT, WHERE, GROUP BY with HAVING, and INNER versus LEFT JOIN at 20 Easy problems timed 5 to 8 minutes each; week two is multi-table joins, conditional aggregation with CASE inside SUM, date functions, and the many-to-many duplication trap at 20 Medium; week three, the one most candidates underweight and regret, is window functions in full (ROW_NUMBER, RANK and DENSE_RANK, LAG, LEAD, running totals, the dedup pattern) plus CTEs and three recursive ones at 15 Medium-to-Hard; week four is timed drills spoken out loud, adding gaps-and-islands and three pivots, mixed with mock interviews.

How many SQL interview questions does this catalog have?
922 SQL questions tagged across 8 patterns and 76 named employers. JOINs are the largest topic at 168 questions, aggregation is second at 142, window functions third at 124, string and NULL handling 94, subqueries 88, date and time 78, CTEs 96 (overlap with other topics), and gap-and-island 64.
Do these SQL interview questions run against a real database?
Yes. Every submission goes to a live Postgres 16 process. The grader replays your query against 10 randomized seeds and returns pass-or-fail per seed, with row counts and the specific diff for the failures. A passing query plus EXPLAIN ANALYZE comes back in about 1.4 seconds median.
What SQL dialect should a data engineer practice for a Snowflake or BigQuery role?
Postgres 16 here. About 85 percent of patterns port directly to Snowflake, BigQuery, Redshift, and MySQL 8. The Postgres-specific syntax (DISTINCT ON, FILTER WHERE, lateral joins) is tagged on each problem so you can skip or substitute. Window functions, CTEs, and aggregation behave identically across dialects.
What makes multi-seed grading different from single-fixture SQL graders?
A single-fixture grader rewards a query that produces the expected rows by coincidence. The most common form is hard-coding an identifier the candidate saw in the schema panel. The second is LIMIT 1 without an ORDER BY tiebreaker. Replaying against 10 seeds with engineered ties, NULL distributions, and cardinality skew surfaces those bugs the way a code reviewer would.
Can I see the query plan after a passing SQL submission?
Yes. EXPLAIN ANALYZE is available on every problem after first pass, showing the Postgres plan with row estimates, costs, and the chosen join algorithm. The reference solution's plan is viewable side by side. Useful for optimization rounds that show up at L5 and above in data engineer loops.
How many SQL questions should a data engineer solve before a phone screen?
50 to 80 problems across Easy and Medium, distributed across JOINs, aggregation, and window functions. The bottleneck is rarely volume. It is pattern recognition. Once a data engineer can identify top-N per group, dedup, gap-and-island, sessionization, and SCD joins on sight, the catalog gets dramatically shorter.
Are these SQL interview questions sourced from real interviews?
Yes. Questions come from interview reports submitted to the platform by data engineer candidates, then deduplicated and rewritten so the schema and data shapes match what surfaced in real rounds without copying specific prompt text. Company tags are applied only when at least one report explicitly cited the company in the same question shape.
How does this compare to LeetCode SQL?
Three differences. LeetCode grades against 1 fixed dataset versus 10 randomized seeds. LeetCode skews to puzzle problems; this catalog skews to pipeline patterns from data engineer interview write-ups. LeetCode's Medium and Hard tiers and most company-tagged questions are behind Premium; the full 922-problem catalog here is free.
Which SQL topics show up most in data engineer interviews?
Aggregation is the largest single bucket, joins next, window functions third. The four sub-areas to drill hardest: window functions, CTEs, gaps-and-islands, and aggregation with HAVING. A typical 45-minute round mixes two or three of these in one problem rather than asking them in isolation.
What is the difference between practicing SQL problems and practicing the SQL interview?
Problems mode is the gym: self-paced, a clear prompt, instant grader feedback. Interview mode is the game: a vague prompt, a timer, follow-ups, and a verdict. Both matter. Most candidates over-invest in problem mode and walk into their first onsite having never defended a query out loud, then lose points on the communication row of the rubric rather than on correctness.

922 practice problems matching this filter. Difficulty: medium (431), hard (146), easy (345).

SQL (922)