# SQL Interview Questions

> Live-executed SQL interview questions for data engineer and data scientist roles, with 10-seed grading and a Postgres editor.

Canonical URL: <https://datadriven.io/sql-interview-questions>

Breadcrumb: [Home](https://datadriven.io/) > [SQL Interview Questions](https://datadriven.io/sql-interview-questions)

## Summary

927 SQL interview questions pulled from data engineer and data scientist interview reports. Every query submits to a live Postgres 16 process server-side. The grader replays your query against 10 randomized seeds, so a query that hard-codes IDs or omits an ORDER BY tiebreaker fails the same way a code reviewer would catch it.

## What this page covers

SQL appears in 95 percent of data engineer interview loops, 85 percent of analytics engineer loops, and 80 percent of data scientist loops. The shape is consistent across companies. A small schema with 2 to 5 tables. A business question phrased ambiguously, the kind a product manager would ask in Slack. Fifteen to thirty minutes to write a working query. A follow-up about NULL handling, ties, or window-frame semantics. The 927-question catalog on this page is organized around the patterns that surface in those rounds.

Eight patterns cover 90 percent of data engineer interview SQL. Top-N per group with ROW_NUMBER, RANK, or DENSE_RANK. Dedup-latest by natural key with a composite ORDER BY tiebreaker. Sessionization with LAG and SUM OVER for boolean accumulator construction. Gap-and-island via date subtraction from ROW_NUMBER. Slowly changing dimension joins with half-open intervals where effective_from is less-than-or-equal to event_time and effective_to is greater-than event_time. Conditional aggregation with FILTER (WHERE) on engines that support it or SUM(CASE WHEN) for portability. Running and rolling windows with explicit ROWS BETWEEN frame clauses. Late-arriving reconciliation with MERGE on a composite natural key like (event_id, source).

Multi-seed grading is the part of this catalog that separates it from LeetCode SQL, HackerRank SQL, and sql-practice.com. Each seed engineers ties at the top of a ranking, NULL distributions across nullable columns, and cardinality skew on join keys. A query that selects the customer with the highest lifetime spend using ORDER BY SUM(amount) DESC LIMIT 1 passes one fixture and fails any seed where two customers tie. The submission log returns 7 out of 10 passing, names the missing tiebreaker on the failing seeds, and the data engineer rewrites with ROW_NUMBER plus a composite ORDER BY.

Postgres 16 is the practice dialect. About 85 percent of the catalog is portable across Snowflake, BigQuery, Redshift, and MySQL 8. The Postgres-specific syntax (DISTINCT ON, FILTER WHERE, RANGE BETWEEN INTERVAL, lateral joins) is tagged on each problem. The Snowflake-and-BigQuery-native QUALIFY clause and the Hive/Presto UNNEST patterns also appear as tagged variants. Companies represented through tagged questions include Meta, Amazon, Google, Netflix, Stripe, Airbnb, Uber, DoorDash, Snowflake, Databricks, Robinhood, Block, Spotify, and Coinbase. EXPLAIN ANALYZE is available on every passing submission for the optimization round that data engineer candidates see at L5 and above.

Twenty SQL interview questions, ranked by interview frequency and the seniority at which each lands, with the approach, the typical wrong answer, and the follow-up the interviewer asks next. L3 warm-ups: combine two tables and decide INNER versus LEFT JOIN when the right side can be missing; GROUP BY with HAVING to filter aggregates rather than rows. L4 core, the bulk of every loop: deduplicate to the latest row per natural key with ROW_NUMBER OVER PARTITION BY key ORDER BY updated_at DESC plus a composite tiebreaker (the wrong answer uses MAX(updated_at) in a self-join and double-counts ties); chain CTEs to stage a multi-step transformation; conditional aggregation with SUM(CASE WHEN ...) where the trap is AVG of a CASE silently dropping NULLs from the denominator; gaps-and-islands for consecutive-active-day streaks via date minus ROW_NUMBER; top-N per group where DENSE_RANK keeps ties and ROW_NUMBER forces exactly one. L5 senior, defended out loud: LAG and LEAD for transaction-over-transaction deltas and sessionization with a 30-minute inactivity gap; retention cohorts that join a signup cohort to later activity on a half-open date window; NULL traps where NOT IN against a nullable subquery silently returns empty and NOT EXISTS is the safe rewrite; late-arriving and out-of-order event handling with event_time versus ingest_time. L6 staff, not always asked but rewarding: read an EXPLAIN plan and name whether the full table scan is a missing index, a function call in WHERE blocking pushdown, or a poor join order; surface silent data-quality regressions across a fact table with a two-week rolling baseline on row count, per-column null rate, and key cardinality, flagging drift beyond three standard deviations.

How the round actually flows. A 45-minute SQL round opens with one Easy or Medium warm-up to get the candidate typing, then moves into one Medium-to-Hard problem where the follow-ups push on edge cases, dialect choices, and runtime cost. The follow-ups carry as much weight as the initial query: volunteering that NULLIF prevents the first-row divide-by-zero before being asked separates senior from mid, while writing a query that runs but stalling on why ROW_NUMBER over RANK lands the other way. With one week before a loop, the five highest-leverage patterns to drill are dedup-latest, conditional aggregation, gaps-and-islands, sessionization, and top-N per group. A four-week ramp that mirrors what passing candidates actually do: week one is SELECT, WHERE, GROUP BY with HAVING, and INNER versus LEFT JOIN at 20 Easy problems timed 5 to 8 minutes each; week two is multi-table joins, conditional aggregation with CASE inside SUM, date functions, and the many-to-many duplication trap at 20 Medium; week three, the one most candidates underweight and regret, is window functions in full (ROW_NUMBER, RANK and DENSE_RANK, LAG, LEAD, running totals, the dedup pattern) plus CTEs and three recursive ones at 15 Medium-to-Hard; week four is timed drills spoken out loud, adding gaps-and-islands and three pivots, mixed with mock interviews.

## Frequently asked questions

### How many SQL interview questions does this catalog have?

927 SQL questions tagged across 8 patterns and 76 named employers. JOINs are the largest topic at 168 questions, aggregation is second at 142, window functions third at 124, string and NULL handling 94, subqueries 88, date and time 78, CTEs 96 (overlap with other topics), and gap-and-island 64.

### Do these SQL interview questions run against a real database?

Yes. Every submission goes to a live Postgres 16 process. The grader replays your query against 10 randomized seeds and returns pass-or-fail per seed, with row counts and the specific diff for the failures. A passing query plus EXPLAIN ANALYZE comes back in about 1.4 seconds median.

### What SQL dialect should a data engineer practice for a Snowflake or BigQuery role?

Postgres 16 here. About 85 percent of patterns port directly to Snowflake, BigQuery, Redshift, and MySQL 8. The Postgres-specific syntax (DISTINCT ON, FILTER WHERE, lateral joins) is tagged on each problem so you can skip or substitute. Window functions, CTEs, and aggregation behave identically across dialects.

### What makes multi-seed grading different from single-fixture SQL graders?

A single-fixture grader rewards a query that produces the expected rows by coincidence. The most common form is hard-coding an identifier the candidate saw in the schema panel. The second is LIMIT 1 without an ORDER BY tiebreaker. Replaying against 10 seeds with engineered ties, NULL distributions, and cardinality skew surfaces those bugs the way a code reviewer would.

### Can I see the query plan after a passing SQL submission?

Yes. EXPLAIN ANALYZE is available on every problem after first pass, showing the Postgres plan with row estimates, costs, and the chosen join algorithm. The reference solution's plan is viewable side by side. Useful for optimization rounds that show up at L5 and above in data engineer loops.

### How many SQL questions should a data engineer solve before a phone screen?

50 to 80 problems across Easy and Medium, distributed across JOINs, aggregation, and window functions. The bottleneck is rarely volume. It is pattern recognition. Once a data engineer can identify top-N per group, dedup, gap-and-island, sessionization, and SCD joins on sight, the catalog gets dramatically shorter.

### Are these SQL interview questions sourced from real interviews?

Yes. Questions come from interview reports submitted to the platform by data engineer candidates, then deduplicated and rewritten so the schema and data shapes match what surfaced in real rounds without copying specific prompt text. Company tags are applied only when at least one report explicitly cited the company in the same question shape.

### How does this compare to LeetCode SQL?

Three differences. LeetCode grades against 1 fixed dataset versus 10 randomized seeds. LeetCode skews to puzzle problems; this catalog skews to pipeline patterns from data engineer interview write-ups. LeetCode's Medium and Hard tiers and most company-tagged questions are behind Premium; the full 927-problem catalog here is free.

### Which SQL topics show up most in data engineer interviews?

Aggregation is the largest single bucket, joins next, window functions third. The four sub-areas to drill hardest: window functions, CTEs, gaps-and-islands, and aggregation with HAVING. A typical 45-minute round mixes two or three of these in one problem rather than asking them in isolation.

### What is the difference between practicing SQL problems and practicing the SQL interview?

Problems mode is the gym: self-paced, a clear prompt, instant grader feedback. Interview mode is the game: a vague prompt, a timer, follow-ups, and a verdict. Both matter. Most candidates over-invest in problem mode and walk into their first onsite having never defended a query out loud, then lose points on the communication row of the rubric rather than on correctness.

## How to prepare for a data engineer SQL interview

Eight patterns that cover 90 percent of SQL interview questions for data engineer roles, in the order most candidates work through them.

### Step 1: Drill top-N per group

ROW_NUMBER OVER PARTITION BY group ORDER BY metric DESC, filter rn less-than-or-equal-to N. The most-asked pattern. DENSE_RANK when keeping ties, RANK when skipping numbers, ROW_NUMBER for exactly one row per group.

### Step 2: Drill dedup-latest by natural key

ROW_NUMBER OVER PARTITION BY natural_key ORDER BY updated_at DESC, composite_tiebreaker. Filter rn equals 1. The composite tiebreaker after updated_at is what survives multi-seed replay.

### Step 3: Drill conditional aggregation pivot

SUM(CASE WHEN status equals active THEN 1 ELSE 0 END) for portable counts. COUNT(*) FILTER (WHERE status equals active) for Postgres and SQLite. AVG of CASE without ELSE silently returns 1.0 always because NULLs drop from the denominator.

### Step 4: Drill sessionization

LAG previous event timestamp. CASE WHEN gap exceeds 30 minutes THEN 1 ELSE 0 AS new_session. SUM new_session OVER PARTITION BY user ORDER BY ts as the session accumulator. GROUP BY (user, session_id) for per-session aggregates.

### Step 5: Drill gap-and-island

Generate a streak key with date minus ROW_NUMBER OVER PARTITION BY user ORDER BY date, times INTERVAL 1 day. GROUP BY (user, streak_key) HAVING COUNT greater-than-or-equal-to N.

### Step 6: Drill running and rolling windows

SUM OVER ORDER BY ts ROWS UNBOUNDED PRECEDING for running totals. AVG OVER ORDER BY day ROWS BETWEEN 6 PRECEDING AND CURRENT ROW for 7-day rolling. Always specify ROWS, never default to RANGE.

### Step 7: Drill SCD half-open joins

Join fact at event_time to dim_scd2 ON effective_from less-than-or-equal-to event_time AND (effective_to IS NULL OR event_time less-than effective_to). The half-open interval prevents boundary doubling at the changeover microsecond.

### Step 8: Drill late-arriving reconciliation

MERGE on composite natural key (event_id, run_id) so backfills and on-call retries produce the same answer as the original run. The principle: re-running yesterday today must produce the same result.

## Related practice catalogs

- [SQL window function interview questions](https://datadriven.io/sql-window-functions-interview-questions): ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, frame clause traps with 10-seed grading.
- [SQL JOIN interview questions](https://datadriven.io/sql-joins-interview-questions): INNER vs LEFT, anti-join via NOT EXISTS, self-join inequality, many-to-many duplication trap.
- [SQL aggregation interview questions](https://datadriven.io/sql-aggregation-interview-questions): GROUP BY HAVING, COUNT vs COUNT DISTINCT, FILTER WHERE, AVG-of-CASE NULL bug.
- [Advanced SQL for senior data engineer roles](https://datadriven.io/advanced-sql-interview-questions): Recursive CTEs, gap-and-island, sessionization, EXPLAIN reading, skew handling.
- [SQL coding questions with live grading](https://datadriven.io/sql-coding-questions): Open-ended write-the-query problems graded against 10 randomized seeds.
- [SQL practice problems with multi-seed grader](https://datadriven.io/sql-practice-problems): Browser editor, pinned schema panel, EXPLAIN button, per-problem scratchpad.
- [Meta data engineer interview questions](https://datadriven.io/meta-data-engineer-interview-questions): Window-function-heavy SQL, gap-and-island for engagement streaks, Presto/Trino dialect.
- [Amazon data engineer interview questions](https://datadriven.io/amazon-data-engineer-interview-questions): Redshift-flavored SQL with DISTKEY and SORTKEY questions, Leadership Principles framing.
- [Full data engineer interview question catalog](https://datadriven.io/data-engineer-interview-questions): Every domain in one catalog: SQL, Python, data modeling, pipeline design, system design.

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.