SQL Interview Questions for Data Engineers

The complete SQL interview question hub for data engineer roles. 200+ questions indexed by topic (joins, window functions, CTEs, recursive queries, optimization), difficulty (L3 to L6), and company (FAANG, Stripe, Airbnb, Databricks, Snowflake, and more). Each question links to the deeper round guide for context and the dialect-specific reference for the syntax. SQL shows up in most data engineer interview loops, which makes this the most-used hub on the site. Pair with the the full data engineer interview playbook.

SQL Concepts That Come Up in Data Engineer Interviews

Patterns ranked roughly by how often they appear in SQL rounds.

Pattern	Frequency	Common In
GROUP BY with HAVING	Very common	Every loop, all levels
INNER and LEFT JOIN	Very common	Every loop
Window functions (PARTITION BY)	Common	L4+, FAANG
ROW_NUMBER deduplication	Common	Every loop
RANK and DENSE_RANK	Common	L4+
Self join on inequality	Common	L4+, FAANG
Gap and island problems	Common	Senior, FAANG
Rolling and moving averages	Common	Analytics-heavy roles
Recursive CTE	Occasional	Senior, modeling-heavy roles
NULLIF and COALESCE patterns	Common	Every loop
DATE_TRUNC and date math	Common	Every loop
EXISTS vs IN	Occasional	Optimization rounds
Pivot with conditional aggregation	Common	Every loop
EXPLAIN plan reading	Occasional	Senior+, optimization rounds
UNION vs UNION ALL	Occasional	Every loop
Anti-join with NOT EXISTS	Occasional	L4+
Lateral / CROSS APPLY	Occasional	L5+, dialect-specific
MERGE / UPSERT	Occasional	L5+, modeling
Approximate count distinct	Occasional	L5+, scale-aware
JSON functions	Common	Every modern loop

SQL Questions by Topic

200+ questions organized by SQL concept. Each topic includes 8-15 questions ranging from L3 fundamentals to L6 optimization. Click into any topic for the full question set with worked answers.

Joins

INNER, LEFT, RIGHT, FULL OUTER, self, cross, anti, semi

Foundation of every SQL round. Topics covered: join type selection, NULL handling in outer joins, self-join with inequality for pair generation, cross-join for cartesian products, anti-join for “rows in A not in B”, semi-join for “rows in A that match B”, multi-table join optimization. 18 questions in this topic, ranging L3 fundamentals to L5 optimization.

GROUP BY

GROUP BY, HAVING, ROLLUP, CUBE, GROUPING SETS

Every loop tests aggregation depth. Topics covered: GROUP BY single and multiple columns, HAVING vs WHERE distinction, conditional aggregation with CASE, ROLLUP for subtotal rows, CUBE for all-dimension combinations, GROUPING SETS for custom subtotal groups, NULL behavior in aggregates, COUNT(*) vs COUNT(col) vs COUNT(DISTINCT col). 15 questions L3 to L5.

Window Functions

ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, frame clauses

The L4-to-L5 differentiator. Topics covered: ranking functions and tie behavior (DENSE_RANK vs RANK vs ROW_NUMBER), LAG and LEAD for previous and next row access, FIRST_VALUE and LAST_VALUE with IGNORE NULLS, ROWS vs RANGE frame clauses, sliding window aggregations, partial-window edge cases. 22 questions L4 to L6.

CTEs

Common Table Expressions and recursive CTEs

Modern SQL standard. Topics covered: CTE for readability vs subquery, multiple CTE chains, recursive CTE for hierarchical data (org charts, category trees), recursive CTE termination conditions to prevent infinite loops, CTE materialization hints in Postgres. 12 questions L4 to L6.

Gap-and-Island

Consecutive sequence detection patterns

Classic L5 pattern. Topics covered: consecutive-day active users, consecutive runs of a value, change-point detection in time series, sessionization with inactivity gap, the ROW_NUMBER subtraction trick that makes most gap-and-island problems tractable. 10 questions L4 to L5.

Deduplication

ROW_NUMBER + filter, DISTINCT, GROUP BY trade-offs

Every data engineer’s daily problem. Topics covered: keep latest per key, keep oldest per key, composite key dedup, dedup with tie-breaking on secondary column, why ROW_NUMBER beats DISTINCT for retaining all columns, dedup at write time vs read time. 9 questions L3 to L5.

Top-N per Group

Top-K per category with tie handling

Universal interview pattern. Topics covered: top 3 products per category, top 5 customers per region, top 1 with tie-breaking, top-K with ties using DENSE_RANK, top-K via lateral join, top-K via correlated subquery, performance comparison of approaches. 8 questions L4 to L5.

Funnels and Cohorts

Multi-step conversion analysis and cohort retention

Product analytics core pattern. Topics covered: build a conversion funnel with conditional aggregation, cohort retention table with month-over-month columns, triangular cohort visualization, time-to-event analysis, A/B test exposure-and-outcome joins. 11 questions L4 to L5.

Pivot

Conditional aggregation, PIVOT operator, dynamic pivot

Transform rows to columns for reporting. Topics covered: SUM(CASE WHEN) pattern (portable), PIVOT operator (Snowflake, MSSQL, Oracle), unpivot for reverse direction, dynamic pivot with SQL string generation, JSON_OBJECT_AGG for pivot-like JSON output. 7 questions L4 to L5.

Date and Time

DATE_TRUNC, EXTRACT, INTERVAL, time zones

Constant source of bugs. Topics covered: DATE_TRUNC for grain alignment, EXTRACT for component parts, INTERVAL arithmetic, time zone conversion (AT TIME ZONE), date generation with generate_series, business-day calculation excluding weekends and holidays. 9 questions L3 to L5.

Optimization

EXPLAIN plans, predicate pushdown, statistics

Senior signal. Topics covered: read EXPLAIN plans across dialects (Postgres, Snowflake, BigQuery, Redshift), identify full table scans vs index seeks, predicate pushdown principles, why functions in WHERE block index use, partition pruning, broadcast vs shuffle joins, statistics maintenance (ANALYZE). 10 questions L5 to L6.

Modern SQL

JSON, arrays, semi-structured data

Core 2024+ requirement. Topics covered: JSON extraction across dialects (Postgres -> / ->>, BigQuery JSON_EXTRACT, Snowflake colon syntax), ARRAY_AGG and UNNEST for one-to-many flattening, shredding strategies for analytics on JSON-heavy tables, performance trade-offs of in-row JSON vs shredded columns. 11 questions L4 to L5.

SQL Questions by Difficulty

200+ questions tagged by interview level. The L3 set drills fundamentals; L4 adds depth and edge case handling; L5 adds optimization and dialect-specific patterns; L6 adds large-scale architecture-level SQL.

L3 (Junior)

70 questions on fundamentals

Joins, GROUP BY, basic window functions, simple CTEs, date functions, conditional aggregation. Goal: medium under 15 minutes. Drills the floor of what every data engineer should know reflexively.

L4 (Mid-level)

75 questions on depth

Window functions in detail, recursive CTEs, gap-and-island, sessionization, top-N per group, pivot patterns, JSON manipulation. Goal: medium under 12 minutes, hard under 25.

L5 (Senior)

45 questions on optimization and judgment

EXPLAIN plan reading, query optimization, hot-key handling, large-scale aggregation patterns, approximate algorithms (HLL, percentile estimates), dialect-specific patterns. The L5 signal is volunteering trade-offs unprompted.

L6 (Staff+)

10 questions on architecture-level SQL

Materialized views vs incremental tables, time-travel queries (Snowflake, Iceberg, Delta), multi-tenant query optimization, multi-region warehouse design, cost-aware query design at petabyte scale.

SQL Questions by Company

Question patterns recur within companies. Below are the SQL question themes most common at each major employer.

FAANG

Window functions, recursive CTEs, optimization

See Meta, Amazon, Apple, Netflix, Google Data Engineer questions for the full FAANG-tagged question set. Common patterns: graph traversal in SQL (Meta), DENSE_RANK with tie handling (Amazon), BigQuery ARRAY_AGG and UNNEST (Google), metadata pipelines (Apple), sessionization (Netflix).

Stripe

Financial-precision SQL and reconciliation

See Stripe Data Engineer interview process and questions for the full Stripe interview guide. Common SQL patterns: account balance reconstruction with running totals, reconciliation FULL OUTER JOIN with mismatch classification, refund attribution across day boundaries, currency conversion at trade-time.

Airbnb

Two-sided marketplace SQL and experimentation

See Airbnb Data Engineer interview process and questions for the full Airbnb interview guide. Common SQL patterns: host-guest co-occurrence, pricing elasticity analysis, A/B experiment exposure tables, retention cohorts with LTV.

Databricks

Spark SQL, Delta Lake patterns, lakehouse SQL

See Databricks Data Engineer interview process and questions for the full Databricks interview guide. Common SQL patterns: MERGE for SCD, time travel queries, Z-order clustering, photon engine optimization, Delta change data feed.

Snowflake

QUALIFY, micro-partition awareness, clustering

See Snowflake Data Engineer interview process and questions for the full Snowflake interview guide. Common SQL patterns: QUALIFY for window-function filtering, CLUSTER BY for clustering keys, time travel queries, zero-copy cloning, search optimization service.

Lyft / DoorDash / Uber

Marketplace SQL and geospatial aggregation

See Lyft Data Engineer interview process and questions, DoorDash Data Engineer interview process and questions, and Uber Data Engineer interview process and questions. Common SQL patterns: rolling-window aggregations per H3 hex cell, supply-demand ratio queries, dasher / driver acceptance rate per market, surge multiplier attribution.

Dialect-Specific Reference Pages

SQL patterns transfer; syntax doesn't. Below are the dialect-specific deep dives for the warehouses most-tested in 2026.

BigQuery interview questions

Slot-based pricing, partitioning, clustering, BI Engine, ARRAY_AGG and UNNEST, JSON functions. See BigQuery interview problems for Data Engineer interviews for the full guide.

Redshift interview questions

Sort keys, dist keys, RA3 architecture, Spectrum, materialized views, performance tuning. See Redshift interview problems for Data Engineer interviews for the full guide.

Postgres interview questions

MVCC, indexing strategies, partitioning, replication, EXPLAIN ANALYZE, JSON / JSONB, LATERAL joins. See Postgres interview problems for Data Engineer interviews for the full guide.

Snowflake (covered in company guide)

Micro-partitions, clustering keys, time travel, zero-copy cloning, QUALIFY, search optimization. See Snowflake Data Engineer interview process and questions for the full guide.

How the SQL Hub Connects to the Rest of the Cluster

The SQL hub is the deepest tech reference in the cluster because SQL is the most-tested skill. The SQL interview round walkthrough page covers the round-level framework (4-week prep plan, what interviewers grade for, the rhythm of a live round). This hub is the live coding problem set you drill against; the round guide is the framework you apply.

For format-specific question collections, see top 50 data engineer interview questions (top 50 across all domains, SQL is 20 of them) and full top 100 Data Engineer interview questions list (top 100, SQL is 40 of them). For company-specific SQL patterns, every company guide in the cluster includes the recurring SQL flavors at that loop.

Prepare for the interview

01 / Open invite

02min.

Know the patterns before the interviewer asks them.

a SQL query, the same shape a screen would give you.

The diff against expected. Where ties broke. What you missed.

sandbox

1SELECT user_id,

2 COUNT(*) AS sessions

3FROM events

4WHERE ts >= NOW() - INTERVAL '7 day'

Execute your solution0.4s avg.

MicrosoftInterview question

Solve a problem

Data engineer interview prep FAQ

How many SQL questions should I drill before a data engineer interview?+

100 to 200 problems for L3 to L4. 200+ problems plus optimization-round prep for L5+. The 200-question hub covers the highest-leverage subset; volume past that has diminishing returns.

Should I learn SQL syntax for a specific dialect?+

Match the company. PostgreSQL syntax is the most portable starting point and the most common interview default. Add the company-specific dialect (BigQuery, Snowflake, Redshift) once you know the loop’s stack.

How fast should I be at SQL for interviews?+

L3 medium under 15 minutes; L4 medium under 12 minutes; L5 medium under 10 minutes. Hard problems take roughly 2x medium. Speed is built by volume practice, not by reading solutions.

Do I need to know NoSQL or graph queries?+

No, almost never tested in data engineer interviews. The exception is graph-heavy companies (Pinterest, Twitter) where graph traversal in SQL or in a graph database may come up. For most loops, focus 100% on SQL.

How important is query optimization?+

L3 to L4: optional. L5: required, especially EXPLAIN plan reading. L6: deeply required. The optimization round at FAANG and other large companies tests this directly. See the SQL round guide for the framework.

What’s the difference between this hub and the 50-question / 100-question pages?+

The format pages (50, 100, FAANG, PDF) are curated lists optimized for time-pressed prep. This hub is the full topic-indexed reference for systematic study. Use the format pages for triage, the hub for depth.

Are LeetCode SQL questions appropriate for data engineer prep?+

Partially. LeetCode SQL skews toward algorithmic puzzles and tricky edge cases. Real DE interviews skew toward business-question-as-SQL with an emphasis on grain awareness, edge case handling, and dialect fluency. LeetCode is fine for fluency drilling but is not a substitute for problem patterns in this hub.

How often are the SQL problems updated?+

Roughly quarterly. New questions are added based on recent interview reports; questions that became too widely known to differentiate are retired or rewritten.

02 / Why practice

Drill SQL Problems in the Browser

01
Active recall beats re-reading by 50%
Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom
02
76% of hiring managers reject on the coding task, not the resume
From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice
03
Five problem shapes cover 80% of data engineer loops
Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition

Start with a problem

More data engineer interview prep guides

BigQuery interview problems for Data Engineer interviews→

BigQuery internals, slot-based pricing, partitioning, and clustering interview prep.

Redshift interview problems for Data Engineer interviews→

Redshift sort keys, dist keys, compression, and RA3 architecture interview prep.

Postgres interview problems for Data Engineer interviews→

Postgres MVCC, indexing, partitioning, and replication interview prep.

Flink interview problems for streaming Data Engineer interviews→

Apache Flink stateful streaming, watermarks, exactly-once, checkpointing interview prep.

MapReduce, HDFS, YARN interview prep→

Hadoop ecosystem (HDFS, MapReduce, YARN, Hive) interview prep, including modern relevance.

AWS Glue Data Catalog interview prep→

AWS Glue ETL jobs, crawlers, Data Catalog, and PySpark-on-Glue interview prep.