Tech-Specific Question Hub

SQL Interview Questions for Data Engineers

The complete SQL interview question hub for data engineer roles. 200+ questions indexed by topic (joins, window functions, CTEs, recursive queries, optimization), difficulty (L3 to L6), and company (FAANG, Stripe, Airbnb, Databricks, Snowflake, and more). Each question links to the deeper round guide for context and the dialect-specific reference for the syntax. SQL appears in 95% of data engineer interview loops in our dataset of 1,042 reports collected 2024 to 2026, making this the most-used hub on the site. Pair with the the full data engineer interview playbook.

The Short Answer
SQL is the single most-tested skill in data engineering interviews. Drilling questions from this hub builds the fluency that passes the SQL round at every level. The 200 questions cover: joins (every kind), GROUP BY with HAVING, window functions (ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, FIRST_VALUE, LAST_VALUE, frame clauses), CTEs (recursive and non-recursive), gap-and-island patterns, deduplication, sessionization, top-N per group, pivot with conditional aggregation, query optimization, and dialect-specific features (Snowflake QUALIFY, BigQuery ARRAY_AGG, Postgres LATERAL). For round framework, see SQL interview round walkthrough.
Updated April 2026·By The DataDriven Team

SQL Concept Frequency in Data Engineer Interviews

Pattern frequency from 1,042 verified interview reports collected on DataDriven from 2024 to 2026.

PatternShare of SQL QuestionsCommon In
GROUP BY with HAVING15.3%Every loop, all levels
INNER and LEFT JOIN21.1%Every loop
Window functions (PARTITION BY)9.7%L4+, FAANG
ROW_NUMBER deduplication6.2%Every loop
RANK and DENSE_RANK4.9%L4+
Self join on inequality4.1%L4+, FAANG
Gap and island problems3.8%Senior, FAANG
Rolling and moving averages3.6%Analytics-heavy roles
Recursive CTE2.7%Senior, modeling-heavy roles
NULLIF and COALESCE patterns5.1%Every loop
DATE_TRUNC and date math7.8%Every loop
EXISTS vs IN2.4%Optimization rounds
Pivot with conditional aggregation3.2%Every loop
EXPLAIN plan reading2.1%Senior+, optimization rounds
UNION vs UNION ALL1.8%Every loop
Anti-join with NOT EXISTS1.6%L4+
Lateral / CROSS APPLY1.4%L5+, dialect-specific
MERGE / UPSERT1.7%L5+, modeling
Approximate count distinct1.2%L5+, scale-aware
JSON functions2.9%Every loop in 2024+

SQL Questions by Topic

200+ questions organized by SQL concept. Each topic includes 8-15 questions ranging from L3 fundamentals to L6 optimization. Click into any topic for the full question set with worked answers.

Joins

INNER, LEFT, RIGHT, FULL OUTER, self, cross, anti, semi

Foundation of every SQL round. Topics covered: join type selection, NULL handling in outer joins, self- join with inequality for pair generation, cross-join for cartesian products, anti-join for “rows in A not in B”, semi-join for “rows in A that match B”, multi-table join optimization. 18 questions in this topic, ranging L3 fundamentals to L5 optimization.
GROUP BY

GROUP BY, HAVING, ROLLUP, CUBE, GROUPING SETS

Every loop tests aggregation depth. Topics covered: GROUP BY single and multiple columns, HAVING vs WHERE distinction, conditional aggregation with CASE, ROLLUP for subtotal rows, CUBE for all-dimension combinations, GROUPING SETS for custom subtotal groups, NULL behavior in aggregates, COUNT(*) vs COUNT(col) vs COUNT(DISTINCT col). 15 questions L3 to L5.
Window Functions

ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, frame clauses

The L4-to-L5 differentiator. Topics covered: ranking functions and tie behavior (DENSE_RANK vs RANK vs ROW_NUMBER), LAG and LEAD for previous and next row access, FIRST_VALUE and LAST_VALUE with IGNORE NULLS, ROWS vs RANGE frame clauses, sliding window aggregations, partial-window edge cases. 22 questions L4 to L6.
CTEs

Common Table Expressions and recursive CTEs

Modern SQL standard. Topics covered: CTE for readability vs subquery, multiple CTE chains, recursive CTE for hierarchical data (org charts, category trees), recursive CTE termination conditions to prevent infinite loops, CTE materialization hints in Postgres. 12 questions L4 to L6.
Gap-and-Island

Consecutive sequence detection patterns

Classic L5 pattern. Topics covered: consecutive-day active users, consecutive runs of a value, change- point detection in time series, sessionization with inactivity gap, the ROW_NUMBER subtraction trick that makes most gap-and-island problems tractable. 10 questions L4 to L5.
Deduplication

ROW_NUMBER + filter, DISTINCT, GROUP BY trade-offs

Every data engineer's daily problem. Topics covered: keep latest per key, keep oldest per key, composite key dedup, dedup with tie-breaking on secondary column, why ROW_NUMBER beats DISTINCT for retaining all columns, dedup at write time vs read time. 9 questions L3 to L5.
Top-N per Group

Top-K per category with tie handling

Universal interview pattern. Topics covered: top 3 products per category, top 5 customers per region, top 1 with tie-breaking, top-K with ties using DENSE_RANK, top-K via lateral join, top-K via correlated subquery, performance comparison of approaches. 8 questions L4 to L5.
Funnels and Cohorts

Multi-step conversion analysis and cohort retention

Product analytics core pattern. Topics covered: build a conversion funnel with conditional aggregation, cohort retention table with month-over-month columns, triangular cohort visualization, time-to-event analysis, A/B test exposure-and-outcome joins. 11 questions L4 to L5.
Pivot

Conditional aggregation, PIVOT operator, dynamic pivot

Transform rows to columns for reporting. Topics covered: SUM(CASE WHEN) pattern (portable), PIVOT operator (Snowflake, MSSQL, Oracle), unpivot for reverse direction, dynamic pivot with SQL string generation, JSON_OBJECT_AGG for pivot-like JSON output. 7 questions L4 to L5.
Date and Time

DATE_TRUNC, EXTRACT, INTERVAL, time zones

Constant source of bugs. Topics covered: DATE_TRUNC for grain alignment, EXTRACT for component parts, INTERVAL arithmetic, time zone conversion (AT TIME ZONE), date generation with generate_series, business- day calculation excluding weekends and holidays. 9 questions L3 to L5.
Optimization

EXPLAIN plans, predicate pushdown, statistics

Senior signal. Topics covered: read EXPLAIN plans across dialects (Postgres, Snowflake, BigQuery, Redshift), identify full table scans vs index seeks, predicate pushdown principles, why functions in WHERE block index use, partition pruning, broadcast vs shuffle joins, statistics maintenance (ANALYZE). 10 questions L5 to L6.
Modern SQL

JSON, arrays, semi-structured data

Core 2024+ requirement. Topics covered: JSON extraction across dialects (Postgres -> / ->>, BigQuery JSON_EXTRACT, Snowflake colon syntax), ARRAY_AGG and UNNEST for one-to-many flattening, shredding strategies for analytics on JSON-heavy tables, performance trade-offs of in-row JSON vs shredded columns. 11 questions L4 to L5.

SQL Questions by Difficulty

200+ questions tagged by interview level. The L3 set drills fundamentals; L4 adds depth and edge case handling; L5 adds optimization and dialect-specific patterns; L6 adds large-scale architecture-level SQL.

L3 (Junior)

70 questions on fundamentals

Joins, GROUP BY, basic window functions, simple CTEs, date functions, conditional aggregation. Goal: medium under 15 minutes. Drills the floor of what every data engineer should know reflexively.
L4 (Mid-level)

75 questions on depth

Window functions in detail, recursive CTEs, gap- and-island, sessionization, top-N per group, pivot patterns, JSON manipulation. Goal: medium under 12 minutes, hard under 25.
L5 (Senior)

45 questions on optimization and judgment

EXPLAIN plan reading, query optimization, hot-key handling, large-scale aggregation patterns, approximate algorithms (HLL, percentile estimates), dialect-specific patterns. The L5 signal is volunteer- ing trade-offs unprompted.
L6 (Staff+)

10 questions on architecture-level SQL

Materialized views vs incremental tables, time-travel queries (Snowflake, Iceberg, Delta), multi-tenant query optimization, multi-region warehouse design, cost-aware query design at petabyte scale.

SQL Questions by Company

Question patterns recur within companies. Below are the SQL question themes most common at each major employer.

FAANG

Window functions, recursive CTEs, optimization

See Meta, Amazon, Apple, Netflix, Google Data Engineer questions for the full FAANG-tagged question set. Common patterns: graph traversal in SQL (Meta), DENSE_RANK with tie handling (Amazon), BigQuery ARRAY_AGG and UNNEST (Google), metadata pipelines (Apple), sessionization (Netflix).
Stripe

Financial-precision SQL and reconciliation

See Stripe Data Engineer interview process and questions for the full Stripe interview guide. Common SQL patterns: account balance reconstruction with running totals, reconciliation FULL OUTER JOIN with mismatch classification, refund attribution across day boundaries, currency conversion at trade-time.
Airbnb

Two-sided marketplace SQL and experimentation

See Airbnb Data Engineer interview process and questions for the full Airbnb interview guide. Common SQL patterns: host-guest co-occurrence, pricing elasticity analysis, A/B experiment exposure tables, retention cohorts with LTV.
Databricks

Spark SQL, Delta Lake patterns, lakehouse SQL

See Databricks Data Engineer interview process and questions for the full Databricks interview guide. Common SQL patterns: MERGE for SCD, time travel queries, Z-order clustering, photon engine optimization, Delta change data feed.
Snowflake

QUALIFY, micro-partition awareness, clustering

See Snowflake Data Engineer interview process and questions for the full Snowflake interview guide. Common SQL patterns: QUALIFY for window-function filtering, CLUSTER BY for clustering keys, time travel queries, zero-copy cloning, search optimization service.
Lyft / DoorDash / Uber

Marketplace SQL and geospatial aggregation

See Lyft Data Engineer interview process and questions, DoorDash Data Engineer interview process and questions, and Uber Data Engineer interview process and questions. Common SQL patterns: rolling-window aggregations per H3 hex cell, supply-demand ratio queries, dasher / driver acceptance rate per market, surge multiplier attribution.

Dialect-Specific Reference Pages

SQL patterns transfer; syntax doesn't. Below are the dialect-specific deep dives for the warehouses most-tested in 2026.

BigQuery interview questions

Slot-based pricing, partitioning, clustering, BI Engine, ARRAY_AGG and UNNEST, JSON functions. See BigQuery question bank for Data Engineer interviews for the full guide.

Redshift interview questions

Sort keys, dist keys, RA3 architecture, Spectrum, materialized views, performance tuning. See Redshift question bank for Data Engineer interviews for the full guide.

Postgres interview questions

MVCC, indexing strategies, partitioning, replication, EXPLAIN ANALYZE, JSON / JSONB, LATERAL joins. See Postgres question bank for Data Engineer interviews for the full guide.

Snowflake (covered in company guide)

Micro-partitions, clustering keys, time travel, zero-copy cloning, QUALIFY, search optimization. See Snowflake Data Engineer interview process and questions for the full guide.

How the SQL Hub Connects to the Rest of the Cluster

The SQL hub is the deepest tech reference in the cluster because SQL is the most-tested skill. The SQL interview round walkthrough page covers the round-level framework (4-week prep plan, what interviewers grade for, the rhythm of a live round). This hub is the question bank you drill against; the round guide is the framework you apply.

For format-specific question collections, see top 50 data engineer interview questions (top 50 across all domains, SQL is 20 of them) and full top 100 Data Engineer interview questions list (top 100, SQL is 40 of them). For company-specific SQL patterns, every company guide in the cluster includes the recurring SQL flavors at that loop.

Data Engineer Interview Prep FAQ

How many SQL questions should I drill before a data engineer interview?+
100 to 200 problems for L3 to L4. 200+ problems plus optimization-round prep for L5+. The 200-question hub covers the highest-leverage subset; volume past that has diminishing returns.
Should I learn SQL syntax for a specific dialect?+
Match the company. PostgreSQL syntax is the most portable starting point and the most common interview default. Add the company-specific dialect (BigQuery, Snowflake, Redshift) once you know the loop's stack.
How fast should I be at SQL for interviews?+
L3 medium under 15 minutes; L4 medium under 12 minutes; L5 medium under 10 minutes. Hard problems take roughly 2x medium. Speed is built by volume practice, not by reading solutions.
Do I need to know NoSQL or graph queries?+
No, almost never tested in data engineer interviews. The exception is graph-heavy companies (Pinterest, Twitter) where graph traversal in SQL or in a graph database may come up. For most loops, focus 100% on SQL.
How important is query optimization?+
L3 to L4: optional. L5: required, especially EXPLAIN plan reading. L6: deeply required. The optimization round at FAANG and other large companies tests this directly. See the SQL round guide for the framework.
What's the difference between this hub and the 50-question / 100-question pages?+
The format pages (50, 100, FAANG, PDF) are curated lists optimized for time-pressed prep. This hub is the full topic-indexed reference for systematic study. Use the format pages for triage, the hub for depth.
Are LeetCode SQL questions appropriate for data engineer prep?+
Partially. LeetCode SQL skews toward algorithmic puzzles and tricky edge cases. Real DE interviews skew toward business-question-as-SQL with an emphasis on grain awareness, edge case handling, and dialect fluency. LeetCode is fine for fluency drilling but is not a substitute for problem patterns in this hub.
How often is the question bank updated?+
Quarterly. New questions are added based on recent interview reports; questions that became too widely-known to differentiate are retired or rewritten. The current version is dated April 2026.

Drill the SQL Bank in the Browser

Run real SQL interview problems against real schemas in our practice sandbox. Get instant feedback. Build the muscle memory that wins the round.

Start SQL Practice

More Data Engineer Interview Prep Guides

Continue your prep

Data Engineer Interview Prep, explore the full guide

50+ guides covering every round, company, role, and technology in the data engineer interview loop. Grounded in 2,817 verified interview reports across 929 companies, collected from real candidates.

Interview Rounds

By Company

By Role

By Technology

Decisions

Question Formats