The SQL Round
The SQL round shows up in roughly nine of every ten data engineering loops, and on most of those it shows up twice (phone screen and onsite). It's the gating round; if you don't pass it, the loop ends. Worked solutions, the patterns interviewers score on, and the failure modes that lose the round. Part of the our data engineer interview prep hub.
What the SQL round actually tests
The patterns are consistent across companies. The depth and the level of follow-up isn't. Frequencies are approximate, weighted by what shows up in verified interview reports.
| Pattern | Roughly how often | Where it lands hardest |
|---|---|---|
| INNER and LEFT JOIN | Highest | Every loop, all levels |
| GROUP BY with HAVING | Highest | Every loop, all levels |
| Window functions (PARTITION BY) | High | L4 and up, FAANG |
| Date trunc and date math | High | Every loop |
| ROW_NUMBER deduplication | High | Every loop |
| NULLIF and COALESCE patterns | Common | Every loop |
| RANK and DENSE_RANK | Common | L4 and up |
| Self join on inequality | Common | L4 and up, FAANG |
| Gap-and-island problems | Common at senior | FAANG, modeling-heavy roles |
| Rolling and moving averages | Common | Analytics-heavy roles |
| Pivot with conditional aggregation | Common | Every loop |
| Recursive CTE | Occasional | Senior, modeling-heavy roles |
| EXISTS vs IN | Occasional | Optimization rounds |
| EXPLAIN plan reading | Senior signal | Optimization rounds at L5 and up |
Five Worked Solutions From Real Loops
Each problem appeared in at least three reported interview loops in the last 18 months. Every solution shows the query, the gotchas the interviewer is testing for, and the follow-up question they will ask next.
Keep the most recent row per user
WITH ranked AS (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY user_id
ORDER BY event_timestamp DESC, event_id DESC
) AS rn
FROM event_data
)
SELECT user_id, event_timestamp, event_id, payload
FROM ranked
WHERE rn = 1;Find users active for 3+ consecutive days
WITH daily AS (
SELECT DISTINCT user_id, substr(event_timestamp, 1, 10) AS event_date
FROM event_data
),
grouped AS (
SELECT
user_id,
event_date,
julianday(event_date) -
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_date)
AS streak_key
FROM daily
)
SELECT user_id, MIN(event_date) AS streak_start, COUNT(*) AS streak_days
FROM grouped
GROUP BY user_id, streak_key
HAVING COUNT(*) >= 3;7-day rolling revenue with explicit partial window handling
WITH daily_revenue_summary AS (
SELECT
substr(transaction_date, 1, 10) AS order_date,
SUM(total_amount) AS daily_revenue
FROM transactions
GROUP BY substr(transaction_date, 1, 10)
)
SELECT
order_date,
daily_revenue,
CASE
WHEN ROW_NUMBER() OVER (ORDER BY order_date) >= 7
THEN AVG(daily_revenue) OVER (
ORDER BY order_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
)
END AS rolling_7d_avg
FROM daily_revenue_summary
ORDER BY order_date;Pairs of employees who share a manager
SELECT e1.employee_id AS employee_a, e2.employee_id AS employee_b, e1.manager_id FROM employees e1 JOIN employees e2 ON e1.manager_id = e2.manager_id AND e1.employee_id < e2.employee_id;
All reports under a manager, including indirect
WITH RECURSIVE org AS ( SELECT employee_id, manager_id, emp_name, 0 AS depth FROM employees WHERE manager_id IS NULL UNION ALL SELECT e.employee_id, e.manager_id, e.emp_name, o.depth + 1 FROM employees e JOIN org o ON e.manager_id = o.employee_id WHERE o.depth < 20 ) SELECT employee_id, emp_name, depth FROM org WHERE depth > 0;
What Interviewers Watch For (And Score You On)
- 01
Explicit NULL handling
Every JOIN, every WHERE, every aggregate. Saying 'COUNT ignores NULL but COUNT(*) does not' before being asked is a strong signal. Most candidates think about NULL only after a wrong answer. - 02
Grain awareness
Before the first SELECT, state the grain of your output: 'one row per user per day'. This single sentence tells the interviewer you understand aggregation. Skipping it forces them to ask, which costs points. - 03
Window vs aggregate choice
If the result needs a per-row context (running total, rank, lag), the answer is a window function. If the result collapses rows, it is GROUP BY. Mixing the two without a clear reason is the most common L4+ rejection signal. - 04
ROW_NUMBER vs RANK vs DENSE_RANK
ROW_NUMBER assigns 1, 2, 3, 4. RANK assigns 1, 1, 3, 4 (skips). DENSE_RANK assigns 1, 1, 2, 3 (no skip). Pick based on whether ties matter for your use case. Saying which you chose and why is required for senior roles. - 05
Edge cases stated proactively
Empty table, all NULL inputs, single-row groups, ties at the boundary, partial windows. Saying 'on an empty input this returns an empty set, which I think is correct here' is the experience signal interviewers are looking for. - 06
Query readability
CTEs over nested subqueries. Lowercase columns, uppercase keywords. One CTE per logical step with a name that reads like English. Interviewers explicitly score readability for L5+ roles. - 07
Performance instinct
If a problem says 'over 5 billion rows', mention partition keys, predicate pushdown, and avoiding cross joins. You do not need to optimize. You need to show that performance is on your radar.
Why GROUP BY beats window functions when both fit
Most candidates rush to window functions because they look senior. They mostly aren't. If the problem reduces to one row per group with no per-row context, GROUP BY is correct, faster to write, and easier to defend.
Use the simplest construct that produces the right answer. Window functions earn their complexity when you need a value from a non-aggregated row alongside an aggregate: last login alongside total logins, current order amount alongside the previous one. The interviewer wants the right tool, not the fanciest. The same judgment shows up in vanilla Python patterns interviewers test, where reaching for pandas on a problem that needs a dict is the equivalent failure mode.
Know the patterns before the interviewer asks them.
How the SQL Round Connects to the Rest of the Loop
SQL fluency is the entry condition for the rest of the loop, but the patterns recur in surprising places. Gap-and-island problems reappear inside vanilla Python patterns interviewers test as sessionization questions. ROW_NUMBER deduplication shows up again inside star schema and SCD round prep when you argue for SCD Type 2 over Type 1. EXPLAIN-plan reading and partition pruning are the natural lead-in to system design framework for data engineers, where the same instincts scale up to data warehouse design.
Take-home assignments often bundle SQL with a small modeling task, which is why the data engineer take-home prep overlaps here. And if you're targeting a specific company, the live SQL round at Stripe leans hard on financial-precision SQL while the loop at Airbnb tests experimentation-platform SQL. Read the company-specific guides after this one.
Tools and Technologies Tested in the SQL Round
The SQL dialect of the round usually mirrors the company stack. Snowflake-heavy companies test Snowflake-specific functions (QUALIFY, FLATTEN). BigQuery-heavy companies test ARRAY_AGG and STRUCT manipulation. Postgres companies stay portable. The underlying patterns transfer, but the syntax does not. If you already know the company stack, drill the right dialect: see BigQuery internals interview prep, Redshift internals interview prep, PostgreSQL internals interview prep, or our SQL question reference for Data Engineer interviews reference for the full set.
How to Prepare in Four Weeks
- 01
Week 1: Joins and aggregation grain
30 problems on INNER, LEFT, FULL, and self join. Every problem requires you to state the output grain before writing. Use the joins-practice and group-by-practice problem sets on DataDriven. Goal: medium problems in under 12 minutes. - 02
Week 2: Window functions end to end
Every window function: ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, FIRST_VALUE, LAST_VALUE, NTILE, SUM OVER, AVG OVER. 25 problems. Practice ROWS vs RANGE frame clauses. Goal: explain partial-window behavior before the interviewer asks. - 03
Week 3: Patterns and edge cases
Gap and island, deduplication, sessionization, top-N per group, percentile calculations, pivot with conditional aggregation. 20 problems. Time yourself: medium under 15 minutes, hard under 25. - 04
Week 4: Live mock interviews
10 mock interview rounds with the SQL mock interview tool on DataDriven. Speak out loud the entire time, including state-the-grain, state-edge-cases, and state-the-trade-off. Record yourself and replay. The gap between silent thinking and verbal thinking is what gets candidates rejected.
SQL Round FAQ
What SQL dialect is used in data engineering interviews?+
How long should I take to write a medium SQL problem?+
Can I use ChatGPT to practice SQL interview questions?+
Do I need to memorize SQL window functions?+
What if I do not finish the SQL problem in time?+
Are SQL questions different at FAANG vs startups?+
Should I use CTEs or subqueries?+
How important is performance in a SQL interview?+
Drill the round against a real grader
- 01
Active recall beats re-reading by 50%
Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom
- 02
76% of hiring managers reject on the coding task, not the resume
From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice
- 03
Five problem shapes cover 80% of data engineer loops
Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition
More data engineer interview prep reading
More data engineer interview prep guides
JSON flattening, sessionization, and vanilla-Python data wrangling in the Data Engineer coding round.
Star schema, SCD Type 2, fact-table grain, and how to defend a model against pushback.
Pipeline architecture, exactly-once semantics, and the framing that gets you to L5.
STAR-D answers tailored to data engineering, with example responses for impact and conflict.
What graders look for in a 4 to 8 hour Data Engineer take-home, with a rubric breakdown.
How to think out loud, handle silence, and avoid the traps that sink fluent coders.