SQL Mock Interview for Data Engineers

400+ SQL questions with real code execution. AI evaluation checks correctness across 12+ edge cases per question, flags performance issues, and gives line-by-line feedback on query structure.

400+
SQL Questions
34%
Window Functions Tested
12 per question
Edge Cases Tested
18 min
Avg. Solve Time

Why SQL Is the Highest-Stakes Round in Every DE Interview

SQL is the one skill every DE interviewer tests. You can skip Spark at some companies. You can skip Python at others. You cannot skip SQL. Every single data engineering interview loop we have tracked (275+ companies) includes at least one SQL round.

The SQL round is usually the first technical screen. If you fail it, you do not get to the on-site. If you pass it well, you build momentum and interviewer goodwill that carries into later rounds. Treating SQL prep as the highest-priority investment is not an opinion. It is math based on how interview pipelines work.

SQL skill degrades faster than you think. Working data engineers write SQL daily, but interview SQL is different from production SQL. Production queries are often long, messy, and optimized for maintainability. Interview queries are short, precise, and optimized for demonstrating technique. If you have not practiced interview-style SQL in 6 months, your solve time has probably doubled.

What SQL Topics Interviewers Actually Test

Window Functions (34%)

ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, NTILE, and frame clauses. Window functions are the single most tested SQL topic in data engineering interviews. Interviewers use them because they separate candidates who memorized GROUP BY from those who understand how SQL processes ordered data. Patterns: Running totals and rolling averages with frame clauses; Ranking within partitions (the classic 'second highest per group' pattern); Comparing current row to previous row with LAG/LEAD; Percentile calculations using NTILE or PERCENT_RANK; Gaps-and-islands using ROW_NUMBER difference technique. Common mistake: Using RANK when the question needs DENSE_RANK. RANK skips numbers after ties (1, 1, 3). DENSE_RANK does not (1, 1, 2). When the question says 'find the second highest,' filtering WHERE rank = 2 returns nothing if the top value is tied. Interviewers test this on purpose.

Joins (28%)

INNER JOIN, LEFT JOIN, FULL OUTER JOIN, CROSS JOIN, self-joins, and inequality joins. Joins form the backbone of every multi-table query. Interviewers use join questions to test whether you understand NULL behavior, many-to-many relationships, and accidental row multiplication. Patterns: LEFT JOIN + IS NULL to find missing records (customers with no orders); Self-join to compare rows within the same table (employee pairs, consecutive events); Inequality join to match events within a time window; CROSS JOIN to generate all combinations, then LEFT JOIN to find gaps; Multiple joins with careful attention to join order and cardinality. Common mistake: Forgetting that LEFT JOIN followed by a WHERE filter on the right table converts it to an INNER JOIN. If you write LEFT JOIN orders o ON ... WHERE o.status = 'shipped', any customer with no orders gets filtered out. Move the condition into the ON clause instead.

Aggregation (22%)

GROUP BY, HAVING, COUNT, SUM, AVG, MIN, MAX, and conditional aggregation with CASE WHEN. GROUP BY is the most frequently tested SQL keyword overall. Interviewers use aggregation questions to check your understanding of grouping granularity and filtered counting. Patterns: Conditional aggregation: SUM(CASE WHEN ... THEN 1 ELSE 0 END) for pivoting; HAVING to filter groups after aggregation; COUNT(DISTINCT ...) for unique value counting; Nested aggregations: average of per-group counts; Percentage calculations: group count divided by total count. Common mistake: Putting aggregate filters in WHERE instead of HAVING. WHERE filters individual rows before grouping. HAVING filters groups after aggregation. If you write WHERE COUNT(*) > 5, you get a syntax error. The correct version: HAVING COUNT(*) > 5.

CTEs (10%)

Common Table Expressions for readable multi-step queries. CTEs let you break a complex query into named, sequential stages. Interviewers test whether you can decompose a problem into logical steps rather than nesting subqueries four levels deep. Patterns: Multi-step transformations: aggregate, then filter, then rank; Recursive CTEs for hierarchical data (org charts, category trees); CTE-based deduplication with ROW_NUMBER; Chaining CTEs for month-over-month or year-over-year comparisons; Date spine generation using recursive CTE. Common mistake: Over-nesting instead of using CTEs. A three-level nested subquery is hard to read, hard to debug, and signals to the interviewer that you do not know CTE syntax. Some candidates avoid CTEs because they think they are 'slower.' In practice, most query engines optimize CTEs identically to subqueries.

Subqueries (6%)

Scalar subqueries, correlated subqueries, and EXISTS/NOT EXISTS. Subqueries test whether you can think about data access at two levels simultaneously. Correlated subqueries in particular are tricky because they execute conceptually once per outer row. Patterns: EXISTS/NOT EXISTS for set membership tests; Correlated subquery for per-row comparisons (salary vs department average); Scalar subquery in SELECT for single-value lookups; Subquery in FROM (derived table) for pre-aggregation; NOT EXISTS as a performant alternative to NOT IN (handles NULLs correctly). Common mistake: Using NOT IN instead of NOT EXISTS when the subquery column contains NULLs. NOT IN returns no rows if any value in the list is NULL, because NULL != anything is unknown. NOT EXISTS handles this correctly. Interviewers love this gotcha.

What the AI Evaluator Checks on Every SQL Question

Correctness

Your query runs for real and the evaluator checks correctness across a range of edge cases. Partial credit counts: if you get 90% of cases right but miss a NULL edge case, you see exactly what went wrong.

Edge case handling

Each question is tested against 8 to 15 edge cases: NULL values in join columns, empty tables, duplicate timestamps, single-row groups, and tables with only one partition. You see exactly which cases your query handled and which ones it missed.

Query performance

The evaluator checks whether your approach scales. A self-join that scans n^2 rows when a window function would scan n once gets flagged. A DISTINCT that could be replaced with a proper GROUP BY gets noted. These are not just style points. Interviewers at Google and Meta explicitly score on performance awareness.

Code readability

Consistent formatting, meaningful aliases, logical CTE names, appropriate use of whitespace. The evaluator does not penalize personal style, but it does flag queries that are genuinely hard to follow: deeply nested subqueries where CTEs would be clearer, single-letter aliases on five tables, or missing indentation on complex CASE statements.

SQL best practices

Using COALESCE for NULL handling instead of ignoring NULLs. Using DATE_TRUNC instead of string manipulation on dates. Using NOT EXISTS instead of NOT IN when NULLs are possible. These patterns distinguish a data engineer who writes production SQL from one who only writes throwaway queries.

Why Real SQL Execution Beats Flashcards and Multiple Choice

Syntax errors teach you faster than reading docs

When your query fails with 'column must appear in GROUP BY clause or be used in an aggregate function,' you learn the rule instantly. You remember the error message because you felt the friction of fixing it. Reading about GROUP BY rules in a textbook does not create that memory. DataDriven runs your SQL against a production-grade SQL engine, so you get real error messages for real mistakes.

Output comparison reveals subtle bugs

Your query runs. It returns results. They look reasonable. But are they right? Without an evaluator, you would never notice that your LEFT JOIN accidentally duplicated 3% of rows because of a many-to-many relationship. DataDriven's evaluator catches exactly these bugs and shows you what went wrong.

Production-grade SQL is what companies use

Most DE teams run production-grade SQL engines that support advanced features. Practicing on a toy SQL engine that does not support window frames or LATERAL joins means you are not practicing with the tools you will use. DataDriven runs a real database that supports advanced features like FILTER, GROUPING SETS, and recursive CTEs.

Multiple correct approaches

For most SQL problems, there are 3 to 5 valid solutions. A window function approach. A self-join approach. A CTE approach. A subquery approach. When your code actually runs and the output matches, any of these counts as correct. DataDriven's evaluator handles all valid approaches, unlike platforms that only accept one hardcoded solution pattern.

The Gap Between Knowing SQL and Performing in an Interview

Flashcards test recognition. Interviews test production.

A flashcard asks: 'What does ROW_NUMBER do?' You answer: 'Assigns a sequential integer to rows within a partition.' Correct. Now the interviewer asks: 'Write a query to deduplicate a table with 5 million rows, keeping the most recent entry per user.' Can you translate that knowledge into working SQL under time pressure? That translation step is what mock interviews with real execution train.

Syntax errors only happen when you write code

The most common SQL mistakes in interviews are syntax errors: missing commas, wrong function names, GROUP BY mismatches, WHERE vs HAVING confusion. You can know the concepts perfectly and still make these errors under pressure. The only way to reduce them is to write hundreds of queries and fix hundreds of errors. DataDriven gives you that repetition with real database feedback.

Debugging is a skill that requires practice

Your query returns 47 rows when the expected answer is 45. Where is the bug? Is it the join creating duplicates? A missing WHERE filter? A window function partition issue? Debugging SQL requires a systematic process: check row counts at each step, examine join cardinality, verify GROUP BY granularity. You cannot practice this process without running real queries against real data.

Difficulty Progression: Easy to Hard

Easy (Target: 8 to 12 minutes)

Single-table aggregations, basic joins, simple filtering with WHERE and HAVING. These questions test fundamental SQL literacy. If you cannot solve easy questions consistently in under 12 minutes, you need more time on fundamentals before mock interviews. Example: Find the top 3 departments by total salary using GROUP BY and ORDER BY with LIMIT.

Medium (Target: 15 to 25 minutes)

Multi-table joins, window functions, CTEs, and conditional aggregation. Most interview questions fall into this range. A medium question on DataDriven matches what you would see in round 1 at Meta or Amazon. The time target of 15 to 25 minutes accounts for thinking, writing, and testing. Example: Calculate a 7-day rolling average of daily revenue, handling days with no orders.

Hard (Target: 25 to 35 minutes)

Multi-CTE patterns, recursive queries, complex window frames, and performance-sensitive solutions. Hard questions target senior and staff candidates. They typically require 3 or more logical steps and at least one non-obvious insight (like recognizing a gaps-and-islands pattern or using a self-join for sessionization). Example: Identify user sessions where a session ends after 30 minutes of inactivity, then calculate the median session duration per user.

How a SQL Mock Interview Session Works on DataDriven

You start by selecting SQL as your domain and choosing a difficulty level. The simulator picks a question that matches your profile, weighted toward topics where you have historically performed worst. If your window function accuracy is 60% and your join accuracy is 85%, you will see more window function questions.

The question appears with a schema diagram, sample data, and a clear problem statement. You get a full code editor with syntax highlighting, auto-complete, and the ability to run your query at any point during the session. Running your query does not count as submitting. You can iterate as many times as you need within the time limit.

When you submit, the evaluator checks your query against a range of edge cases: NULLs, empty tables, duplicates, and boundary values. You see which cases passed and which failed, with clear explanations of what went wrong. The evaluator also provides written feedback on performance, style, and best practices.

After the question, you see the optimal solution with a full walkthrough: why this approach works, what the common mistakes are, and how the query performs at scale. You also see how your solve time compares to the target for your difficulty level. All of this data feeds into your topic accuracy dashboard, which updates after every question.

Prepare for the interview
01 / Open invite
02min.

Know the patterns before the interviewer asks them.

a SQL query, the same shape a screen would give you.
The diff against expected. Where ties broke. What you missed.
sandbox
1SELECT user_id,
2 COUNT(*) AS sessions
3FROM events
4WHERE ts >= NOW() - INTERVAL '7 day'
5
Execute your solution0.4s avg.
MicrosoftInterview question
Solve a problem

Frequently Asked Questions

What SQL topics appear most in data engineering interviews?+
Based on analysis of interviews at 275+ companies: window functions (34%), joins (28%), aggregation and GROUP BY (22%), CTEs (10%), and subqueries (6%). Window functions are the top topic because they test analytical thinking beyond basic data retrieval.
Does DataDriven run real SQL or simulate it?+
Real execution. Your queries run against an actual database. You get real error messages and real output. The AI evaluator scores the actual results of your query, not a syntax pattern match.
How many SQL questions does DataDriven have?+
Over 400 SQL questions across all difficulty levels. Each question is tested against 8 to 15 edge cases and includes a detailed explanation of the optimal approach, common mistakes to avoid, and complexity analysis. Questions are authored by data engineers from Meta, Google, Amazon, and Netflix.
What if my SQL answer is correct but uses a different approach than the expected solution?+
The evaluator scores your actual output, not your query structure. If your results match the expected output across all test cases, your solution is marked correct regardless of approach. The evaluator then provides feedback on performance and readability as separate dimensions, noting if a more efficient approach exists.
How is SQL for data engineers different from SQL for analysts?+
Data engineering SQL interviews go deeper on window functions, CTEs, performance, and edge case handling. Analyst SQL interviews focus more on aggregation and basic joins. DE interviews also test your awareness of query performance at scale: interviewers ask how your query would perform on 100 million rows, which is rarely a concern for analyst roles.
02 / Why practice

Write Your First SQL Query in Under a Minute

  1. 01

    Active recall beats re-reading by 50%

    Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom

  2. 02

    76% of hiring managers reject on the coding task, not the resume

    From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice

  3. 03

    Five problem shapes cover 80% of data engineer loops

    Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition

Related Interview Guides