400+ SQL questions with real code execution. AI grading checks correctness across 8 to 15 edge cases per question, flags performance issues, and gives line-by-line feedback on query structure.
SQL is the one skill every DE interviewer tests. You can skip Spark at some companies. You can skip Python at others. You cannot skip SQL. Every single data engineering interview loop we've tracked (275+ companies) includes at least one SQL round.
The SQL round is usually the first technical screen. If you fail it, you don't get to the on-site. If you pass it well, you build momentum and interviewer goodwill that carries into later rounds. Treating SQL prep as the highest-priority investment isn't an opinion. It's math based on how interview pipelines work.
SQL skill degrades faster than you think. Working data engineers write SQL daily, but interview SQL is different from production SQL. Production queries are often long, messy, and optimized for maintainability. Interview queries are short, precise, and optimized for demonstrating technique. If you haven't practiced interview-style SQL in 6 months, your solve time has probably doubled.
We analyzed questions from interviews at 275+ companies to measure how frequently each SQL topic appears. The percentages below reflect the proportion of SQL questions in our database that test each topic.
ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, NTILE, and frame clauses. Window functions are the single most tested SQL topic in data engineering interviews. Interviewers use them because they separate candidates who memorized GROUP BY from those who understand how SQL processes ordered data.
Using RANK when the question needs DENSE_RANK. RANK skips numbers after ties (1, 1, 3). DENSE_RANK doesn't (1, 1, 2). When the question says 'find the second highest,' filtering WHERE rank = 2 returns nothing if the top value is tied. Interviewers test this on purpose.
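The gap is easy to see with a toy table. Here's a minimal sketch using Python's built-in sqlite3 (the salaries table and its values are made up for illustration):

```python
import sqlite3

# Hypothetical salaries table with a tie at the top.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE salaries (name TEXT, salary INTEGER)")
con.executemany("INSERT INTO salaries VALUES (?, ?)",
                [("a", 100), ("b", 100), ("c", 90)])

# RANK skips numbers after ties: 1, 1, 3 -- no rank 2 exists.
rank_2 = con.execute("""
    SELECT name FROM (
        SELECT name, RANK() OVER (ORDER BY salary DESC) AS r
        FROM salaries
    ) WHERE r = 2
""").fetchall()

# DENSE_RANK doesn't skip: 1, 1, 2.
dense_2 = con.execute("""
    SELECT name FROM (
        SELECT name, DENSE_RANK() OVER (ORDER BY salary DESC) AS r
        FROM salaries
    ) WHERE r = 2
""").fetchall()

print(rank_2)   # [] -- the tie at the top leaves no rank 2
print(dense_2)  # [('c',)]
```

With a tied maximum, the RANK version silently returns nothing, while DENSE_RANK finds the second-highest value as intended.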
INNER JOIN, LEFT JOIN, FULL OUTER JOIN, CROSS JOIN, self-joins, and inequality joins. Joins form the backbone of every multi-table query. Interviewers use join questions to test whether you understand NULL behavior, many-to-many relationships, and accidental row multiplication.
Forgetting that LEFT JOIN followed by a WHERE filter on the right table converts it to an INNER JOIN. If you write LEFT JOIN orders o ON ... WHERE o.status = 'shipped', any customer with no orders gets filtered out. Move the condition into the ON clause instead.
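The behavior is worth seeing side by side. Below is a minimal sketch via sqlite3 (customers and orders tables are hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
con.execute("CREATE TABLE orders (customer_id INTEGER, status TEXT)")
con.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "ada"), (2, "bob")])
con.execute("INSERT INTO orders VALUES (1, 'shipped')")  # bob has no orders

# WHERE on the right table discards NULL-extended rows: bob disappears.
where_filter = con.execute("""
    SELECT c.name FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    WHERE o.status = 'shipped'
    ORDER BY c.name
""").fetchall()

# Moving the condition into ON keeps every customer.
on_filter = con.execute("""
    SELECT c.name FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id AND o.status = 'shipped'
    ORDER BY c.name
""").fetchall()

print(where_filter)  # [('ada',)] -- the LEFT JOIN silently became an INNER JOIN
print(on_filter)     # [('ada',), ('bob',)]
```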
GROUP BY, HAVING, COUNT, SUM, AVG, MIN, MAX, and conditional aggregation with CASE WHEN. GROUP BY is the most frequently tested SQL keyword overall. Interviewers use aggregation questions to check your understanding of grouping granularity and filtered counting.
Putting aggregate filters in WHERE instead of HAVING. WHERE filters individual rows before grouping. HAVING filters groups after aggregation. If you write WHERE COUNT(*) > 5, the engine rejects the query, because aggregates aren't allowed in WHERE. The correct version: HAVING COUNT(*) > 5.
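You can trigger the error yourself. A minimal sketch using sqlite3 (the orders table is hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (customer_id INTEGER)")
con.executemany("INSERT INTO orders VALUES (?)", [(1,)] * 6 + [(2,)] * 3)

# WHERE runs before grouping, so an aggregate there is rejected outright.
failed = False
try:
    con.execute(
        "SELECT customer_id FROM orders WHERE COUNT(*) > 5 GROUP BY customer_id"
    )
except sqlite3.OperationalError:
    failed = True  # misuse of aggregate in WHERE

# HAVING filters the groups after aggregation.
rows = con.execute("""
    SELECT customer_id FROM orders
    GROUP BY customer_id
    HAVING COUNT(*) > 5
""").fetchall()

print(failed)  # True
print(rows)    # [(1,)] -- only customer 1 has more than 5 orders
```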
Common Table Expressions for readable multi-step queries. CTEs let you break a complex query into named, sequential stages. Interviewers test whether you can decompose a problem into logical steps rather than nesting subqueries four levels deep.
Over-nesting instead of using CTEs. A three-level nested subquery is hard to read, hard to debug, and signals to the interviewer that you don't know CTE syntax. Some candidates avoid CTEs because they think they're 'slower.' In practice, modern query engines (PostgreSQL 12+, for example) inline CTEs and optimize them the same way as equivalent subqueries.
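Here's what the named-stages style looks like in practice: a sketch via sqlite3 that finds customers whose average order value beats the overall average (table and column names are hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (customer_id INTEGER, amount INTEGER)")
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [(1, 100), (1, 300), (2, 50), (2, 70)])

# Each CTE names one logical step; the final SELECT reads top to bottom,
# instead of forcing the reader inward through nested subqueries.
rows = con.execute("""
    WITH per_customer AS (
        SELECT customer_id, AVG(amount) AS avg_amount
        FROM orders
        GROUP BY customer_id
    ),
    overall AS (
        SELECT AVG(avg_amount) AS grand_avg FROM per_customer
    )
    SELECT p.customer_id
    FROM per_customer p, overall o
    WHERE p.avg_amount > o.grand_avg
""").fetchall()

print(rows)  # [(1,)] -- customer 1 averages 200 vs. a grand average of 130
```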
Scalar subqueries, correlated subqueries, and EXISTS/NOT EXISTS. Subqueries test whether you can think about data access at two levels simultaneously. Correlated subqueries in particular are tricky because they execute conceptually once per outer row.
Using NOT IN instead of NOT EXISTS when the subquery column contains NULLs. NOT IN returns no rows if any value in the list is NULL, because NULL != anything is unknown. NOT EXISTS handles this correctly. Interviewers love this gotcha.
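The gotcha reproduces in two lines of data. A minimal sketch using sqlite3 (users and banned tables are hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER)")
con.execute("CREATE TABLE banned (user_id INTEGER)")
con.executemany("INSERT INTO users VALUES (?)", [(1,), (2,)])
con.executemany("INSERT INTO banned VALUES (?)", [(2,), (None,)])

# NOT IN against a list containing NULL is never true:
# 1 != NULL evaluates to unknown, which poisons the whole predicate.
not_in = con.execute(
    "SELECT id FROM users WHERE id NOT IN (SELECT user_id FROM banned)"
).fetchall()

# NOT EXISTS checks row by row and treats the NULL row as a non-match.
not_exists = con.execute("""
    SELECT id FROM users u
    WHERE NOT EXISTS (SELECT 1 FROM banned b WHERE b.user_id = u.id)
""").fetchall()

print(not_in)      # [] -- the single NULL hides every user
print(not_exists)  # [(1,)]
```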
The grader evaluates five dimensions, matching the rubric categories that real interviewers use at companies like Meta, Google, and Amazon.
Your query runs for real and the grader checks correctness across a range of edge cases. Partial credit counts: if you get 90% of cases right but miss a NULL edge case, you see exactly what went wrong.
Each question is tested against 8 to 15 edge cases: NULL values in join columns, empty tables, duplicate timestamps, single-row groups, and tables with only one partition. You see exactly which cases your query handled and which ones it missed.
The grader checks whether your approach scales. A self-join that compares n^2 row pairs when a window function would handle the same logic in a single pass over n rows gets flagged. A DISTINCT that could be replaced with a proper GROUP BY gets noted. These aren't just style points. Interviewers at Google and Meta explicitly grade on performance awareness.
Consistent formatting, meaningful aliases, logical CTE names, appropriate use of whitespace. The grader doesn't penalize personal style, but it does flag queries that are genuinely hard to follow: deeply nested subqueries where CTEs would be clearer, single-letter aliases on five tables, or missing indentation on complex CASE statements.
Using COALESCE for NULL handling instead of ignoring NULLs. Using DATE_TRUNC instead of string manipulation on dates. Using NOT EXISTS instead of NOT IN when NULLs are possible. These patterns distinguish a data engineer who writes production SQL from one who only writes throwaway queries.
When your query fails with 'column must appear in GROUP BY clause or be used in an aggregate function,' you learn the rule instantly. You remember the error message because you felt the friction of fixing it. Reading about GROUP BY rules in a textbook doesn't create that memory. DataDriven runs your SQL against a production-grade SQL engine, so you get real error messages for real mistakes.
Your query runs. It returns results. They look reasonable. But are they right? Without a grader, you'd never notice that your LEFT JOIN accidentally duplicated 3% of rows because of a many-to-many relationship. DataDriven's grader catches exactly these bugs and shows you what went wrong.
Most DE teams run production-grade SQL engines that support advanced features. Practicing on a toy SQL engine that doesn't support window frames or LATERAL joins means you're not practicing with the tools you'll use. DataDriven runs a real database that supports advanced features like FILTER, GROUPING SETS, and recursive CTEs.
For most SQL problems, there are 3 to 5 valid solutions. A window function approach. A self-join approach. A CTE approach. A subquery approach. When your code actually runs and the output matches, any of these counts as correct. DataDriven's grader evaluates all valid approaches, unlike platforms that only accept one hardcoded solution pattern.
A flashcard asks: 'What does ROW_NUMBER do?' You answer: 'Assigns a sequential integer to rows within a partition.' Correct. Now the interviewer asks: 'Write a query to deduplicate a table with 5 million rows, keeping the most recent entry per user.' Can you translate that knowledge into working SQL under time pressure? That translation step is what mock interviews with real execution train.
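The standard pattern for that deduplication question is ROW_NUMBER partitioned by user, ordered newest-first. A minimal sketch via sqlite3 (the events table is hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (user_id INTEGER, ts TEXT, payload TEXT)")
con.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    (1, "2024-01-01", "old"),
    (1, "2024-02-01", "new"),
    (2, "2024-01-15", "only"),
])

# Number each user's rows newest-first, then keep row 1 per user.
rows = con.execute("""
    SELECT user_id, payload FROM (
        SELECT user_id, payload,
               ROW_NUMBER() OVER (
                   PARTITION BY user_id ORDER BY ts DESC
               ) AS rn
        FROM events
    )
    WHERE rn = 1
    ORDER BY user_id
""").fetchall()

print(rows)  # [(1, 'new'), (2, 'only')]
```

The same shape works at 5 million rows; only the engine's execution plan changes, not the query.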
The most common SQL mistakes in interviews are syntax errors: missing commas, wrong function names, GROUP BY mismatches, WHERE vs HAVING confusion. You can know the concepts perfectly and still make these errors under pressure. The only way to reduce them is to write hundreds of queries and fix hundreds of errors. DataDriven gives you that repetition with real database feedback.
Your query returns 47 rows when the expected answer is 45. Where's the bug? Is it the join creating duplicates? A missing WHERE filter? A window function partition issue? Debugging SQL requires a systematic process: check row counts at each step, examine join cardinality, verify GROUP BY granularity. You can't practice this process without running real queries against real data.
Single-table aggregations, basic joins, simple filtering with WHERE and HAVING. These questions test fundamental SQL literacy. If you can't solve easy questions consistently in under 12 minutes, you need more time on fundamentals before mock interviews.
Multi-table joins, window functions, CTEs, and conditional aggregation. Most interview questions fall into this range. A medium question on DataDriven matches what you'd see in round 1 at Meta or Amazon. The time target of 15 to 25 minutes accounts for thinking, writing, and testing.
Multi-CTE patterns, recursive queries, complex window frames, and performance-sensitive solutions. Hard questions target senior and staff candidates. They typically require 3 or more logical steps and at least one non-obvious insight (like recognizing a gaps-and-islands pattern or using a self-join for sessionization).
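As a taste of the gaps-and-islands insight mentioned above: consecutive dates share the same value of (date minus row number), so grouping on that difference collapses each streak into one island. A sketch via sqlite3 (the logins table is hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE logins (user_id INTEGER, day TEXT)")
con.executemany("INSERT INTO logins VALUES (?, ?)", [
    (1, "2024-01-01"), (1, "2024-01-02"), (1, "2024-01-03"),
    (1, "2024-01-10"), (1, "2024-01-11"),
])

# day - ROW_NUMBER() is constant within a run of consecutive days,
# so GROUP BY that difference yields one row per streak.
rows = con.execute("""
    SELECT MIN(day) AS start_day, COUNT(*) AS streak_len
    FROM (
        SELECT day,
               julianday(day) - ROW_NUMBER() OVER (ORDER BY day) AS grp
        FROM logins
        WHERE user_id = 1
    )
    GROUP BY grp
    ORDER BY start_day
""").fetchall()

print(rows)  # [('2024-01-01', 3), ('2024-01-10', 2)]
```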
You start by selecting SQL as your domain and choosing a difficulty level. The simulator picks a question that matches your profile, weighted toward topics where you've historically performed worst. If your window function accuracy is 60% and your join accuracy is 85%, you'll see more window function questions.
The question appears with a schema diagram, sample data, and a clear problem statement. You get a full code editor with syntax highlighting, auto-complete, and the ability to run your query at any point during the session. Running your query doesn't count as submitting. You can iterate as many times as you need within the time limit.
When you submit, the grader checks your query against a range of edge cases: NULLs, empty tables, duplicates, and boundary values. You see which cases passed and which failed, with clear explanations of what went wrong. The grader also provides written feedback on performance, style, and best practices.
After the question, you see the optimal solution with a full walkthrough: why this approach works, what the common mistakes are, and how the query performs at scale. You also see how your solve time compares to the target for your difficulty level. All of this data feeds into your topic accuracy dashboard, which updates after every question.
Based on analysis of interviews at 275+ companies: window functions (34%), joins (28%), aggregation and GROUP BY (22%), CTEs (10%), and subqueries (6%). Window functions are the top topic because they test analytical thinking beyond basic data retrieval.
Real execution. Your queries run against an actual database. You get real error messages and real output. The AI grader evaluates the actual results of your query, not a syntax pattern match.
Over 400 SQL questions across all difficulty levels. Each question is tested against 8 to 15 edge cases and includes a detailed explanation of the optimal approach, common mistakes to avoid, and complexity analysis. Questions are authored by data engineers from Meta, Google, Amazon, and Netflix.
The grader evaluates your actual output, not your query structure. If your results match the expected output across all test cases, your solution is marked correct regardless of approach. The grader then provides feedback on performance and readability as separate dimensions, noting if a more efficient approach exists.
Data engineering SQL interviews go deeper on window functions, CTEs, performance, and edge case handling. Analyst SQL interviews focus more on aggregation and basic joins. DE interviews also test your awareness of query performance at scale: interviewers ask how your query would perform on 100 million rows, which is rarely a concern for analyst roles.
Pick a difficulty, get a real interview question, write SQL that actually runs, get AI feedback.
The complete guide to mock interviews across all 5 data engineering domains.
Practice Python data manipulation, ETL logic, and pandas with real code execution.
50+ SQL interview questions with worked solutions, explanations, and common mistakes.