SQL Interview Prep

SQL Mock Interview for Data Engineers

400+ SQL questions with real code execution. AI grading checks correctness across 8 to 15 edge cases per question, flags performance issues, and gives line-by-line feedback on query structure.

400+
SQL Questions
34%
Window Functions Tested
8 to 15 per question
Edge Cases Tested
18 min
Avg. Solve Time

Why SQL is the highest-stakes round in every DE interview

SQL is the one skill every DE interviewer tests. You can skip Spark at some companies. You can skip Python at others. You cannot skip SQL. Every single data engineering interview loop we've tracked (275+ companies) includes at least one SQL round.

The SQL round is usually the first technical screen. If you fail it, you don't get to the on-site. If you pass it well, you build momentum and interviewer goodwill that carries into later rounds. Treating SQL prep as the highest-priority investment isn't an opinion. It's math based on how interview pipelines work.

SQL skill degrades faster than you think. Working data engineers write SQL daily, but interview SQL is different from production SQL. Production queries are often long, messy, and optimized for maintainability. Interview queries are short, precise, and optimized for demonstrating technique. If you haven't practiced interview-style SQL in 6 months, your solve time has probably doubled.

What SQL topics interviewers actually test

We analyzed questions from interviews at 275+ companies to measure how frequently each SQL topic appears. The percentages below reflect the proportion of SQL questions in our database that test each topic.

Window Functions

34%

ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, NTILE, and frame clauses. Window functions are the single most tested SQL topic in data engineering interviews. Interviewers use them because they separate candidates who memorized GROUP BY from those who understand how SQL processes ordered data.

Common patterns tested:
  • Running totals and rolling averages with frame clauses
  • Ranking within partitions (the classic 'second highest per group' pattern)
  • Comparing current row to previous row with LAG/LEAD
  • Percentile calculations using NTILE or PERCENT_RANK
  • Gaps-and-islands using ROW_NUMBER difference technique
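The first pattern on that list, a running total with an explicit frame clause, looks like this in practice. The sketch below uses SQLite through Python's sqlite3 for portability; the table and column names are illustrative, not from the question bank:

```python
import sqlite3

# Illustrative daily_sales table: three days of sales.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE daily_sales (d TEXT, amount INTEGER);
    INSERT INTO daily_sales VALUES
        ('2024-01-01', 10), ('2024-01-02', 20), ('2024-01-03', 5);
""")

# The frame clause makes the window explicit: everything from the first
# row up to and including the current row, ordered by date.
running = conn.execute("""
    SELECT d,
           SUM(amount) OVER (
               ORDER BY d
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_total
    FROM daily_sales
    ORDER BY d
""").fetchall()

print(running)  # [('2024-01-01', 10), ('2024-01-02', 30), ('2024-01-03', 35)]
```

Spelling out `ROWS BETWEEN ... AND CURRENT ROW` rather than relying on the default frame is worth practicing: interviewers often ask what the default frame is, and the default (`RANGE`) behaves differently when the ORDER BY column has ties.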
Common mistake:

Using RANK when the question needs DENSE_RANK. RANK skips numbers after ties (1, 1, 3). DENSE_RANK doesn't (1, 1, 2). When the question says 'find the second highest,' filtering WHERE rank = 2 returns nothing if the top value is tied. Interviewers test this on purpose.
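Here is that gotcha reproduced end to end, as a minimal sketch in SQLite via Python's sqlite3 (the salaries table and names are made up for illustration):

```python
import sqlite3

# Illustrative salaries table with a tie at the top.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE salaries (employee TEXT, salary INTEGER);
    INSERT INTO salaries VALUES
        ('ann', 300), ('bob', 300), ('cam', 250), ('dee', 200);
""")

# RANK skips after ties: 1, 1, 3, 4 -> no row ever has rank 2.
rank_2 = conn.execute("""
    SELECT employee FROM (
        SELECT employee, RANK() OVER (ORDER BY salary DESC) AS r
        FROM salaries
    ) WHERE r = 2
""").fetchall()

# DENSE_RANK does not skip: 1, 1, 2, 3 -> 'cam' is the second highest.
dense_2 = conn.execute("""
    SELECT employee FROM (
        SELECT employee, DENSE_RANK() OVER (ORDER BY salary DESC) AS r
        FROM salaries
    ) WHERE r = 2
""").fetchall()

print(rank_2)   # []
print(dense_2)  # [('cam',)]
```

The empty result from the RANK version is exactly the silent failure interviewers are probing for: the query runs cleanly and returns nothing.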

Joins

28%

INNER JOIN, LEFT JOIN, FULL OUTER JOIN, CROSS JOIN, self-joins, and inequality joins. Joins form the backbone of every multi-table query. Interviewers use join questions to test whether you understand NULL behavior, many-to-many relationships, and accidental row multiplication.

Common patterns tested:
  • LEFT JOIN + IS NULL to find missing records (customers with no orders)
  • Self-join to compare rows within the same table (employee pairs, consecutive events)
  • Inequality join to match events within a time window
  • CROSS JOIN to generate all combinations, then LEFT JOIN to find gaps
  • Multiple joins with careful attention to join order and cardinality
Common mistake:

Forgetting that LEFT JOIN followed by a WHERE filter on the right table converts it to an INNER JOIN. If you write LEFT JOIN orders o ON ... WHERE o.status = 'shipped', any customer with no orders gets filtered out. Move the condition into the ON clause instead.
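A minimal reproduction of that mistake, again sketched in SQLite via Python's sqlite3 with invented table names:

```python
import sqlite3

# Illustrative customers/orders tables; bob has no orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, status TEXT);
    INSERT INTO customers VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO orders VALUES (1, 'shipped');
""")

# WHERE on the right table silently drops bob: his o.status is NULL,
# and NULL = 'shipped' is unknown, so the LEFT JOIN acts like INNER JOIN.
where_version = conn.execute("""
    SELECT c.name FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    WHERE o.status = 'shipped'
    ORDER BY c.name
""").fetchall()

# Moving the condition into the ON clause keeps every customer.
on_version = conn.execute("""
    SELECT c.name FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id AND o.status = 'shipped'
    ORDER BY c.name
""").fetchall()

print(where_version)  # [('ann',)]
print(on_version)     # [('ann',), ('bob',)]
```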

Aggregation

22%

GROUP BY, HAVING, COUNT, SUM, AVG, MIN, MAX, and conditional aggregation with CASE WHEN. GROUP BY is the most frequently tested SQL keyword overall. Interviewers use aggregation questions to check your understanding of grouping granularity and filtered counting.

Common patterns tested:
  • Conditional aggregation: SUM(CASE WHEN ... THEN 1 ELSE 0 END) for pivoting
  • HAVING to filter groups after aggregation
  • COUNT(DISTINCT ...) for unique value counting
  • Nested aggregations: average of per-group counts
  • Percentage calculations: group count divided by total count
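The first pattern above, conditional aggregation as a pivot, is worth seeing in full. A small sketch in SQLite via Python's sqlite3; the orders table and its columns are illustrative:

```python
import sqlite3

# Illustrative orders table with a status column to pivot on.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (region TEXT, status TEXT);
    INSERT INTO orders VALUES
        ('east', 'shipped'), ('east', 'returned'),
        ('east', 'shipped'), ('west', 'shipped');
""")

# Each SUM(CASE WHEN ...) counts only the rows matching one status,
# turning status values into columns.
pivot = conn.execute("""
    SELECT region,
           SUM(CASE WHEN status = 'shipped'  THEN 1 ELSE 0 END) AS shipped,
           SUM(CASE WHEN status = 'returned' THEN 1 ELSE 0 END) AS returned
    FROM orders
    GROUP BY region
    ORDER BY region
""").fetchall()

print(pivot)  # [('east', 2, 1), ('west', 1, 0)]
```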
Common mistake:

Putting aggregate filters in WHERE instead of HAVING. WHERE filters individual rows before grouping. HAVING filters groups after aggregation. If you write WHERE COUNT(*) > 5, you get a syntax error. The correct version: HAVING COUNT(*) > 5.
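You can watch both behaviors in a few lines. This sketch uses SQLite via Python's sqlite3; the threshold is lowered to fit the tiny illustrative table:

```python
import sqlite3

# Illustrative orders table; customer 1 has three orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer_id INTEGER);
    INSERT INTO orders VALUES (1), (1), (1), (2);
""")

# An aggregate in WHERE is rejected at parse time.
try:
    conn.execute("SELECT customer_id FROM orders WHERE COUNT(*) > 2")
except sqlite3.OperationalError as e:
    print(e)  # the engine rejects the aggregate in WHERE

# HAVING filters groups after aggregation, which is what we want.
busy = conn.execute("""
    SELECT customer_id, COUNT(*) AS n
    FROM orders
    GROUP BY customer_id
    HAVING COUNT(*) > 2
    ORDER BY customer_id
""").fetchall()

print(busy)  # [(1, 3)]
```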

CTEs

10%

Common Table Expressions for readable multi-step queries. CTEs let you break a complex query into named, sequential stages. Interviewers test whether you can decompose a problem into logical steps rather than nesting subqueries four levels deep.

Common patterns tested:
  • Multi-step transformations: aggregate, then filter, then rank
  • Recursive CTEs for hierarchical data (org charts, category trees)
  • CTE-based deduplication with ROW_NUMBER
  • Chaining CTEs for month-over-month or year-over-year comparisons
  • Date spine generation using recursive CTE
Common mistake:

Over-nesting instead of using CTEs. A three-level nested subquery is hard to read, hard to debug, and signals to the interviewer that you don't know CTE syntax. Some candidates avoid CTEs because they think they're 'slower.' In practice, most query engines optimize CTEs identically to subqueries.
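Here's what the CTE decomposition looks like for a typical "aggregate, then rank, then filter" question, sketched in SQLite via Python's sqlite3 with an invented sales table:

```python
import sqlite3

# Illustrative sales table; each CTE names one logical stage.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('east', 100), ('east', 200), ('west', 250),
        ('west', 60), ('north', 120);
""")

top_regions = conn.execute("""
    WITH region_totals AS (          -- stage 1: aggregate
        SELECT region, SUM(amount) AS total
        FROM sales
        GROUP BY region
    ),
    ranked AS (                      -- stage 2: rank
        SELECT region, total,
               DENSE_RANK() OVER (ORDER BY total DESC) AS rnk
        FROM region_totals
    )
    SELECT region, total             -- stage 3: filter
    FROM ranked
    WHERE rnk <= 2
    ORDER BY total DESC
""").fetchall()

print(top_regions)  # [('west', 310), ('east', 300)]
```

The same logic as three nested subqueries would be correct but far harder to debug: with CTEs you can `SELECT * FROM region_totals` in isolation to check each stage.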

Subqueries

6%

Scalar subqueries, correlated subqueries, and EXISTS/NOT EXISTS. Subqueries test whether you can think about data access at two levels simultaneously. Correlated subqueries in particular are tricky because they execute conceptually once per outer row.

Common patterns tested:
  • EXISTS/NOT EXISTS for set membership tests
  • Correlated subquery for per-row comparisons (salary vs department average)
  • Scalar subquery in SELECT for single-value lookups
  • Subquery in FROM (derived table) for pre-aggregation
  • NOT EXISTS as a performant alternative to NOT IN (handles NULLs correctly)
Common mistake:

Using NOT IN instead of NOT EXISTS when the subquery column contains NULLs. NOT IN returns no rows if any value in the list is NULL, because NULL != anything is unknown. NOT EXISTS handles this correctly. Interviewers love this gotcha.
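The gotcha in miniature, sketched in SQLite via Python's sqlite3 (tables invented for illustration):

```python
import sqlite3

# Illustrative tables; the managers list contains a NULL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER, name TEXT);
    CREATE TABLE managers (employee_id INTEGER);
    INSERT INTO employees VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO managers VALUES (1), (NULL);
""")

# NOT IN against a list containing NULL: every comparison against the
# NULL is unknown, so no rows come back -- even though bob is not a manager.
not_in = conn.execute("""
    SELECT name FROM employees
    WHERE id NOT IN (SELECT employee_id FROM managers)
""").fetchall()

# NOT EXISTS does a row-by-row existence check and is NULL-safe.
not_exists = conn.execute("""
    SELECT name FROM employees e
    WHERE NOT EXISTS (
        SELECT 1 FROM managers m WHERE m.employee_id = e.id
    )
""").fetchall()

print(not_in)      # []
print(not_exists)  # [('bob',)]
```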

What the AI grader evaluates on every SQL question

The grader evaluates five dimensions, matching the rubric categories that real interviewers use at companies like Meta, Google, and Amazon.

Correctness

Your query runs for real and the grader checks correctness across a range of edge cases. Partial credit counts: if you get 90% of cases right but miss a NULL edge case, you see exactly what went wrong.

Edge case handling

Each question is tested against 8 to 15 edge cases: NULL values in join columns, empty tables, duplicate timestamps, single-row groups, and tables with only one partition. You see exactly which cases your query handled and which ones it missed.

Query performance

The grader checks whether your approach scales. A self-join that compares n² row pairs when a window function would make a single ordered pass over n rows gets flagged. A DISTINCT that could be replaced with a proper GROUP BY gets noted. These aren't just style points. Interviewers at Google and Meta explicitly grade on performance awareness.

Code readability

Consistent formatting, meaningful aliases, logical CTE names, appropriate use of whitespace. The grader doesn't penalize personal style, but it does flag queries that are genuinely hard to follow: deeply nested subqueries where CTEs would be clearer, single-letter aliases on five tables, or missing indentation on complex CASE statements.

SQL best practices

Using COALESCE for NULL handling instead of ignoring NULLs. Using DATE_TRUNC instead of string manipulation on dates. Using NOT EXISTS instead of NOT IN when NULLs are possible. These patterns distinguish a data engineer who writes production SQL from one who only writes throwaway queries.

Why real SQL execution beats flashcards and multiple choice

Syntax errors teach you faster than reading docs

When your query fails with 'column must appear in GROUP BY clause or be used in an aggregate function,' you learn the rule instantly. You remember the error message because you felt the friction of fixing it. Reading about GROUP BY rules in a textbook doesn't create that memory. DataDriven runs your SQL against a production-grade SQL engine, so you get real error messages for real mistakes.

Output comparison reveals subtle bugs

Your query runs. It returns results. They look reasonable. But are they right? Without a grader, you'd never notice that your LEFT JOIN accidentally duplicated 3% of rows because of a many-to-many relationship. DataDriven's grader catches exactly these bugs and shows you what went wrong.

Production-grade SQL is what companies use

Most DE teams run production-grade SQL engines that support advanced features. Practicing on a toy SQL engine that doesn't support window frames or LATERAL joins means you're not practicing with the tools you'll use. DataDriven runs a real database that supports advanced features like FILTER, GROUPING SETS, and recursive CTEs.

Multiple correct approaches

For most SQL problems, there are 3 to 5 valid solutions. A window function approach. A self-join approach. A CTE approach. A subquery approach. When your code actually runs and the output matches, any of these counts as correct. DataDriven's grader evaluates all valid approaches, unlike platforms that only accept one hardcoded solution pattern.

The gap between knowing SQL and performing in an interview

Flashcards test recognition. Interviews test production.

A flashcard asks: 'What does ROW_NUMBER do?' You answer: 'Assigns a sequential integer to rows within a partition.' Correct. Now the interviewer asks: 'Write a query to deduplicate a table with 5 million rows, keeping the most recent entry per user.' Can you translate that knowledge into working SQL under time pressure? That translation step is what mock interviews with real execution train.
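That translation looks something like this. A sketch in SQLite via Python's sqlite3 with an illustrative events table standing in for the 5-million-row one:

```python
import sqlite3

# Illustrative events table with duplicate rows per user.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (user_id INTEGER, ts TEXT, payload TEXT);
    INSERT INTO events VALUES
        (1, '2024-01-01', 'old'), (1, '2024-03-01', 'new'),
        (2, '2024-02-15', 'only');
""")

# Number each user's rows newest-first, then keep row 1 per user.
latest = conn.execute("""
    WITH ranked AS (
        SELECT user_id, ts, payload,
               ROW_NUMBER() OVER (
                   PARTITION BY user_id ORDER BY ts DESC
               ) AS rn
        FROM events
    )
    SELECT user_id, payload
    FROM ranked
    WHERE rn = 1
    ORDER BY user_id
""").fetchall()

print(latest)  # [(1, 'new'), (2, 'only')]
```

Knowing the flashcard definition of ROW_NUMBER is necessary but not sufficient; the interview tests whether you reach for PARTITION BY user_id and ORDER BY ts DESC without prompting.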

Syntax errors only happen when you write code

The most common SQL mistakes in interviews are syntax errors: missing commas, wrong function names, GROUP BY mismatches, WHERE vs HAVING confusion. You can know the concepts perfectly and still make these errors under pressure. The only way to reduce them is to write hundreds of queries and fix hundreds of errors. DataDriven gives you that repetition with real database feedback.

Debugging is a skill that requires practice

Your query returns 47 rows when the expected answer is 45. Where's the bug? Is it the join creating duplicates? A missing WHERE filter? A window function partition issue? Debugging SQL requires a systematic process: check row counts at each step, examine join cardinality, verify GROUP BY granularity. You can't practice this process without running real queries against real data.

Difficulty progression: Easy to Hard

Easy

Target: 8 to 12 minutes

Single-table aggregations, basic joins, simple filtering with WHERE and HAVING. These questions test fundamental SQL literacy. If you can't solve easy questions consistently in under 12 minutes, you need more time on fundamentals before mock interviews.

Example: Find the top 3 departments by total salary using GROUP BY and ORDER BY with LIMIT.
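One plausible solution to that easy example, sketched in SQLite via Python's sqlite3 (schema invented for illustration):

```python
import sqlite3

# Illustrative employees table with a department and salary per row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (dept TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('eng', 120), ('eng', 130), ('sales', 90),
        ('sales', 80), ('hr', 70), ('ops', 60);
""")

# Aggregate per department, sort by the aggregate, keep the top 3.
top3 = conn.execute("""
    SELECT dept, SUM(salary) AS total
    FROM employees
    GROUP BY dept
    ORDER BY total DESC
    LIMIT 3
""").fetchall()

print(top3)  # [('eng', 250), ('sales', 170), ('hr', 70)]
```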

Medium

Target: 15 to 25 minutes

Multi-table joins, window functions, CTEs, and conditional aggregation. Most interview questions fall into this range. A medium question on DataDriven matches what you'd see in round 1 at Meta or Amazon. The time target of 15 to 25 minutes accounts for thinking, writing, and testing.

Example: Calculate a 7-day rolling average of daily revenue, handling days with no orders.
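One way to attack that medium example: generate a date spine so missing days count as zero revenue, then average over a 7-row frame. A sketch in SQLite via Python's sqlite3, with invented data covering only one week:

```python
import sqlite3

# Illustrative orders table; Jan 2 and Jan 4-7 have no orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_date TEXT, revenue INTEGER);
    INSERT INTO orders VALUES
        ('2024-01-01', 70), ('2024-01-03', 140), ('2024-01-03', 70);
""")

rolling = conn.execute("""
    WITH RECURSIVE spine(d) AS (      -- one row per calendar day
        SELECT '2024-01-01'
        UNION ALL
        SELECT date(d, '+1 day') FROM spine WHERE d < '2024-01-07'
    ),
    daily AS (                        -- 0 revenue on days with no orders
        SELECT s.d, COALESCE(SUM(o.revenue), 0) AS revenue
        FROM spine s
        LEFT JOIN orders o ON o.order_date = s.d
        GROUP BY s.d
    )
    SELECT d,
           AVG(revenue) OVER (
               ORDER BY d ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
           ) AS avg_7d
    FROM daily
    ORDER BY d
""").fetchall()

for row in rolling:
    print(row)
```

Skipping the spine is the classic bug here: averaging only over days that had orders silently inflates the rolling average.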

Hard

Target: 25 to 35 minutes

Multi-CTE patterns, recursive queries, complex window frames, and performance-sensitive solutions. Hard questions target senior and staff candidates. They typically require 3 or more logical steps and at least one non-obvious insight (like recognizing a gaps-and-islands pattern or using a self-join for sessionization).

Example: Identify user sessions where a session ends after 30 minutes of inactivity, then calculate the median session duration per user.
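The sessionization half of that hard example can be sketched as a gaps-and-islands pattern: flag each event that starts a new session (first event, or gap over 30 minutes), then take a running sum of the flags to get session ids. A partial sketch in SQLite via Python's sqlite3 (median aggregation omitted; table invented):

```python
import sqlite3

# Illustrative events table; user 1 has a 50-minute gap at 10:00.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (user_id INTEGER, ts TEXT);
    INSERT INTO events VALUES
        (1, '2024-01-01 09:00:00'),
        (1, '2024-01-01 09:10:00'),
        (1, '2024-01-01 10:00:00'),
        (2, '2024-01-01 09:05:00');
""")

sessions = conn.execute("""
    WITH flagged AS (
        SELECT user_id, ts,
               CASE WHEN LAG(ts) OVER w IS NULL
                      OR julianday(ts) - julianday(LAG(ts) OVER w)
                         > 30.0 / 1440          -- 30 minutes, in days
                    THEN 1 ELSE 0 END AS new_session
        FROM events
        WINDOW w AS (PARTITION BY user_id ORDER BY ts)
    )
    SELECT user_id, ts,
           SUM(new_session) OVER (
               PARTITION BY user_id ORDER BY ts
           ) AS session_id
    FROM flagged
    ORDER BY user_id, ts
""").fetchall()

for row in sessions:
    print(row)
```

The non-obvious insight is the running SUM over the flags: it converts "where do sessions break" into "which session does each row belong to," after which per-session durations and the median are ordinary aggregation.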

How a SQL mock interview session works on DataDriven

You start by selecting SQL as your domain and choosing a difficulty level. The simulator picks a question that matches your profile, weighted toward topics where you've historically performed worst. If your window function accuracy is 60% and your join accuracy is 85%, you'll see more window function questions.

The question appears with a schema diagram, sample data, and a clear problem statement. You get a full code editor with syntax highlighting, auto-complete, and the ability to run your query at any point during the session. Running your query doesn't count as submitting. You can iterate as many times as you need within the time limit.

When you submit, the grader checks your query against a range of edge cases: NULLs, empty tables, duplicates, and boundary values. You see which cases passed and which failed, with clear explanations of what went wrong. The grader also provides written feedback on performance, style, and best practices.

After the question, you see the optimal solution with a full walkthrough: why this approach works, what the common mistakes are, and how the query performs at scale. You also see how your solve time compares to the target for your difficulty level. All of this data feeds into your topic accuracy dashboard, which updates after every question.

Frequently asked questions

What SQL topics appear most in data engineering interviews?

Based on analysis of interviews at 275+ companies: window functions (34%), joins (28%), aggregation and GROUP BY (22%), CTEs (10%), and subqueries (6%). Window functions are the top topic because they test analytical thinking beyond basic data retrieval.

Does DataDriven run real SQL or simulate it?

Real execution. Your queries run against an actual database. You get real error messages and real output. The AI grader evaluates the actual results of your query, not a syntax pattern match.

How many SQL questions does DataDriven have?

Over 400 SQL questions across all difficulty levels. Each question is tested against 8 to 15 edge cases and includes a detailed explanation of the optimal approach, common mistakes to avoid, and complexity analysis. Questions are authored by data engineers from Meta, Google, Amazon, and Netflix.

What if my SQL answer is correct but uses a different approach than the expected solution?

The grader evaluates your actual output, not your query structure. If your results match the expected output across all test cases, your solution is marked correct regardless of approach. The grader then provides feedback on performance and readability as separate dimensions, noting if a more efficient approach exists.

How is SQL for data engineers different from SQL for analysts?

Data engineering SQL interviews go deeper on window functions, CTEs, performance, and edge case handling. Analyst SQL interviews focus more on aggregation and basic joins. DE interviews also test your awareness of query performance at scale: interviewers ask how your query would perform on 100 million rows, which is rarely a concern for analyst roles.

Write your first SQL query in under a minute

Pick a difficulty, get a real interview question, write SQL that actually runs, get AI feedback.