Data Engineering Interview Prep

SQL Interview Questions for Data Engineers

SQL is the most-tested skill in data engineering interviews, appearing in nearly seven out of ten interview loops. The most-tested topics are aggregation, JOINs, and window functions. The questions below are organized by topic, with example prompts and difficulty levels.

Every question below can be practiced with real SQL execution on DataDriven. Write the query, run it against a real database, and see if your output matches.

1. Window Functions

Difficulty: Medium-Hard · Interview frequency: 15.1% of SQL questions

ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, NTILE, and frame clauses. PARTITION BY is the single most common window function keyword in real DE interviews.

Q1

Write a query to find the second-highest salary in each department using window functions.

Q2

Calculate a 7-day rolling average of daily revenue using a window frame.

Q3

Use LAG to find customers whose order value decreased compared to their previous order.

Q4

Rank products by total sales within each category, handling ties with DENSE_RANK.

Q5

Use NTILE to divide employees into salary quartiles within each department.
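To make these concrete, here is one possible shape of an answer to Q1 (second-highest salary per department), sketched in Python against an in-memory SQLite database. The employees table and its data are invented for illustration; DENSE_RANK makes ties at the second-highest salary all appear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO employees VALUES
      ('Ana', 'Eng',   120000), ('Bo',  'Eng',   110000),
      ('Cy',  'Eng',   110000), ('Dee', 'Sales',  90000),
      ('Eli', 'Sales',  80000);
""")

# DENSE_RANK per department: rank 2 is the second-highest DISTINCT
# salary, so both employees tied at 110000 come back for Eng.
rows = conn.execute("""
    SELECT dept, name, salary
    FROM (
        SELECT name, dept, salary,
               DENSE_RANK() OVER (
                   PARTITION BY dept ORDER BY salary DESC
               ) AS rnk
        FROM employees
    )
    WHERE rnk = 2
    ORDER BY dept, name
""").fetchall()
print(rows)  # [('Eng', 'Bo', 110000), ('Eng', 'Cy', 110000), ('Sales', 'Eli', 80000)]
```

Swapping DENSE_RANK for ROW_NUMBER would arbitrarily drop one of the tied employees, which is exactly the distinction interviewers probe for.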

2. CTEs & Recursive Queries

Difficulty: Medium-Hard · Interview frequency: High

Common Table Expressions for readable multi-step queries. Recursive CTEs for hierarchical data like org charts and category trees.

Q1

Write a CTE to find all employees in a management chain starting from the CEO.

Q2

Use a recursive CTE to generate a date series for the last 90 days.

Q3

Chain multiple CTEs to calculate month-over-month revenue growth rates.

Q4

Write a recursive query to find all subcategories of a given parent category.

Q5

Use a CTE to deduplicate records, keeping only the most recent entry per user.
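As a sketch of the recursive pattern behind Q2, the following generates a date series with a recursive CTE, run here in SQLite (7 days rather than 90 to keep the output short; the table name `days` and the fixed start date are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The anchor member seeds the series; the recursive member adds one day
# at a time until the termination condition in WHERE fails.
rows = conn.execute("""
    WITH RECURSIVE days(d) AS (
        SELECT date('2024-01-01')
        UNION ALL
        SELECT date(d, '+1 day') FROM days WHERE d < '2024-01-07'
    )
    SELECT d FROM days ORDER BY d
""").fetchall()
print([r[0] for r in rows])  # ['2024-01-01', ..., '2024-01-07']
```

For "the last 90 days" you would seed the anchor with something like date('now', '-89 days') instead of a literal; the exact interval syntax varies by engine.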

3. JOINs & Self-Joins

Difficulty: Medium · Interview frequency: 19.6% of SQL questions

INNER JOIN is the second most-tested SQL keyword after GROUP BY. LEFT, RIGHT, FULL OUTER, CROSS, self-joins, and inequality joins round out the category.

Q1

Write a self-join to find pairs of employees who share the same manager.

Q2

Use a LEFT JOIN to find customers who have never placed an order.

Q3

Write an inequality join to match events that occurred within 30 minutes of each other.

Q4

Join three tables to find the top-selling product in each region.

Q5

Use a CROSS JOIN to generate all possible date-product combinations, then LEFT JOIN to find missing data.
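The anti-join pattern in Q2 is worth having at your fingertips. A minimal sketch in SQLite, with made-up customers and orders tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Bo'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 3);
""")

# LEFT JOIN keeps every customer; unmatched customers get NULL on the
# orders side, which the WHERE clause then isolates.
rows = conn.execute("""
    SELECT c.name
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    WHERE o.id IS NULL
""").fetchall()
print(rows)  # [('Bo',)]
```

NOT EXISTS expresses the same intent and is often clearer; being able to write both, and explain when they differ (e.g. NULLs with NOT IN), is a common follow-up.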

4. Aggregations & GROUP BY

Difficulty: Easy-Medium · Interview frequency: 24.5% of SQL questions

GROUP BY is the most frequently tested SQL keyword in DE interviews. COUNT, SUM, AVG, MIN, MAX with HAVING and conditional aggregation using CASE WHEN.

Q1

Find departments where the average salary exceeds $100K using HAVING.

Q2

Calculate the percentage of orders that were returned, grouped by product category.

Q3

Use conditional aggregation (CASE WHEN inside SUM) to pivot monthly sales into columns.

Q4

Find users who made purchases in at least 3 different months.

Q5

Write a query to find the most common pair of products bought together.
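A sketch of the conditional-aggregation idea behind Q2, run in SQLite on an invented orders table where `returned` is a 0/1 flag. Averaging a 0/1 column is a compact way to get a "percentage of rows matching a condition"; with a non-boolean column you would wrap it in CASE WHEN first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, category TEXT, returned INTEGER);
    INSERT INTO orders VALUES
      (1, 'books', 0), (2, 'books', 1), (3, 'books', 0), (4, 'books', 1),
      (5, 'toys',  0), (6, 'toys',  0);
""")

# AVG of a 0/1 flag = fraction of rows where the flag is 1.
rows = conn.execute("""
    SELECT category,
           ROUND(100.0 * AVG(returned), 1) AS pct_returned
    FROM orders
    GROUP BY category
    ORDER BY category
""").fetchall()
print(rows)  # [('books', 50.0), ('toys', 0.0)]
```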

5. Subqueries

Difficulty: Medium-Hard · Interview frequency: High

Scalar, correlated, and EXISTS subqueries. Critical for complex filtering and comparison logic.

Q1

Use a correlated subquery to find each employee's salary as a percentage of their department average.

Q2

Write an EXISTS subquery to find products that have never been ordered.

Q3

Use a scalar subquery to find orders placed on the day with the highest total revenue.

Q4

Write a subquery in the FROM clause to first aggregate, then filter on the aggregated result.

Q5

Use NOT EXISTS to find customers who bought product A but never bought product B.
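Here is one way Q1's correlated subquery can look, sketched in SQLite with a toy employees table. The key point is the correlation predicate `e2.dept = e1.dept`: the inner SELECT is logically re-evaluated for each outer row, scoped to that row's department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO employees VALUES
      ('Ana', 'Eng', 120), ('Bo', 'Eng', 80), ('Cy', 'Sales', 50);
""")

# Each employee's salary as a percentage of their department's average.
rows = conn.execute("""
    SELECT name,
           ROUND(100.0 * salary / (
               SELECT AVG(salary) FROM employees e2
               WHERE e2.dept = e1.dept
           ), 1) AS pct_of_dept_avg
    FROM employees e1
    ORDER BY name
""").fetchall()
print(rows)  # [('Ana', 120.0), ('Bo', 80.0), ('Cy', 100.0)]
```

A window-function version (AVG(salary) OVER (PARTITION BY dept)) gives the same result, and comparing the two is a frequent interview discussion.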

6. NULL Handling

Difficulty: Easy-Medium · Interview frequency: Medium

COALESCE, NULLIF, IS NULL, three-valued logic. Subtle but critical for correctness.

Q1

Write a query that correctly handles NULL in a LEFT JOIN where the right side has no match.

Q2

Use COALESCE to replace NULL values with a running total from the previous row.

Q3

Explain why WHERE column != 'value' does not return rows where column IS NULL.

Q4

Use NULLIF to avoid division-by-zero errors in a revenue-per-user calculation.

Q5

Write a query to find records where any of five columns contains a NULL value.
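The NULLIF guard from Q4 can be sketched like this in SQLite (the region/revenue/users data is made up). Note that SQLite itself quietly returns NULL on division by zero, but the NULLIF pattern is the portable answer, since engines like PostgreSQL raise an error instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE daily_stats (region TEXT, revenue INTEGER, users INTEGER);
    INSERT INTO daily_stats VALUES ('na', 500, 50), ('emea', 300, 0);
""")

# NULLIF(users, 0) turns 0 into NULL so the division yields NULL rather
# than erroring; COALESCE then substitutes a default of 0.0.
rows = conn.execute("""
    SELECT region,
           COALESCE(revenue * 1.0 / NULLIF(users, 0), 0.0) AS rev_per_user
    FROM daily_stats
    ORDER BY region
""").fetchall()
print(rows)  # [('emea', 0.0), ('na', 10.0)]
```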

7. Date & Time Functions

Difficulty: Medium · Interview frequency: High

DATE_TRUNC, DATE_DIFF, EXTRACT, interval arithmetic. Essential for time-series analysis in data engineering.

Q1

Calculate user retention rates at 1, 7, and 30 days after signup.

Q2

Use DATE_TRUNC to aggregate daily event data into weekly cohorts.

Q3

Find the gap in days between each consecutive login for every user.

Q4

Write a query to identify users who were active for at least 3 consecutive days.

Q5

Calculate the percentage of monthly active users who were also active the previous month.
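Q3's login-gap pattern, sketched in SQLite with an invented logins table. LAG pulls the previous login for the same user; `julianday()` is SQLite's way to diff dates in days (in PostgreSQL you would subtract the dates directly).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE logins (user_id INTEGER, login_date TEXT);
    INSERT INTO logins VALUES
      (1, '2024-01-01'), (1, '2024-01-04'), (1, '2024-01-05'),
      (2, '2024-01-02');
""")

# Each user's first login has no predecessor, so gap_days is NULL there.
rows = conn.execute("""
    SELECT user_id, login_date,
           CAST(julianday(login_date) - julianday(
               LAG(login_date) OVER (
                   PARTITION BY user_id ORDER BY login_date
               )
           ) AS INTEGER) AS gap_days
    FROM logins
    ORDER BY user_id, login_date
""").fetchall()
print(rows)
```

The same LAG-based gap is the usual first step toward the harder Q4 ("3 consecutive days"), which typically layers a gaps-and-islands grouping on top.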

8. String Functions & Pattern Matching

Difficulty: Easy-Medium · Interview frequency: Medium

LIKE, regex, CONCAT, SUBSTRING, SPLIT_PART. Common in data cleaning and ETL pipeline questions.

Q1

Extract the domain name from a list of email addresses using string functions.

Q2

Use LIKE with wildcards to find products whose names contain a specific pattern.

Q3

Split a comma-separated tags column into individual rows.

Q4

Write a query to standardize phone numbers into a consistent format.

Q5

Use REGEXP_REPLACE to clean HTML tags from a text column.
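For Q1, here is one portable way to extract an email domain, run in SQLite on made-up addresses. PostgreSQL has SPLIT_PART(email, '@', 2); SQLite spells the same idea with INSTR plus SUBSTR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (email TEXT);
    INSERT INTO users VALUES ('ana@example.com'), ('bo@mail.test');
""")

# Everything after the first '@': INSTR finds its 1-based position,
# SUBSTR takes the rest of the string starting one character later.
rows = conn.execute("""
    SELECT email,
           SUBSTR(email, INSTR(email, '@') + 1) AS domain
    FROM users
    ORDER BY email
""").fetchall()
print(rows)  # [('ana@example.com', 'example.com'), ('bo@mail.test', 'mail.test')]
```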

Worked Example: 7-Day Rolling Average Revenue

Given a daily_revenue table with day and revenue, compute a 7-day trailing average for each day. Only include days that have a full 7 days of history.

SELECT
  day,
  revenue,
  ROUND(
    AVG(revenue) OVER (
      ORDER BY day
      ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ), 2
  ) AS rolling_7d_avg
FROM daily_revenue
ORDER BY day;

The window frame ROWS BETWEEN 6 PRECEDING AND CURRENT ROW captures exactly 7 rows (today plus the 6 before it). AVG computes the mean across that frame. For the first 6 rows, the frame is shorter than 7 days, so the average is based on fewer values. To exclude partial windows, wrap this in a CTE and filter WHERE row_number > 6.

Expected output

 day        | revenue  | rolling_7d_avg
------------+----------+---------------
 2024-01-07 | 12400.00 |       11457.14
 2024-01-08 | 13100.00 |       11871.43
 2024-01-09 | 11800.00 |       12128.57
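The suggested CTE-plus-ROW_NUMBER filter for excluding partial windows can be sketched like this, run against a toy daily_revenue table in SQLite (eight days of invented revenue, so only the last two rows have a full 7-day frame):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_revenue (day TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO daily_revenue VALUES (?, ?)",
    [(f"2024-01-0{i}", i * 10.0) for i in range(1, 9)],  # days 01..08
)

# Compute ROW_NUMBER alongside the rolling average in the CTE; rn > 6
# keeps only rows whose frame spans a full 7 days.
rows = conn.execute("""
    WITH numbered AS (
        SELECT day, revenue,
               ROW_NUMBER() OVER (ORDER BY day) AS rn,
               ROUND(AVG(revenue) OVER (
                   ORDER BY day
                   ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
               ), 2) AS rolling_7d_avg
        FROM daily_revenue
    )
    SELECT day, revenue, rolling_7d_avg
    FROM numbered
    WHERE rn > 6
""").fetchall()
print(rows)  # [('2024-01-07', 70.0, 40.0), ('2024-01-08', 80.0, 50.0)]
```

If the table can have missing days, ROWS BETWEEN 6 PRECEDING counts rows, not calendar days; a RANGE frame or a generated date spine is the usual fix, and is worth mentioning in an interview.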

How to Prepare for SQL Data Engineering Interviews

SQL is the most common skill tested in data engineering interviews, appearing in the majority of loops and a significant share of phone screens. The format is predictable: you get a schema description, a business question, and 20-30 minutes to write a query. The interviewer watches you think through the problem, not just the final answer.

Start with JOINs and aggregations. These appear in every interview and are the foundation for harder topics. If GROUP BY and HAVING are second nature, you can focus your limited interview time on the harder parts of the problem.

Master window functions. Window functions are the highest-difficulty topic that appears consistently in SQL rounds. Practice until you can write ROW_NUMBER, RANK, LAG, and LEAD without looking up the syntax.

Practice under time pressure. Knowing the concept is not enough. You need to write correct SQL in under 15 minutes per question. DataDriven's timed drill mode simulates real interview pressure.

Run your code. Reading solutions is not the same as writing them. Every question on DataDriven executes against a real database so you catch syntax errors, NULL edge cases, and logic bugs before the interview.

SQL Interview Questions FAQ

What SQL topics are most commonly tested in data engineering interviews?
Based on DataDriven's analysis of verified interview data, the three most commonly tested SQL topics are aggregation, JOINs, and window functions. GROUP BY and INNER JOIN are the two most frequent individual keywords. Most interviews test 2-3 of these topics in a single 45-minute round.
How many SQL questions should I practice before a data engineering interview?
Most successful candidates practice 50-100 questions across all topics. Quality matters more than quantity. Focus on understanding patterns rather than memorizing solutions. If you can solve a medium-difficulty window function problem in under 10 minutes, you are in good shape.
What is the difference between SQL questions for data engineers vs. data analysts?
Data engineering SQL questions emphasize performance-aware queries, CTEs for pipeline logic, window functions for complex transformations, and handling edge cases (NULLs, duplicates, time zones). Analyst questions lean more toward reporting: simple aggregations, filtering, and basic JOINs.
Do data engineering interviews use LeetCode-style SQL?
Not usually. LeetCode SQL problems are puzzle-oriented. Data engineering interviews test practical patterns: pipeline transformations, data quality checks, incremental processing, and multi-step queries. The difficulty is comparable, but the style is different. Practice with problems modeled on real data engineering work.

Practice These SQL Questions Now

Write real SQL. Get real results. Know exactly where you stand before the interview.