SQL coding questions, not multiple choice. Open-ended write-the-query problems shaped like real data engineer interview prompts. Every submission goes to Postgres 16 and is replayed across 10 randomized seeds so a query that hard-codes IDs or omits an ORDER BY tiebreaker fails the same way a code reviewer would catch it.

A SQL coding question in a data engineer interview is open-ended. Not multiple choice. Not fill-in-the-blank. Not a one-line keyword identification. The interviewer hands you a schema (2 to 5 tables, usually denormalized like a real warehouse), a business question phrased ambiguously ("pull the most engaged users in the last 30 days"), and 15 to 30 minutes. Three scored moments matter. The clarifying question (what does "engaged" mean: order count, total spend, distinct categories). The edge-case volunteering (NULL handling, ties at the top of a ranking, partial-window rolling averages, zero-order users). The actual query.

This catalog separates SQL coding questions from textbook-style problems. Each problem reads like an interview prompt: a couple of denormalized tables, a business question, a schema panel to inspect, and a 10-seed grader that engineers the edge cases an interviewer would ask about. Patterns that compose the bulk of the bank for data engineer roles: top-N per group with an explicit composite tiebreaker, dedup latest per natural key, sessionization with LAG and SUM OVER, gap-and-island for streak detection (consecutive-active-days, longest watch session), half-open SCD2 joins with effective_from less-than-or-equal-to ts AND ts less-than effective_to, conditional aggregation pivots with FILTER (WHERE) or SUM(CASE WHEN), running and rolling windows with explicit ROWS frames, and idempotent late-arriving reconciliation with MERGE on a composite natural key.

Difficulty calibration is consistent across the 850 questions. Easy targets 5 to 10 minutes for a fluent data engineer candidate (single JOIN, GROUP BY, basic WHERE filter, COUNT or SUM). Medium targets 10 to 15 minutes (window function, CTE for readability, conditional aggregation, simple NULLIF guard). Hard targets 15 to 25 minutes (recursive CTE for org hierarchy traversal, gap-and-island for streak detection, multi-CTE sessionization with watermark handling, SCD2 half-open joins, EXPLAIN plan interpretation). The catalog distribution skews to Medium because that is where most data engineer interview questions land in 2026. The Easy tier is for SQL warm-up. The Hard tier is for L5-and-above optimization rounds and senior data engineer loops at FAANG.

Dialect coverage: 85 percent of the catalog ports cleanly across Postgres, Snowflake, BigQuery, Redshift, and MySQL 8. Postgres-specific syntax (DISTINCT ON, FILTER WHERE, RANGE BETWEEN INTERVAL, lateral joins) is tagged per problem. The Snowflake-and-BigQuery-native QUALIFY clause appears as a tagged alternative on top-N-per-group problems. The Hive and Presto LATERAL VIEW EXPLODE and UNNEST patterns appear on array-column problems. Practicing in Postgres here gives a data engineer the patterns; the syntactic translations are read-only.

SQL Coding Questions

Open-ended SQL coding questions for data engineer interview practice with multi-seed grading.

Common questions

What makes a SQL question a 'coding question' versus a textbook problem?
A coding question is open-ended: a real schema, a business prompt phrased ambiguously, 15 to 30 minutes to write a working query. A textbook problem gives you a fixed schema, a precise instruction ('use INNER JOIN to combine X and Y'), and a single correct answer. The catalog here is all coding questions, the practice surface that matches what data engineer interviewers actually ask.
How long should an Easy SQL coding question take?
5 to 10 minutes for a fluent data engineer candidate. Spending more than 10 minutes on an Easy problem usually means missing a pattern (top-N per group, dedup, basic aggregation) or over-engineering the join order. The fix is pattern recognition, not more practice volume.
What is the most common failure mode on Medium SQL coding questions?
Missing the tiebreaker on a ranking. ORDER BY metric DESC LIMIT N picks one row when two tie at Nth. The 10-seed grader engineers that tie and fails the query. The fix is ROW_NUMBER with a composite ORDER BY (metric DESC, secondary_key) or DENSE_RANK if the business wants to keep all ties at the boundary.
Do I need to handle NULLs even if the prompt does not mention them?
Yes. Data engineer interviewers fish for NULL handling. The grader engineers NULL distributions in roughly 40 percent of seeded problems. Use NULLIF to guard against division by zero. Use COALESCE chains for hierarchical defaults. Remember that NOT IN with a NULL value silently returns empty; EXISTS / NOT EXISTS are the safer pattern.
When is RANK versus DENSE_RANK versus ROW_NUMBER the right choice?
ROW_NUMBER when you need exactly one row per partition (dedup, 'latest per key'). RANK when you want top-N including ties but skipping numbers (1, 2, 2, 4). DENSE_RANK when you want top-N including ties without skipping (1, 2, 2, 3). For 'top 3 products by revenue including ties at 3rd', DENSE_RANK is almost always what the business wants.
How are these SQL coding questions sourced?
From interview reports submitted to the platform by data engineer candidates, deduplicated and rewritten so the schema and data shapes match what surfaced in real rounds without copying specific prompt text. Company tags are applied only when a report explicitly cited the company in the same question shape.
Do these SQL coding questions cover both data engineer and data scientist interviews?
Yes, with bias toward data engineer prep. Most questions appear in both loop types because the SQL bar at the question level overlaps heavily. The differences live in the surrounding rounds (data engineer loops include data modeling and pipeline architecture; data scientist loops include statistics, A/B testing, and product sense). Pure data scientist statistics questions are out of scope here.