SQL is the single most-tested domain. These 40 cover joins, aggregation, window functions, CTEs, recursive queries, optimization, and dialect-specific tricks.
Q1 · L3
INNER vs LEFT vs FULL OUTER JOIN
INNER returns matched rows. LEFT keeps all left rows, NULL on no match. FULL keeps all rows on both sides. Most candidates know this; fluent candidates volunteer the row-count expectations.
Q2 · L3
GROUP BY with HAVING vs WHERE
WHERE filters rows before aggregation. HAVING filters groups after. WHERE supports indexes; HAVING does not. Move conditions to WHERE when they don't reference aggregates.
Q3 · L3
Find duplicate rows
GROUP BY all columns, HAVING COUNT > 1. NULL caveat: equality is unknown for NULL, so duplicate-with-NULL hides. COALESCE if NULL means same.
Q4 · L3
Second highest salary
DENSE_RANK to handle ties. SELECT salary FROM (SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) rk) WHERE rk = 2.
Q5 · L3
Count rows per group
GROUP BY group_col, SELECT COUNT(*). For unique counts: COUNT(DISTINCT col). Beware: COUNT(col) ignores NULL, COUNT(*) does not.
Q6 · L3
Sort with NULLS FIRST or LAST
ORDER BY col NULLS LAST. Postgres and Snowflake support directly. MySQL needs ORDER BY col IS NULL, col.
Q7 · L4
Deduplicate keeping latest per user
ROW_NUMBER PARTITION BY user_id ORDER BY ts DESC, filter rn = 1. Better than DISTINCT for keeping all columns.
Q8 · L4
Month-over-month growth percentage
DATE_TRUNC, SUM, LAG. (current - previous) / NULLIF(previous, 0) * 100. Volunteer NULLIF for first month.
Q9 · L4
Users active 3+ consecutive days
Gap-and-island. ROW_NUMBER per user minus date. Same diff = same streak. GROUP BY user, streak_key, HAVING COUNT >= 3.
Q10 · L4
Top N per group with ties
DENSE_RANK PARTITION BY, filter rk <= N. Explain DENSE_RANK over RANK over ROW_NUMBER.
Q11 · L4
7-day rolling average
AVG OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW). Volunteer partial-window edge case.
Q12 · L4
Self join: same-manager pairs
JOIN e1, e2 ON e1.mgr = e2.mgr AND e1.id < e2.id. Inequality eliminates self-pairs and reverse duplicates.
Q13 · L4
Pivot rows to columns
SUM(CASE WHEN type = X THEN val) AS x_total. Discuss vs PIVOT operator for portability.
Q14 · L4
EXISTS vs IN performance
EXISTS short-circuits. IN materializes. EXISTS for large or correlated; IN for small literals.
Q15 · L4
COALESCE vs CASE WHEN NULL
COALESCE returns first non-null. Cleaner than CASE for default-value substitution. Some dialects (Spark) prefer NVL or IFNULL.
Q16 · L4
DATE_TRUNC vs EXTRACT vs DATE_PART
TRUNC zeros lower units (returns date). EXTRACT pulls a number (year, month). PART aliases vary by dialect. State the dialect before answering.
Q17 · L4
UNION vs UNION ALL
UNION dedupes (expensive sort). UNION ALL keeps duplicates (no sort). Use ALL unless dedup is required.
Q18 · L4
ANTI JOIN with NOT EXISTS
SELECT * FROM left WHERE NOT EXISTS (SELECT 1 FROM right WHERE join). Faster than LEFT JOIN + WHERE right.col IS NULL on most engines.
Q19 · L4
Find rows with duplicate composite key
GROUP BY composite key, HAVING COUNT > 1. To return the duplicate rows themselves: WHERE (key1, key2) IN (SELECT ... HAVING COUNT > 1).
Q20 · L4
Conditional aggregation by year
SUM(CASE WHEN year = 2024 THEN rev END). Cleaner than separate WHERE-filtered subqueries when you need multiple year totals.
Q21 · L5
Recursive CTE for org chart
Base case: WHERE id = root. Recursive: JOIN cte ON cte.id = emp.mgr. Add depth column with WHERE depth < 20 to prevent cycles.
Q22 · L5
Sessionization with 30-min gap
LAG to get previous event ts. CASE WHEN gap > 30 min OR user changed THEN 1 ELSE 0 AS new_session. SUM new_session OVER PARTITION BY user gives session_id.
Q23 · L5
Median with PERCENTILE_CONT
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY val). Discuss PERCENTILE_DISC vs CONT. For huge data, mention APPROX_PERCENTILE.
Q24 · L5
Funnel: A then B within 7 days
Self-join on user, WHERE event_a.ts < event_b.ts AND event_b.ts <= event_a.ts + 7 days. Discuss vs window-function approach.
Q25 · L5
Forward-fill NULL per user
LAST_VALUE(val IGNORE NULLS) OVER (PARTITION BY user ORDER BY ts ROWS UNBOUNDED PRECEDING). Some dialects need workaround.
Q26 · L5
Detect change-points
LAG to compare current to previous. CASE WHEN current != previous THEN 1 AS is_change. SUM is_change OVER ORDER BY ts gives change-id.
Q27 · L5
EXPLAIN plan reading
Identify full table scan vs index seek. Function in WHERE prevents pushdown. Cover partition pruning, broadcast vs shuffle joins.
Q28 · L5
Skew handling in JOINs
Identify hot keys with COUNT GROUP BY. Salt with mod-N suffix on both sides. Aggregate, then unsalt. Trade-off: extra shuffle cost.
Q29 · L5
ROWS vs RANGE in window frames
ROWS: physical rows. RANGE: groups ties. Different result on duplicate timestamps. Always specify; defaults vary.
Q30 · L5
QUALIFY for window-function filtering
Snowflake/BigQuery shortcut. SELECT ... QUALIFY ROW_NUMBER() OVER (...) = 1. Replaces the CTE-and-filter pattern. Mention dialect support.
Q31 · L5
MERGE / UPSERT for slowly-changing data
MERGE target USING source ON key WHEN MATCHED AND data differs THEN UPDATE WHEN NOT MATCHED THEN INSERT. Postgres uses INSERT ON CONFLICT.
Q32 · L5
Pivot with dynamic columns
Most engines need static column list. For truly dynamic: build SQL string, EXECUTE IMMEDIATE. Snowflake PIVOT supports ANY ORDER BY for dynamic.
Q33 · L5
Lateral / CROSS APPLY for row-correlated subquery
LATERAL JOIN evaluates subquery per row. Useful for top-N-per-group when window functions are awkward. CROSS APPLY in MSSQL.
Q34 · L5
Date dimension generation
Generate from CTE: SELECT DATE '2020-01-01' + n FROM generate_series(0, 3650). For Snowflake: GENERATOR + ROW_NUMBER. Used to fill date gaps in reports.
Q35 · L5
Approximate count distinct (HLL)
APPROX_COUNT_DISTINCT (Snowflake), HLL_COUNT (BigQuery). Constant memory regardless of cardinality. ~2% error. Use when exact count is not required.
Q36 · L5
JSON parsing in SQL
Postgres ->, ->>, jsonb. BigQuery JSON_EXTRACT. Snowflake :, get_path. State dialect first. Mention shredding strategy for analytics on JSON-heavy tables.
Q37 · L5
Array operations: UNNEST and ARRAY_AGG
UNNEST explodes array to rows. ARRAY_AGG aggregates rows to array. Common in BigQuery and Postgres for one-to-many flattening.
Q38 · L6
Materialized view vs result cache vs incremental table
MV: precomputed, scheduled refresh. Cache: query-string match, free. Incremental: append-only with merge. Trade-off: freshness vs cost vs flexibility.
Q39 · L6
Time travel and zero-copy clones (Snowflake)
Time travel: query historical state via AT (TIMESTAMP). Zero-copy clone: instant snapshot, COW storage. Used for non-prod testing without storage cost.
Q40 · L6
Iceberg vs Delta vs Hudi
All ACID over object storage. Iceberg: open spec, multi-engine. Delta: Databricks-native, best Spark integration. Hudi: best CDC support. Most companies converge on Iceberg in 2026.