SQL Reference
Organized by interview frequency, not alphabetical order. Every syntax example includes what interviewers actually test and the mistakes that cost candidates points. Based on thousands of questions tracked on the DataDriven platform.
The most-tested SQL category. Nearly one in four SQL questions involves grouping and aggregation.
SELECT department, COUNT(*) AS headcount, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;Interview note: Every non-aggregated column in SELECT must appear in GROUP BY. This is the most common syntax error under interview pressure.
SELECT department, COUNT(*) AS headcount
FROM employees
GROUP BY department
HAVING COUNT(*) > 10;Interview note: WHERE filters rows before grouping. HAVING filters groups after aggregation. Interviewers test this distinction constantly.
SELECT
DATE_TRUNC('month', order_date) AS month,
COUNT(*) AS total_orders,
SUM(CASE WHEN status = 'returned' THEN 1 ELSE 0 END) AS returns,
ROUND(100.0 * SUM(CASE WHEN status = 'returned' THEN 1 ELSE 0 END)
/ COUNT(*), 1) AS return_pct
FROM orders
GROUP BY DATE_TRUNC('month', order_date);Interview note: Replaces multiple subqueries with a single pass. Tests CASE WHEN, aggregation, and arithmetic in one query.
SELECT
COALESCE(region, 'ALL REGIONS') AS region,
COALESCE(product, 'ALL PRODUCTS') AS product,
SUM(revenue) AS total_revenue
FROM sales
GROUP BY ROLLUP(region, product);Interview note: ROLLUP adds subtotal and grand total rows. CUBE adds all combinations. Knowing these exist shows depth.
Second most common. Interviewers test INNER, LEFT, FULL OUTER, CROSS, and self-joins. The tricky part is predicting row count and handling NULLs.
SELECT o.order_id, c.name
FROM orders o
INNER JOIN customers c ON o.customer_id = c.id;Interview note: Returns only matching rows. If a customer has no orders, they are excluded.
SELECT c.name, COALESCE(COUNT(o.order_id), 0) AS order_count
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.name;Interview note: Returns all left-table rows. Right-table columns are NULL for non-matches. Most common join type in analytics.
SELECT c.id, c.name
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
WHERE o.order_id IS NULL;Interview note: Finds left-table rows with no right-table match. 'Customers who never ordered.' Tested in nearly every SQL interview.
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;Interview note: Table joined to itself. Common for hierarchies and comparing rows within the same table.
SELECT d.date, p.product_id
FROM date_spine d
CROSS JOIN (SELECT DISTINCT product_id FROM products) p;Interview note: Produces every combination. Used to generate a complete grid before LEFT JOINing actual data. Essential for gap-filling.
SELECT
COALESCE(a.id, b.id) AS id,
a.value AS source_a, b.value AS source_b
FROM table_a a
FULL OUTER JOIN table_b b ON a.id = b.id;Interview note: Returns all rows from both sides. Used for reconciliation: what is in A but not B, and vice versa.
The topic that separates mid-level from senior candidates. Window functions compute values across related rows without collapsing them.
ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn
-- ROW_NUMBER: always unique (1,2,3,4)
-- RANK: ties share, next skips (1,2,2,4)
-- DENSE_RANK: ties share, no skip (1,2,2,3)Interview note: ROW_NUMBER for dedup (WHERE rn = 1). RANK/DENSE_RANK for leaderboards. Know the tie-handling difference.
SELECT month, revenue,
LAG(revenue, 1) OVER (ORDER BY month) AS prev_month,
revenue - LAG(revenue, 1) OVER (ORDER BY month) AS mom_change
FROM monthly_revenue;Interview note: LAG looks backward, LEAD looks forward. Second arg is offset (default 1). Third arg is default for missing rows (default NULL).
SUM(amount) OVER (ORDER BY date ROWS UNBOUNDED PRECEDING) AS running_total,
AVG(amount) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS avg_7dInterview note: Frame clause (ROWS BETWEEN) controls which rows the window includes. This syntax trips people up most.
NTILE(4) OVER (ORDER BY total_spend DESC) AS quartileInterview note: Divides rows into N roughly equal buckets. NTILE(4) gives quartiles, NTILE(100) gives percentiles.
FIRST_VALUE(event_type) OVER (
PARTITION BY user_id ORDER BY event_time
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS first_actionInterview note: LAST_VALUE requires explicit frame to UNBOUNDED FOLLOWING. Without it, the default frame ends at CURRENT ROW. Classic trap.
Under 5% as standalone topic, but interviewers expect CTEs whenever a query gets complex. They test readability and decomposition.
WITH active_users AS (
SELECT user_id, COUNT(*) AS sessions
FROM sessions
WHERE created_at >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY user_id
)
SELECT u.name, a.sessions
FROM users u JOIN active_users a ON u.id = a.user_id;Interview note: Named temporary result. Reference it like a table. Multiple CTEs separated by commas, not multiple WITH keywords.
WITH RECURSIVE date_spine AS (
SELECT DATE '2024-01-01' AS dt
UNION ALL
SELECT dt + INTERVAL '1 day' FROM date_spine WHERE dt < '2024-12-31'
)
SELECT ds.dt, COALESCE(r.revenue, 0) AS revenue
FROM date_spine ds LEFT JOIN daily_revenue r ON ds.dt = r.day;Interview note: Anchor + UNION ALL + recursive member. Used for date spines, org charts, graph traversals. Always include termination condition.
Rarely the main topic, but woven into most complex queries. Conditional logic connects raw data to business rules.
CASE
WHEN salary >= 150000 THEN 'Executive'
WHEN salary >= 100000 THEN 'Senior'
WHEN salary >= 60000 THEN 'Mid'
ELSE 'Junior'
END AS levelInterview note: Conditions evaluated top to bottom; first TRUE wins. Put most restrictive condition first.
SELECT user_id,
MAX(CASE WHEN key = 'email' THEN value END) AS email,
MAX(CASE WHEN key = 'phone' THEN value END) AS phone
FROM user_attributes GROUP BY user_id;Interview note: Manual pivot. Works in every engine. MAX as outer aggregate because each group has at most one value per key.
Almost every interview involves a time dimension. Syntax varies heavily across engines.
-- PostgreSQL / Snowflake
DATE_TRUNC('month', order_date)
-- BigQuery
DATE_TRUNC(order_date, MONTH)Interview note: Truncates to specified precision. Argument order differs between engines. Know your target.
-- PostgreSQL
CURRENT_DATE - INTERVAL '30 days'
-- SQL Server / Snowflake
DATEADD(day, -30, GETDATE())
-- DATEDIFF
DATEDIFF(day, start_date, end_date)Interview note: Syntax is completely engine-specific. INTERVAL is more readable. DATEADD/DATEDIFF are more portable across cloud warehouses.
EXTRACT(YEAR FROM order_date)
EXTRACT(DOW FROM order_date) -- day of week
EXTRACT(EPOCH FROM ts_col) -- unix timestampInterview note: DOW numbering varies: PostgreSQL Sunday = 0, others differ. Always test or clarify in interviews.
Parsing, trimming, and pattern matching for raw data ingestion.
-- ANSI: first_name || ' ' || last_name
-- MySQL/SQL Server: CONCAT(first_name, ' ', last_name)Interview note: || propagates NULL in PostgreSQL. CONCAT in MySQL treats NULL as empty string. Know the difference.
TRIM(BOTH ' ' FROM raw_input)
LOWER(email)
REPLACE(phone, '-', '')Interview note: Always TRIM and LOWER before joining on string keys. Untrimmed whitespace causes silent join failures.
-- PostgreSQL: SPLIT_PART('a.b.c', '.', 2) returns 'b'
-- REGEXP_REPLACE(phone, '[^0-9]', '', 'g') -- digits onlyInterview note: Engine-specific syntax. These appear in data cleaning questions where source data is messy.
MERGE powers incremental loading patterns. Critical for idempotent data pipelines.
MERGE INTO dim_customer AS target
USING staging AS source ON target.id = source.id
WHEN MATCHED THEN
UPDATE SET name = source.name, updated_at = CURRENT_TIMESTAMP
WHEN NOT MATCHED THEN
INSERT (id, name, created_at)
VALUES (source.id, source.name, CURRENT_TIMESTAMP);Interview note: Atomic upsert. Match on key; update if exists, insert if new. PostgreSQL uses INSERT ... ON CONFLICT instead (MERGE added in v15).
INSERT INTO dim_customer (id, name, updated_at)
VALUES (1, 'Alice', NOW())
ON CONFLICT (id) DO UPDATE SET
name = EXCLUDED.name,
updated_at = EXCLUDED.updated_at;Interview note: PostgreSQL's upsert before MERGE was available. EXCLUDED refers to the row that would have been inserted.
Scalar subqueries, correlated subqueries, and EXISTS/NOT EXISTS. Know when each is the right tool.
-- Customers with no orders (NULL-safe, unlike NOT IN)
SELECT c.name FROM customers c
WHERE NOT EXISTS (
SELECT 1 FROM orders o WHERE o.customer_id = c.id
);Interview note: EXISTS is NULL-safe (unlike NOT IN). Prefer NOT EXISTS over NOT IN when the subquery column might contain NULLs.
SELECT name, salary,
salary - (SELECT AVG(salary) FROM employees) AS diff_from_avg
FROM employees;Interview note: Returns one value. If it returns multiple rows, the query errors. Use for comparing each row against a global metric.
UNION, INTERSECT, EXCEPT for combining result sets.
-- UNION: removes duplicates (slower)
SELECT id FROM table_a UNION SELECT id FROM table_b;
-- UNION ALL: keeps duplicates (faster)
SELECT id FROM table_a UNION ALL SELECT id FROM table_b;Interview note: Always use UNION ALL unless you need dedup. UNION sorts and deduplicates, which is expensive. Interviewers test this.
SELECT id FROM table_a EXCEPT SELECT id FROM table_b;Interview note: Rows in A but not B. Called MINUS in Oracle. Useful for reconciliation and migration validation.
SQL does not execute top to bottom. Understanding execution order explains why you cannot use SELECT aliases in WHERE, why HAVING can reference aggregates, and why ORDER BY can use aliases. Interviewers test this directly.
A cheat sheet helps you review. Practice helps you perform. Write real queries against real tables with instant feedback and result comparison.