How to Fix Three Years of SQL Neglect in Four Weeks
If you spent 2022-2024 chasing Spark certifications and your window functions are rusty, here's the reactivation plan. Research suggests advanced SQL proficiency is achievable in 3-4 weeks with focused daily practice on real datasets.
Week 1: Foundations You Think You Know
GROUP BY appears in 32% of screening questions. INNER JOIN in 29%. These are the baseline. If you hesitate on WHERE vs. HAVING, or can't explain NULL behavior in LEFT JOINs, start here. Do 5 problems a day from the SQL interview question bank. Speak your reasoning out loud while writing. That habit pays off in live rounds.
Week 2: Window Functions Until They're Reflexive
ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD. Frame specifications: ROWS BETWEEN vs. RANGE BETWEEN. Write a deduplication query, a running total, and a gaps-and-islands solution every single day. By the end of the week, PARTITION BY and ORDER BY should flow without thinking.
Here's the gaps-and-islands pattern that shows up in 40%+ of hard SQL questions. Identify consecutive login streaks:
-- Gaps and islands: find consecutive login streaks
WITH islands AS (
SELECT
user_id,
login_date,
login_date - INTERVAL '1 day' * ROW_NUMBER() OVER(
PARTITION BY user_id
ORDER BY login_date
) AS island_key
FROM logins
)
SELECT
user_id,
MIN(login_date) AS streak_start,
MAX(login_date) AS streak_end,
COUNT(*) AS streak_length
FROM islands
GROUP BY user_id, island_key
ORDER BY streak_length DESC;
If that island_key trick doesn't make immediate sense, you need more reps. The insight: subtracting a sequential ROW_NUMBER from a date collapses consecutive days into the same key. Non-consecutive days produce different keys. It's elegant, it's unintuitive the first time, and interviewers love it.
Week 3: CTEs, Self-Joins, Query Optimization
Write every query using CTEs. Practice recursive CTEs for hierarchical data. Drill self-joins for comparing rows within the same table (finding users whose spend increased month-over-month). Learn to read execution plans, at least enough to spot full table scans and missing indexes.
Week 4: Mock Interviews Under Pressure
The research is clear: most candidates don't fail because of SQL syntax. They fail because they can't connect everything under pressure and communicate their reasoning. Practice with a timer. State assumptions about the data before writing. Narrate your approach. Use the mock interview simulator if you want realistic pressure without burning a real loop.
The salary stakes are real. The gap between "can pass SQL screening" and "cannot" maps to roughly $40K annually at mid-level roles. Senior big data engineers with Spark expertise command $155K-$200K, but they have to get past the SQL screen first. A Spark cert with rusty window functions gets you rejected at step one of a process with a 2-3% conversion rate.
I've been through three waves of "the hot new thing will replace SQL." Still here. Still the screener. Still the skill that separates candidates who think in sets from candidates who think in loops. The tools change every 18 months. PARTITION BY has been the same for 20 years.
Stop grinding Spark API trivia. Open a SQL editor. Write a running total. Write a deduplication. Write a gaps-and-islands. Do it until it's boring. Then do it under a timer until it's fast. That's the data engineer interview prep SQL plan that actually matches what companies are testing in 2026.
Play the game, win the prize.