Data Engineer Coding Interview Practice

A DE coding interview isn't 'write a query.' It's: receive a vague prompt, ask the right clarifying questions, propose an approach, write code while answering follow-ups, defend the result. Practicing problems in solitude builds part of the skill. The other part is the interruption-and-defend loop. The mock interview simulator runs that loop end to end.

A DE coding interview isn't 'write a query.' It's: receive a vague prompt, ask the right clarifying questions, propose an approach, write code while answering follow-ups, defend the result. Practicing problems in solitude builds part of the skill. The other part is the interruption-and-defend loop. The mock interview simulator runs that loop end to end.

Prepare for the interview
01 / Open invite
02min.

Know the patterns before the interviewer asks them.

a SQL query, the same shape a screen would give you.
The diff against expected. Where ties broke. What you missed.
sandbox
1SELECT user_id,
2 COUNT(*) AS sessions
3FROM events
4WHERE ts >= NOW() - INTERVAL '7 day'
5
Execute your solution0.4s avg.
MicrosoftInterview question
Solve a problem
4
Mock round types
3
Seniority levels: mid, senior, staff
5
Dimensions in every verdict
Unlimited
Free mock sessions

What a mock session actually looks like

Abridged transcript from a mock SQL round at mid-level rubric. The interruption pattern is the part most candidates underestimate.

Mock SQL round, mid-level rubric, abridged transcript
AI Interviewer: Here's the table. orders (order_id, customer_id, total, ordered_at). Pull the most engaged customers in the last 30 days.
You: What does "engaged" mean here? Order count? Total spend? Frequency?
AI: Order count for now. Top 10.
You: Got it. ROW_NUMBER would be wrong if there's a tie at 10th. RANK? DENSE_RANK?
AI: If 2 customers tie at 10th, return them both. So including ties.
You: [typing]
SELECT customer_id, COUNT(*) AS n_orders
FROM orders
WHERE ordered_at >= NOW() - INTERVAL '30 days'
GROUP BY customer_id
...
AI: [mid-typing] Quick check: what if a customer has zero orders in the window?
You: They won't show. The prompt is "most engaged" so zero-order customers are out by definition. Confirm?
AI: Correct, continue.
Verdict:Pass. Clarification on "engaged" was the highest-leverage moment. Pushback on the ties question landed correctly. One nit: pre-empted the zero-orders edge case yourself, which scored above the rubric's mid-level expectation.

Mock session, minute by minute

A 45-minute SQL round with the points where the AI interjects, the user thinks, and the verdict crystallizes.

A 45-minute mock SQL round, minute by minute
0:00Prompt loads. You see schema, prompt, editor. Timer starts.
0:30AI asks: 'Any clarifying questions before you start?' The honest answer is yes.
1:00Clarification exchange. The AI confirms scope and a couple of edge cases.
3:00You propose an approach in 2 sentences before writing code.
5:00Typing starts. AI sends an interrupt: 'Why ROW_NUMBER over RANK here?'
6:00You answer in 1 sentence, keep typing.
12:00First submission. Grader runs 10 seeds. Returns 9/10 pass.
13:00AI: 'Look at the seed that failed. What's the bug?'
15:00You spot the missing tiebreaker, fix, resubmit, 10/10 pass.
17:00AI changes a requirement: 'Now make it return the top 3 per region, including ties at 3rd.'
25:00You finish the rewrite. The grader passes. AI asks why DENSE_RANK over RANK.
30:00AI asks a stretch: 'What if the table is 1B rows? Index strategy?'
38:00Discussion of indexes, partitioning, materialized views.
42:00AI: 'Anything you'd change about your first answer in retrospect?'
44:00You name 1 thing. AI accepts. Timer expires.
45:00Verdict: structured score across 5 dimensions, specific moments cited.

4 round types, with how the rubric is weighted

Each round runs the surface (SQL editor, Python sandbox, Spark sandbox, design canvas) and applies its own rubric weights. The weights are calibrated to interview write-ups.

SQL technical screen30-45 min · 95% of DE loops
SurfacePostgres editor
Prompt"Pull the most engaged users in the last 30 days. (Then: define engaged, then: define top, then: handle ties.)"
RubricCorrectness 35% · Communication 25% · Edge cases 20% · Speed 15% · Code style 5%
Python coding round30-45 min · 78% of DE loops
SurfacePython sandbox
Prompt"Given a stream of events with duplicates, return 1 record per natural key with the latest event_time. (Then: late-arriving events. Then: 10M record performance budget.)"
RubricCorrectness 30% · Error handling 25% · Pythonic patterns 20% · Communication 15% · Performance 10%
PySpark coding round45-60 min · 30% of loops at Spark shops
SurfaceSpark sandbox
Prompt"Join an 800M-row events table with a 2M-row users table. (Then: defend the strategy. Then: a Spark UI screenshot showing skew.)"
RubricDataFrame fluency 30% · Join strategy 25% · Skew diagnosis 20% · Spark UI reading 15% · Defense of choices 10%
Pipeline design round45-60 min · 52% of senior+ loops
SurfaceInteractive canvas
Prompt"40M clickstream events/day. Dashboard refreshed every 15 min. Design the pipeline. (Mid-round: requirement changes to 1-min latency.)"
RubricSLA match 25% · Cost reasoning 20% · Failure modes 20% · Tool fit 15% · Adapt-on-fly 20%

Verdict format, raw

// What a verdict object looks like, returned at session end.
// (UI renders prose; raw object is for reproducibility and feedback loops.)

{
  "round_type": "sql_technical",
  "level_target": "mid",
  "duration_seconds": 2664,
  "scores": {
    "correctness":    { "score": 4.5, "max": 5, "note": "passed both submissions; required 1 resubmit for tiebreaker" },
    "communication":  { "score": 4.0, "max": 5, "note": "good clarifying questions, occasional silence while typing" },
    "edge_cases":     { "score": 5.0, "max": 5, "note": "anticipated zero-order users before being asked" },
    "speed":          { "score": 3.5, "max": 5, "note": "took 12 min to first submission, target was 10" },
    "code_style":     { "score": 4.0, "max": 5, "note": "clear CTE structure, missing inline comment on tiebreaker" }
  },
  "decisive_moments": [
    "1:00: clarifying questions landed; scope locked in",
    "13:00: recognized the missing tiebreaker without hint",
    "17:00: pivoted cleanly when requirement changed mid-round"
  ],
  "drill_suggestions": [
    "Practice 3 more ranking problems with engineered ties.",
    "Time yourself on Easy SQL until first submission < 8 min.",
    "Read /sql/order-by-practice for stable-pagination patterns."
  ]
}

What the AI returns at session end. The UI renders prose; the raw object is what the catalog system uses to build the spaced-repetition queue.

Mock interview FAQ

How is this different from solo practice problems?+
2 things. First, the interaction. A real interview is a conversation; the AI mock asks follow-ups while you type, which trains the muscle of thinking out loud under pressure. Second, the verdict. The verdict object names the moments that decided the round, so the next session's prep is targeted.
Will this prep me for FAANG DE loops?+
The rubrics are calibrated to interview write-ups from Meta, Amazon, Google, Netflix, Stripe, Databricks, Snowflake. Mid-level rubric weights correctness and clean code (Amazon, Google lean here). Senior rubric weights tradeoff articulation and failure-mode naming (Meta, Stripe). Staff rubric weights adapt-on-the-fly behavior. Per-company tracks at /companies add calibration.
How often should I run a mock?+
Twice a week in the 4 weeks before your loop; once a week in the prior 4-8 weeks. Each mock is 30-60 minutes including verdict review. More than 3 a week burns out without compounding; fewer than 1 a week loses the interruption-handling muscle.
What can the AI mock NOT replicate?+
Social pressure from a real human. The voice, the eye contact, the small awkward silence when you say something wrong. The AI is close but not identical. Most candidates layer in 1-2 peer mocks (Pramp, interviewing.io, a friend in DE) in the last 2 weeks on top of the AI mocks for the social rep.
Is the mock interviewer free?+
Yes. No daily cap, no per-session charge. The platform funds itself through a separate B2B product on a different URL; the practice surface, including AI mocks, is free.
Can I rerun a mock with a different prompt?+
Yes. Each session generates a fresh prompt from the catalog. The same problem family can be re-run with different specifics so you don't memorize a single solution; the rubric stays the same.
What happens if I freeze during the mock?+
The AI notes when you go silent and how long for. The verdict explicitly flags this with the moment ('14:30 - 6 minutes of no input after the requirement change') and suggests phrases to keep momentum next time ('let me think out loud', 'I'd want to check X first'). Freezing in a mock is a free debugging cycle; freezing in a real interview costs the round.
02 / Why practice

Start a mock SQL round

  1. 01

    Active recall beats re-reading by 50%

    Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom

  2. 02

    76% of hiring managers reject on the coding task, not the resume

    From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice

  3. 03

    Five problem shapes cover 80% of data engineer loops

    Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition

Related practice