System Design Practice Problems

Reading worked breakdowns of system designs teaches you what good looks like. Producing a good design under live pressure with someone watching is a different skill. The canvas below is the practice surface for the second skill: drop components, wire them, pick the tools, defend the SLA. The grader scores against rubric dimensions; multiple correct designs exist for each prompt.

How the rubric scores a design

6 dimensions, each with the move that earns the dimension and the move that loses it. Designs that hit 4-5 dimensions pass; designs that miss SLA match fail regardless of other strengths.

SLA match (25%)

Pass: Picked a model (batch / micro-batch / streaming) that matches the freshness number stated in the prompt.

Fail: Reached for streaming because 'real-time sounds impressive' when 15-min batch satisfies the SLA.

Cost band (20%)

Pass: Estimated a low-hundreds / low-thousands monthly cost band; stated when the design would need to change to hit a tighter band.

Fail: Over-provisioned with no acknowledgment of cost; or never mentioned cost at all.

Failure modes named (20%)

Pass: Surfaced late data, schema drift, replay, and backpressure relevant to the prompt BEFORE being asked.

Fail: Only addressed failure modes when the interviewer prompted; or hand-waved 'we'll handle retries'.

Delivery semantics (15%)

Pass: Picked at-least-once + idempotent (default), exactly-once (when forced), or at-most-once (rarely), and stated why.

Fail: Said 'exactly-once' without explaining the cost or how it's actually implemented.

Tool choice fit (10%)

Pass: Tools chosen because they match the constraint, not because the candidate used them last year.

Fail: Cargo-culting from a previous job: Kafka in a 1-source CSV problem; Spark for a 10MB dataset.

Adapt on the fly (10%)

Pass: When the requirement changed mid-round, articulated which part of the design moved and why.

Fail: Froze when the requirement changed; or restarted from scratch instead of adapting.
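
The pass rule can be sketched in a few lines of Python. This is a rough illustration, not the grader's actual implementation: `WEIGHTS` and `verdict` are hypothetical names, the percentages are read as weights, and treating a 6-of-6 design as a pass is an assumption.

```python
# Hypothetical sketch of the pass rule; not the grader's real code.
WEIGHTS = {
    "sla_match": 25,
    "cost_band": 20,
    "failure_modes_named": 20,
    "delivery_semantics": 15,
    "tool_choice_fit": 10,
    "adapt_on_the_fly": 10,
}

def verdict(hits: set) -> dict:
    """hits: names of the dimensions the design earned."""
    unknown = hits - set(WEIGHTS)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    score = sum(WEIGHTS[d] for d in hits)
    # Missing SLA match fails the design regardless of other strengths.
    # Assumption: 4 or more dimensions (including all 6) counts as a pass.
    passed = "sla_match" in hits and len(hits) >= 4
    return {"score": score, "dimensions_hit": len(hits), "passed": passed}

strong_but_wrong_sla = {
    "cost_band", "failure_modes_named", "delivery_semantics",
    "tool_choice_fit", "adapt_on_the_fly",
}
print(verdict(strong_but_wrong_sla))
# {'score': 75, 'dimensions_hit': 5, 'passed': False}
```

A design can collect 75 of 100 points and still fail, which is the point of the SLA-match rule.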

8 problem families with a worked diagram

Each is a real prompt from interview write-ups. Each diagram shows one passing design; many other shapes also pass.

01. Clickstream into a warehouse

Medium

Meta, Spotify, Pinterest

SLA: 15 min freshness

Cost band: Low hundreds/mo at 40M events/day

Rubric check: Did you recognize 15 min is micro-batch, not streaming? Cheaper-equivalent design wins.

[App SDK] -> [S3 hourly] -> [dbt 15-min incr] -> [Snowflake]
                                              ↓
                                       [BI dashboard]
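
A rough sketch of the first-minute arithmetic for this prompt. The volume and the 15-minute target come from the prompt; `pick_model` and its cut-offs (under 5 minutes means streaming, up to 1 hour means micro-batch) are illustrative assumptions, not rubric values. Freshness alone doesn't settle the choice: stateful problems such as sessionization can still justify streaming at looser targets.

```python
def pick_model(freshness_seconds: float) -> str:
    """Cheapest processing model that still meets the freshness target."""
    if freshness_seconds < 5 * 60:      # assumed cut-off
        return "streaming"
    if freshness_seconds <= 60 * 60:    # assumed cut-off
        return "micro-batch"
    return "batch"

EVENTS_PER_DAY = 40_000_000             # from the prompt
RUNS_PER_DAY = 24 * 4                   # one run every 15 minutes

events_per_second = EVENTS_PER_DAY / 86_400
events_per_run = EVENTS_PER_DAY / RUNS_PER_DAY

print(pick_model(15 * 60))              # micro-batch
print(round(events_per_second))         # 463
print(round(events_per_run))            # 416667
```

Roughly 463 events per second on average, or about 417K events per 15-minute run, is the number to say out loud before picking tools.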

02. CDC from a production database

Medium-Hard

Stripe, Square, Plaid

SLA: 1-2 min replication lag

Cost band: Mid hundreds/mo

Rubric check: Debezium vs logical decoding vs read replica. Schema evolution plan. Backpressure when downstream pauses.

[Postgres WAL] -> [Debezium] -> [Kafka topic]
                                       ↓
                               [Schema registry]
                                       ↓
                          [Sink connector] -> [Snowflake]
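
At-least-once delivery means the sink will see replays, so the apply step has to be idempotent. A minimal in-memory sketch of that idea follows. `ChangeEvent` and `IdempotentSink` are hypothetical names, and the event shape is a simplified stand-in rather than Debezium's actual envelope.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ChangeEvent:
    key: int                    # primary key of the source row
    lsn: int                    # log position, increasing in commit order
    op: str                     # "upsert" or "delete"
    row: Optional[dict] = None

class IdempotentSink:
    def __init__(self) -> None:
        self.rows = {}          # current state per key
        self.applied_lsn = {}   # last log position applied per key

    def apply(self, ev: ChangeEvent) -> bool:
        """Apply an event; return False if it is a duplicate or stale."""
        if ev.lsn <= self.applied_lsn.get(ev.key, -1):
            return False        # already applied, so a replay is a no-op
        if ev.op == "delete":
            self.rows.pop(ev.key, None)
        else:
            self.rows[ev.key] = dict(ev.row or {})
        self.applied_lsn[ev.key] = ev.lsn
        return True

sink = IdempotentSink()
batch = [
    ChangeEvent(1, 10, "upsert", {"amount": 5}),
    ChangeEvent(1, 11, "upsert", {"amount": 7}),
    ChangeEvent(2, 12, "upsert", {"amount": 1}),
    ChangeEvent(2, 13, "delete"),
]
for ev in batch + batch:        # the second pass simulates a replay
    sink.apply(ev)
print(sink.rows)                # {1: {'amount': 7}}
```

In a warehouse the same effect usually comes from a keyed merge rather than an in-memory dict; the sketch only shows why a replay leaves the state unchanged.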

03. Near-real-time fraud detection

Hard

Stripe, Block, Coinbase

SLA: <200 ms decision

Cost band: Low thousands/mo at 5K TPS

Rubric check: Genuine streaming. Feature freshness vs model latency tradeoff. Fallback when model service is down.

[Txn API] -> [Kafka] -> [Flink job]
                              ↓        ↓
                       [Feature store] [Rules engine]
                              ↓        ↓
                         [Model svc] -> decision
                              ↓
                         [Async audit log]
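
The fallback path is the part candidates tend to leave vague. One way to sketch it: call the model inside a time budget and fall back to the rules engine on a timeout or an error. Everything here is a stand-in: `decide`, `slow_model`, `rules`, the 150 ms budget (chosen to leave headroom under the 200 ms SLA), and the 0.9 score threshold are assumptions for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=4)

def decide(txn, model_score, rules_decision, budget_s=0.15):
    """Score with the model inside a time budget; otherwise fall back to rules."""
    future = _pool.submit(model_score, txn)
    try:
        score = future.result(timeout=budget_s)
    except Exception:
        # Covers both a timeout and an error raised by the model call.
        future.cancel()
        return {"decision": rules_decision(txn), "source": "rules_fallback"}
    decision = "block" if score >= 0.9 else "allow"   # assumed threshold
    return {"decision": decision, "source": "model"}

# Stand-in callables for illustration only.
def slow_model(txn):
    time.sleep(0.5)             # simulates a degraded model service
    return 0.1

def rules(txn):
    return "review" if txn.get("amount", 0) > 1000 else "allow"

print(decide({"amount": 5000}, slow_model, rules, budget_s=0.05))
# {'decision': 'review', 'source': 'rules_fallback'}
```

Recording which path produced the decision (`source`) is what makes the async audit log useful after an outage.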

04. Sessionization at scale

Hard

Meta, Airbnb, Pinterest

SLA: 1-hr session freshness

Cost band: Mid hundreds/mo

Rubric check: Stateful streaming with watermarking. Late-event reconciliation. Cost of materializing session state.

[Events] -> [Kafka] -> [Flink stateful]
                                ↓
                       [Session aggregates]
                                ↓
                     [Late-data correction job]
                                ↓
                          [Warehouse]
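
Why the late-data correction job exists is easier to see with the core sessionization logic written out. This is a batch-style sketch of gap-based sessions, not Flink code; `sessionize` is a hypothetical helper and the 30-minute inactivity gap is an assumption.

```python
from collections import defaultdict

GAP_S = 30 * 60   # assumed inactivity gap that closes a session

def sessionize(events):
    """events: iterable of (user_id, ts_seconds).
    Returns {user_id: [(start_ts, end_ts, event_count), ...]}."""
    by_user = defaultdict(list)
    for user, ts in events:
        by_user[user].append(ts)

    sessions = {}
    for user, stamps in by_user.items():
        stamps.sort()           # absorbs out-of-order arrival within the input
        out = []
        start = prev = stamps[0]
        count = 1
        for ts in stamps[1:]:
            if ts - prev > GAP_S:
                out.append((start, prev, count))   # close the current session
                start, count = ts, 0
            prev = ts
            count += 1
        out.append((start, prev, count))
        sessions[user] = out
    return sessions

on_time = [("u1", 0), ("u1", 4000), ("u1", 600)]
print(sessionize(on_time))
# {'u1': [(0, 600, 2), (4000, 4000, 1)]}

late = [("u1", 2300)]           # arrives after the sessions were emitted
print(sessionize(on_time + late))
# {'u1': [(0, 4000, 4)]}
```

One late event merges two already-emitted sessions into one, so the correction job has to rewrite aggregates, not just append to them.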

05. Daily revenue close

Medium

Stripe, Snowflake, Salesforce

SLA: 9am next morning

Cost band: Low hundreds/mo

Rubric check: Idempotent. Reconciliation across 3 source systems. The boring problem candidates fail on.

[Source A] ─┐
[Source B] ─┼─> [Staging] -> [Reconcile job]
[Source C] ─┘                       ↓
                              [Authoritative]
                                    ↓
                              [Audit + report]
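
A small sketch of the two properties the rubric check names: reconciliation across sources and an idempotent publish. The source names, the sample date, and the amounts are made up for illustration; `reconcile` and `publish` are hypothetical helpers.

```python
from decimal import Decimal

def reconcile(day, totals, tolerance=Decimal("0.00")):
    """Compare per-source revenue totals for one day."""
    spread = max(totals.values()) - min(totals.values())
    status = "ok" if spread <= tolerance else "mismatch"
    return {"day": day, "totals": dict(totals), "spread": spread, "status": status}

def publish(store, result):
    """Idempotent write: keyed by day, so a rerun overwrites instead of appending."""
    store[result["day"]] = result

store = {}
result = reconcile(
    "2024-01-31",               # illustrative date
    {
        "billing": Decimal("100.00"),
        "ledger": Decimal("100.00"),
        "processor": Decimal("99.50"),
    },
)
publish(store, result)
publish(store, result)          # rerun after a retry: still one row for the day
print(result["status"], result["spread"], len(store))
# mismatch 0.50 1
```

`Decimal` is used rather than floats so that money comparisons are exact.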

06. Embedded analytics in a SaaS product

Hard

Snowflake, Databricks, Looker

SLA: Sub-second query latency

Cost band: Scales with tenant skew

Rubric check: Multi-tenant isolation. Cache layer. Top 1% tenants dominate compute without isolation.

[App API] -> [Cache] -> [ClickHouse/Druid]
              ↑               ↑
        [Pre-agg per tenant]   |
                               |
              [Hot tenant isolation pool]
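
Tenant skew can be made concrete with a routing rule: measure each tenant's share of compute and send the heavy ones to the isolation pool. A minimal sketch, assuming a 10% share threshold; the tenant names, usage numbers, and both helper functions are hypothetical.

```python
def hot_tenants(query_seconds, share_threshold=0.10):
    """Tenants whose share of total query time meets the threshold."""
    total = sum(query_seconds.values())
    if total == 0:
        return set()
    return {t for t, s in query_seconds.items() if s / total >= share_threshold}

def pool_for(tenant, hot):
    """Route hot tenants to the isolated pool, everyone else to the shared one."""
    return "isolated" if tenant in hot else "shared"

usage = {"acme": 900.0, "bolt": 50.0, "cove": 50.0}   # made-up query seconds
hot = hot_tenants(usage)
print(hot)                      # {'acme'}
print(pool_for("bolt", hot))    # shared
```

The threshold is the knob to defend in the round: too low and the isolation pool becomes the shared pool, too high and one tenant still degrades everyone else.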

07. Multi-region failover

Hard

Netflix, Stripe, Cloudflare

SLA: RPO 15 min, RTO 1 hr

Cost band: Cost-of-disaster vs cost-of-readiness tradeoff

Rubric check: Active-active vs active-passive. Sync vs async replication. Backfill plan after failover (most candidates skip this).

Region A: [Primary] <-async-> [Replica]
                  ↓                          ↓
              [Reporting]                [Reporting]
Region B: [Replica] <-async-> [Standby + log]
[Failover trigger] -> promote replica -> backfill divergence
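
A back-of-envelope check of a failover plan against the stated RPO and RTO. The targets come from the prompt; the plan's fields and sample numbers are hypothetical. The sketch assumes service counts as restored once traffic is cut over, with the divergence backfill running afterwards.

```python
from dataclasses import dataclass

RPO_S = 15 * 60     # from the prompt: at most 15 min of data loss
RTO_S = 60 * 60     # from the prompt: service restored within 1 hr

@dataclass
class FailoverPlan:
    worst_replication_lag_s: float   # async lag bounds the data you can lose
    detect_s: float                  # time to notice the primary is gone
    promote_s: float                 # time to promote the replica
    cutover_s: float                 # time to redirect traffic

    def recovery_s(self) -> float:
        return self.detect_s + self.promote_s + self.cutover_s

    def check(self) -> dict:
        return {
            "meets_rpo": self.worst_replication_lag_s <= RPO_S,
            "meets_rto": self.recovery_s() <= RTO_S,
        }

plan = FailoverPlan(worst_replication_lag_s=300, detect_s=120,
                    promote_s=600, cutover_s=300)
print(plan.recovery_s(), plan.check())
# 1020 {'meets_rpo': True, 'meets_rto': True}
```

With async replication, the worst-case lag is the RPO conversation; whatever fell inside that window is what the backfill has to recover.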

08. Legacy ETL to dbt migration

Medium-Hard

Stripe, Snowflake, BigQuery customers

SLA: 3-quarter cutover, zero downstream breakage

Cost band: Migration cost vs ongoing savings

Rubric check: Dependency graph extraction. Parallel-run + diff test. Rollback path. Surface divergence early.

[Legacy ETL] -> [Output A]
       ↓ extract DAG
[Equivalence test] <-diff- [dbt run] -> [Output B]
                              ↓
                    cutover when diff < threshold
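
The parallel-run diff test reduces to comparing two outputs by key and gating cutover on the mismatch rate. A minimal sketch, assuming both outputs fit in memory as dicts keyed by primary key; `diff_ratio`, `ready_to_cut_over`, and the 0.1% threshold are hypothetical.

```python
_MISSING = object()   # distinguishes "row absent" from "row is None"

def diff_ratio(legacy, candidate):
    """Fraction of keys whose rows differ or exist on only one side."""
    keys = legacy.keys() | candidate.keys()
    if not keys:
        return 0.0
    mismatched = sum(
        1 for k in keys
        if legacy.get(k, _MISSING) != candidate.get(k, _MISSING)
    )
    return mismatched / len(keys)

def ready_to_cut_over(legacy, candidate, threshold=0.001):
    return diff_ratio(legacy, candidate) < threshold

legacy_out = {1: ("a", 10), 2: ("b", 20), 3: ("c", 30), 4: ("d", 40)}
dbt_out = {1: ("a", 10), 2: ("b", 20), 3: ("c", 30), 4: ("d", 41)}

print(diff_ratio(legacy_out, dbt_out))          # 0.25
print(ready_to_cut_over(legacy_out, dbt_out))   # False
```

Running this on every parallel run, not just before cutover, is what surfaces divergence early.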

Most common failure patterns across mock sessions

Drawn from rubric verdicts. The first row (defaulting to streaming) is the single biggest cause of failed mid-level design rounds.

Pattern	Rubric dimension failed	% of failed rounds
Defaults to streaming when batch satisfies SLA	SLA match	30%
Cargo-cults Kafka into single-source pipelines	Tool choice fit	22%
Never addresses cost	Cost band	18%
Skips backfill plan in failover designs	Failure modes named	15%
Freezes when requirement changes mid-round	Adapt on the fly	9%
Other (semantics, partitioning errors)	Various	6%

System design practice FAQ

Do I draw the diagram or read a worked example?

You draw. The page renders an interactive canvas with named components (ingestion, transform, storage, serving) and tool choices for each. The grader scores against the rubric in the section above. Reading worked breakdowns is useful before the first canvas attempt; it isn't a substitute for drawing.

How does this differ from HelloInterview or ByteByteGo?

HelloInterview and ByteByteGo are excellent text + diagram resources. Their canvas is a paid upsell or a separate product. The difference here is that the canvas is the primary surface and the grader gives a per-dimension verdict immediately. Use both: read HelloInterview to absorb the patterns, then practice on the canvas.

Are these prompts data-engineering-specific?

Yes. The 8 families are pipeline-shaped from real DE design rounds. They're not 'design Twitter' or 'design Instagram'; they're 'design a CDC pipeline', 'design a clickstream-to-dashboard pipeline'. General SWE system design uses a different rubric and prompt set.

Is there one correct design per prompt?

No. The rubric is 'design matches constraints', not 'design matches reference'. For a 15-min freshness prompt, both a dbt-incremental design and a micro-batch Spark design can pass if the cost band, failure modes, and tool fit are addressed. The grader is explicit about which designs pass and why.

Should I bring up cost on my own?

Yes, briefly, and after the design is sketched. Bringing up cost too early reads as cost-anxiety; never bringing it up reads as inexperience. The senior move is 1 sentence after the sketch: 'I'd estimate this at low hundreds a month at the stated volume; if cost is constrained, the next move is X.' The rubric rewards this move specifically.

How many design problems should I solve?

8-12 problems is enough for a mid-level DE design round. Senior or staff loops call for 20-30, calibrated against the senior rubric. Volume matters less than recognizing prompt shapes within the first minute; the 8 families above cover the recognizable shapes.

Why practice

01. Active recall beats re-reading by 50%

Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom.

02. 76% of hiring managers reject on the coding task, not the resume

From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice.

03. System design is graded on the calls you defend out loud

Ingestion, batch vs streaming, the bronze/silver/gold layers, idempotency, backfill and replay. Sketching the pipeline and naming the failure modes is the signal, not the boxes.

Adjacent practice

Pipeline Architecture Practice: DE-specific framing of the same canvas.

Mock System Design Interview: AI interviewer + mid-round requirement changes + verdict.

DE System Design Guide: Round breakdown, the rubric in detail, the decision framework.