System Design Practice Problems

Reading worked breakdowns of system designs teaches you what good looks like. Producing a good design under live pressure with someone watching is a different skill. The canvas below is the practice surface for the second skill: drop components, wire them, pick the tools, defend the SLA. The grader scores against rubric dimensions; multiple correct designs exist for each prompt.

Know the patterns before the interviewer asks them.

Each prompt is a system design query in the same shape an interview screen would give you.
The grader returns the diff against expected: where ties broke and what you missed.
Sandbox example:
source → bronze → silver → gold
ingest    : CDC + Kafka
transform : dbt + Airflow
serve     : Snowflake
8 problem families
6 rubric dimensions
Interactive: canvas, not screenshots
$0: free, no signup

How the rubric scores a design

6 dimensions, each with the move that earns the dimension and the move that loses it. Designs that hit 4-5 dimensions pass; designs that miss SLA match fail regardless of other strengths.

SLA match (25%)
Pass

Picked a model (batch / micro-batch / streaming) that matches the freshness number stated in the prompt

Fail

Reached for streaming because 'real-time sounds impressive' when 15-min batch satisfies the SLA

Cost band (20%)
Pass

Estimated low-hundreds / low-thousands monthly cost band; stated when the design would need to change to hit a tighter band

Fail

Over-provisioned with no acknowledgment of cost; or never mentioned cost at all

Failure modes named (20%)
Pass

Surfaced late data, schema drift, replay, backpressure relevant to the prompt BEFORE being asked

Fail

Only addressed failure modes when the interviewer prompted; or hand-waved 'we'll handle retries'

Delivery semantics (15%)
Pass

Picked one of at-least-once with idempotent writes (the default), exactly-once (when forced), or at-most-once (rarely), and stated why

Fail

Said 'exactly-once' without explaining the cost or how it's actually implemented

Tool choice fit (10%)
Pass

Tools chosen because they match the constraint, not because the candidate used them last year

Fail

Cargo-culting from a previous job: Kafka in a 1-source CSV problem; Spark for a 10MB dataset

Adapt on the fly (10%)
Pass

When the requirement changed mid-round, articulated which part of the design moved and why

Fail

Froze when the requirement changed; or restarted from scratch instead of adapting
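
The scoring rule above can be sketched in a few lines. This is a minimal illustration, not the grader's actual implementation: the weights are copied from the six dimensions, "4-5 dimensions" is read as "at least 4" (an assumption), and the names `WEIGHTS`, `MIN_DIMENSIONS_TO_PASS`, and `score_design` are hypothetical.

```python
# Weights copied from the rubric; everything else is an illustrative assumption.
WEIGHTS = {
    "sla_match": 25,
    "cost_band": 20,
    "failure_modes_named": 20,
    "delivery_semantics": 15,
    "tool_choice_fit": 10,
    "adapt_on_the_fly": 10,
}

# Assumption: "designs that hit 4-5 dimensions pass" is read as "at least 4".
MIN_DIMENSIONS_TO_PASS = 4


def score_design(hits: dict) -> dict:
    """Return the weighted score, the number of dimensions hit, and a verdict.

    `hits` maps a dimension name to True (earned) or False (lost).
    """
    hit_names = [name for name in WEIGHTS if hits.get(name, False)]
    score = sum(WEIGHTS[name] for name in hit_names)
    # SLA match is a hard gate: missing it fails the design regardless of score.
    sla_ok = bool(hits.get("sla_match", False))
    passed = sla_ok and len(hit_names) >= MIN_DIMENSIONS_TO_PASS
    return {"score": score, "dimensions_hit": len(hit_names), "passed": passed}
```

Under this reading, a design that hits every dimension except SLA match scores 75 and still fails, while one that hits SLA match, cost band, failure modes, and delivery semantics scores 80 and passes.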

8 problem families with a worked diagram

Each is a real prompt from interview write-ups. Each diagram shows one passing design; many other shapes also pass.

01 Clickstream into a warehouse
Difficulty: Medium
Companies: Meta, Spotify, Pinterest
SLA: 15 min freshness
Cost band: Low hundreds/mo at 40M events/day
Rubric check: Did you recognize 15 min is micro-batch, not streaming? Cheaper-equivalent design wins.
[App SDK] -> [S3 hourly] -> [dbt 15-min incr] -> [Snowflake]
                                              ↓
                                       [BI dashboard]
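
The SLA-match check for this family comes down to reading the freshness number before reaching for a tool. A minimal sketch, assuming hypothetical cut-offs (the rubric states no thresholds) and ignoring other constraints such as stateful processing that can legitimately push a design toward streaming:

```python
from datetime import timedelta

# Hypothetical cut-offs chosen only for illustration; tune them to the prompt.
STREAMING_BELOW = timedelta(minutes=1)
MICRO_BATCH_BELOW = timedelta(hours=1)


def pick_processing_model(freshness_sla: timedelta) -> str:
    """Map a stated freshness SLA to the cheapest model that can meet it."""
    if freshness_sla < STREAMING_BELOW:
        return "streaming"
    if freshness_sla < MICRO_BATCH_BELOW:
        return "micro-batch"
    return "batch"


def events_per_run(events_per_day: int, interval: timedelta) -> int:
    """Average number of events each incremental run has to process."""
    runs_per_day = timedelta(days=1) / interval  # e.g. 96 runs at 15 minutes
    return round(events_per_day / runs_per_day)
```

At the stated 40M events/day, `events_per_run(40_000_000, timedelta(minutes=15))` gives roughly 417K events per 15-minute increment, which is a comfortable incremental load rather than a streaming problem.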
02 CDC from a production database
Difficulty: Medium-Hard
Companies: Stripe, Square, Plaid
SLA: 1-2 min replication lag
Cost band: Mid hundreds/mo
Rubric check: Debezium vs logical decoding vs read replica. Schema evolution plan. Backpressure when downstream pauses.
[Postgres WAL] -> [Debezium] -> [Kafka topic]
                                       ↓
                               [Schema registry]
                                       ↓
                          [Sink connector] -> [Snowflake]
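
A CDC pipeline over Kafka typically redelivers events after a retry or consumer restart, so the sink side is where "at-least-once + idempotent" has to be made concrete. A minimal sketch, assuming a simplified, hypothetical event shape (`key`, `lsn`, `op`, `after`) loosely modeled on Debezium's create/update/delete op codes; a real sink would do the same comparison inside a warehouse MERGE:

```python
def apply_change(table: dict, last_lsn: dict, event: dict) -> bool:
    """Apply one change event idempotently.

    Returns False when the event is a duplicate or older than what was
    already applied for that key, so replays leave the table unchanged.
    """
    key, lsn = event["key"], event["lsn"]
    if lsn <= last_lsn.get(key, -1):
        return False  # redelivered or stale: already reflected in the table
    if event["op"] == "d":
        table.pop(key, None)  # delete
    else:
        table[key] = event["after"]  # "c" and "u" both become an upsert
    # Keep the position even after a delete so a stale update cannot
    # resurrect the row.
    last_lsn[key] = lsn
    return True
```

Replaying the same batch twice, or receiving it out of order, produces the same final table, which is the property to state out loud when the interviewer asks about delivery semantics.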
03 Near-real-time fraud detection
Difficulty: Hard
Companies: Stripe, Block, Coinbase
SLA: <200 ms decision
Cost band: Low thousands/mo at 5K TPS
Rubric check: Genuine streaming. Feature freshness vs model latency tradeoff. Fallback when model service is down.
[Txn API] -> [Kafka] -> [Flink job]
                              ↓        ↓
                       [Feature store] [Rules engine]
                              ↓        ↓
                         [Model svc] -> decision
                              ↓
                         [Async audit log]
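
The "fallback when model service is down" check can be sketched as a latency budget around the model call, with the rules engine as the fallback path. This is an illustration under stated assumptions: the 0.8 score threshold, the function names, and the stand-in model and rules callables are all hypothetical, and a production service would use its own RPC client timeouts rather than a thread pool.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

_POOL = ThreadPoolExecutor(max_workers=4)


def decide(txn, model_score, rules_decision, budget_s=0.2, threshold=0.8):
    """Return (decision, source), falling back to rules on timeout or error."""
    future = _POOL.submit(model_score, txn)
    try:
        score = future.result(timeout=budget_s)
        return ("decline" if score >= threshold else "approve", "model")
    except FutureTimeout:
        future.cancel()  # best effort; an already running call is not interrupted
    except Exception:
        pass  # model service error: handled the same way as a timeout
    return (rules_decision(txn), "rules")


# Hypothetical stand-ins for the model service and the rules engine.
def slow_model(txn):
    time.sleep(0.3)  # simulates a degraded model service
    return 0.1


def fast_model(txn):
    return 0.95


def simple_rules(txn):
    return "decline" if txn["amount"] > 1000 else "approve"
```

With a degraded model, `decide({"amount": 50}, slow_model, simple_rules, budget_s=0.05)` returns a rules-based decision inside the budget instead of blocking the transaction. Recording the `source` alongside the decision is what makes the async audit log useful afterwards.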
04 Sessionization at scale
Difficulty: Hard
Companies: Meta, Airbnb, Pinterest
SLA: 1-hr session freshness
Cost band: Mid hundreds/mo
Rubric check: Stateful streaming with watermarking. Late-event reconciliation. Cost of materializing session state.
[Events] -> [Kafka] -> [Flink stateful]
                                ↓
                       [Session aggregates]
                                ↓
                     [Late-data correction job]
                                ↓
                          [Warehouse]
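
The late-data correction step is easiest to reason about as a batch recompute of gap-based sessions over all events seen so far. The sketch below is that batch version in plain Python, not a Flink job; the 30-minute inactivity gap and the `(user_id, epoch_seconds)` event shape are assumptions for illustration.

```python
from collections import defaultdict

SESSION_GAP_S = 30 * 60  # assumption: 30 minutes of inactivity ends a session


def sessionize(events, gap_s=SESSION_GAP_S):
    """Group (user_id, epoch_seconds) events into sessions per user.

    Returns {user_id: [(start, end, event_count), ...]}.
    """
    by_user = defaultdict(list)
    for user, ts in events:
        by_user[user].append(ts)

    sessions = {}
    for user, stamps in by_user.items():
        stamps.sort()  # sorting absorbs late and out-of-order arrivals
        out = []
        start = prev = stamps[0]
        count = 1
        for ts in stamps[1:]:
            if ts - prev > gap_s:
                out.append((start, prev, count))  # close the current session
                start, count = ts, 0
            prev = ts
            count += 1
        out.append((start, prev, count))
        sessions[user] = out
    return sessions
```

Because the function re-sorts per user on every run, an event that arrives late simply lands in the right session on the next correction pass. The streaming job has to approximate the same result with watermarks and bounded state, which is where the cost question in the rubric check comes from.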
05 Daily revenue close
Difficulty: Medium
Companies: Stripe, Snowflake, Salesforce
SLA: 9am next morning
Cost band: Low hundreds/mo
Rubric check: Idempotent. Reconciliation across 3 source systems. The boring problem candidates fail on.
[Source A] ─┐
[Source B] ─┼─> [Staging] -> [Reconcile job]
[Source C] ─┘                       ↓
                              [Authoritative]
                                    ↓
                              [Audit + report]
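
The reconcile job in this diagram can be sketched as a pure function over the staged rows, which is also what makes a rerun idempotent: the same inputs always produce the same authoritative set and the same discrepancy list. The record shape (`order_id` mapped to an amount per source) and the zero tolerance are assumptions for illustration.

```python
from decimal import Decimal


def reconcile(sources: dict, tolerance: Decimal = Decimal("0.00")):
    """Compare amounts per order across all sources.

    `sources` maps a source name to {order_id: Decimal amount}.
    Returns (agreed, discrepancies): orders where every source reports the
    same amount within tolerance, and everything else for the audit report.
    """
    all_ids = set().union(*(rows.keys() for rows in sources.values()))
    agreed, discrepancies = {}, {}
    for order_id in sorted(all_ids):
        seen = {name: rows.get(order_id) for name, rows in sources.items()}
        amounts = [a for a in seen.values() if a is not None]
        complete = len(amounts) == len(sources)  # present in every source
        if complete and max(amounts) - min(amounts) <= tolerance:
            agreed[order_id] = amounts[0]
        else:
            discrepancies[order_id] = seen
    return agreed, discrepancies
```

Using `Decimal` instead of floats keeps cent-level comparisons exact, and an order missing from any one source is reported as a discrepancy rather than silently accepted.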
06 Embedded analytics in a SaaS product
Difficulty: Hard
Companies: Snowflake, Databricks, Looker
SLA: Sub-second query latency
Cost band: Scales with tenant skew
Rubric check: Multi-tenant isolation. Cache layer. Top 1% tenants dominate compute without isolation.
[App API] -> [Cache] -> [ClickHouse/Druid]
              ↑               ↑
        [Pre-agg per tenant]   |
                               |
              [Hot tenant isolation pool]
07 Multi-region failover
Difficulty: Hard
Companies: Netflix, Stripe, Cloudflare
SLA: RPO 15 min, RTO 1 hr
Cost band: Cost-of-disaster vs cost-of-readiness tradeoff
Rubric check: Active-active vs active-passive. Sync vs async replication. Backfill plan after failover (most candidates skip this).
Region A: [Primary] <-async-> [Replica]
                  ↓                          ↓
              [Reporting]                [Reporting]
Region B: [Replica] <-async-> [Standby + log]
[Failover trigger] -> promote replica -> backfill divergence
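
The backfill step most candidates skip can be stated as a small calculation: how far behind the promoted replica was at failure, whether that is inside the RPO, and which committed transactions have to be replayed once the old primary's log is recoverable. A sketch under assumptions: the log is a list of `(commit_time, txn_id)` pairs and all names are hypothetical.

```python
from datetime import datetime, timedelta

RPO = timedelta(minutes=15)  # from the stated SLA


def failover_report(primary_log, replica_applied_upto, failed_at, rpo=RPO):
    """Summarize replication lag and the divergence to backfill.

    primary_log: list of (commit_time, txn_id) recovered from the old primary.
    replica_applied_upto: commit time of the last transaction the replica applied.
    """
    lag = failed_at - replica_applied_upto
    to_backfill = [
        txn_id
        for commit_time, txn_id in primary_log
        if replica_applied_upto < commit_time <= failed_at
    ]
    return {"lag": lag, "rpo_met": lag <= rpo, "backfill": to_backfill}
```

With async replication the backfill list is rarely empty, so the design should say where those transactions are replayed and how conflicts with writes accepted after promotion are resolved.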
08 Legacy ETL to dbt migration
Difficulty: Medium-Hard
Companies: Stripe, Snowflake, BigQuery customers
SLA: 3-quarter cutover, zero downstream breakage
Cost band: Migration cost vs ongoing savings
Rubric check: Dependency graph extraction. Parallel-run + diff test. Rollback path. Surface divergence early.
[Legacy ETL] -> [Output A]
       ↓ extract DAG
[Equivalence test] <-diff- [dbt run] -> [Output B]
                              ↓
                    cutover when diff < threshold
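
The parallel-run diff test in this diagram can be sketched as a keyed comparison of the two outputs. The row shape (primary key mapped to a row value) and the 0.1% cutover threshold are assumptions for illustration; in practice the same comparison usually runs as a SQL query inside the warehouse.

```python
def diff_outputs(legacy: dict, candidate: dict) -> dict:
    """Compare legacy ETL output with dbt output, both keyed by primary key."""
    keys = legacy.keys() | candidate.keys()
    missing = sorted(k for k in keys if k not in candidate)  # dropped by dbt
    extra = sorted(k for k in keys if k not in legacy)  # only produced by dbt
    changed = sorted(
        k for k in legacy.keys() & candidate.keys() if legacy[k] != candidate[k]
    )
    mismatches = len(missing) + len(extra) + len(changed)
    ratio = mismatches / len(keys) if keys else 0.0
    return {
        "missing": missing,
        "extra": extra,
        "changed": changed,
        "mismatch_ratio": ratio,
    }


def ready_for_cutover(report: dict, threshold: float = 0.001) -> bool:
    """Hypothetical gate: cut over only when the mismatch ratio is under threshold."""
    return report["mismatch_ratio"] < threshold
```

Keeping `missing`, `extra`, and `changed` separate is what surfaces divergence early: a spike in `missing` usually points at a join or filter difference, while `changed` points at transformation logic.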

Most common failure patterns across mock sessions

Drawn from rubric verdicts. The first row (defaulting to streaming) is the single biggest cause of failed mid-level design rounds.

Pattern | Dimension hit | % of failed rounds
Defaults to streaming when batch satisfies SLA | SLA match | 30%
Cargo-cults Kafka into single-source pipelines | Tool choice fit | 22%
Never addresses cost | Cost band | 18%
Skips backfill plan in failover designs | Failure modes named | 15%
Freezes when requirement changes mid-round | Adapt on the fly | 9%
Other (semantics, partitioning errors) | Various | 6%

System design practice FAQ

Do I draw the diagram or read a worked example?
You draw. The page renders an interactive canvas with named components (ingestion, transform, storage, serving) and tool choices for each. The grader scores against the rubric in the section above. Reading worked breakdowns is useful before the first canvas attempt; it isn't a substitute for drawing.
How does this differ from HelloInterview or ByteByteGo?
HelloInterview and ByteByteGo are excellent text + diagram resources. Their canvas is a paid upsell or a separate product. The wedge here is that the canvas is the primary surface and the grader gives a per-dimension verdict immediately. Use both: read HelloInterview to absorb the patterns, then practice on the canvas.
Are these prompts data-engineering-specific?
Yes. The 8 families are pipeline-shaped from real DE design rounds. They're not 'design Twitter' or 'design Instagram'; they're 'design a CDC pipeline', 'design a clickstream-to-dashboard pipeline'. General SWE system design uses a different rubric and prompt set.
Is there 1 correct design per prompt?
No. The rubric is 'design matches constraints', not 'design matches reference'. For a 15-min freshness prompt, both a dbt-incremental design and a micro-batch Spark design can pass if the cost band, failure modes, and tool fit are addressed. The grader is explicit about which designs pass and why.
Should I bring up cost on my own?
Yes, briefly, and after the design is sketched. Bringing up cost too early reads as cost-anxiety; never bringing it up reads as inexperience. The senior move is 1 sentence after the sketch: 'I'd estimate this at low hundreds a month at the stated volume; if cost is constrained, the next move is X.' The rubric rewards this move specifically.
How many design problems should I solve?
8-12 problems is enough for a mid-level DE design round. For senior or staff loops, plan on 20-30 with senior-rubric calibration. Volume matters less than recognizing prompt shapes within the first minute; the 8 families above cover the recognizable shapes.
Why practice

  1. Active recall beats re-reading by 50%

     Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom.

  2. 76% of hiring managers reject on the coding task, not the resume

     From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice.

  3. Five problem shapes cover 80% of data engineer loops

     Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition.
