System design rounds test your ability to architect a data platform on a whiteboard. You will draw components, discuss trade-offs (consistency vs availability, cost vs latency), estimate capacity, and defend your choices. No code. All reasoning.
This guide covers whiteboard architecture and trade-off analysis. For operational questions like debugging production failures, backfill strategies, and error handling, see our pipeline interview questions guide. For SQL interview prep, see our dedicated SQL guide. And if you need to review data modeling concepts, start there before tackling full system design.
Most candidates fail system design by jumping straight to drawing boxes and arrows. The strongest answers follow a consistent structure that demonstrates both technical depth and communication skill.
Step 1: Clarify requirements (3-5 min). Ask about data volume, freshness requirements, query patterns, and who consumes the output. "Is this a daily batch report or a real-time dashboard?" changes everything about your design.
Step 2: Draw the high-level flow (5 min). Sources on the left, storage on the right, processing in the middle. Label each component. Do not pick specific tools yet. Just name the layers: ingestion, transformation, storage, serving.
Step 3: Walk through data flow (10 min). Follow one record from source to output. Explain what happens at each stage: validation, transformation, deduplication, loading. This is where you demonstrate depth.
Step 4: Discuss failure modes (10 min). What happens when the source is down? When a transformation fails halfway? When data arrives late? Your recovery strategy matters more than your happy-path design.
Step 5: Trade-offs and alternatives (10 min). "I chose batch here because the freshness requirement is 1 hour. If that changed to 1 minute, I would swap this component for a streaming consumer." Show that your design is deliberate, not default.
Understanding batch vs streaming trade-offs and ETL vs ELT patterns is essential background for every system design conversation.
Step   Time    What to Do
-----  ------  --------------------------------------------------
1      3-5 m   Clarify: volume, latency, consumers, SLAs
2      5 m     Draw: sources > ingest > process > store > serve
3      10 m    Walk: trace one record end-to-end
4      10 m    Failures: late data, partial fails, recovery
5      10 m    Trade-offs: why this, not that, and when to switch
Print this or memorize it. Every system design answer you give should hit all five steps. The interviewer is evaluating your process as much as your architecture. Skipping Step 1 (clarification) is the single most common reason candidates receive a "no hire" signal.
The first fork in any system design answer. Interviewers test whether you can reason about this trade-off from first principles, not just recite definitions.
Data systems force you to choose. Can your dashboard show slightly stale data? Can your pipeline tolerate duplicate records temporarily? Your answer to these questions shapes the entire architecture.
Where your data lives determines how fast you can query it, how much it costs, and how flexible your schema can be. This is where many candidates go shallow. Go deep.
System design interviews expect back-of-envelope math. How many events per second? How much storage per day? What compute do you need? These numbers drive your architecture choices.
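A worked example of the arithmetic, with assumed input numbers; the point is the method, not the exact figures:

```python
# Back-of-envelope capacity estimate (hypothetical numbers for illustration).
events_per_day = 500_000_000          # assumed event volume
avg_event_bytes = 1_000               # ~1 KB per event after serialization

events_per_sec_avg = events_per_day / 86_400      # ~5,800 events/sec average
events_per_sec_peak = events_per_sec_avg * 5      # assume a 5x peak-to-average factor

raw_gb_per_day = events_per_day * avg_event_bytes / 1e9   # ~500 GB/day raw
compressed_gb_per_day = raw_gb_per_day / 5                 # assume ~5x Parquet compression

print(f"avg {events_per_sec_avg:,.0f}/s, peak {events_per_sec_peak:,.0f}/s, "
      f"~{compressed_gb_per_day:,.0f} GB/day compressed")
```

Doing this out loud tells you whether you are designing for a single warehouse node or a distributed system, which is exactly the signal the interviewer is listening for.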
Naming Kafka or Airflow is not enough. Interviewers want to hear why you chose that component over the alternatives, and under what conditions you would choose differently.
Every architecture decision has a cost dimension. Senior candidates discuss cost trade-offs as naturally as they discuss latency trade-offs.
Practice these end to end. Set a 35-minute timer, talk through your design out loud, and draw the architecture on paper or a whiteboard. Each scenario below includes a full walkthrough that covers requirements, architecture, data flow, failure modes, and trade-offs.
These scenarios are designed to mirror the types of prompts you will encounter in real interviews at companies like Meta, Uber, Stripe, and Airbnb. For supporting fundamentals, review our data engineering interview prep guide and pipeline interview questions.
+------------+ +------------+ +----------------+
| Rider App |--->| | | |
+------------+ | Kafka |--->| Flink (stream) |---> Redis (surge)
+------------+ | (events) | | trip events |---> Ops Dashboard
| Driver App |--->| | +----------------+
+------------+ +------+-----+
|
v
+------+------+ +---------------+
| S3 Raw Lake | | |
| (Parquet) |--->| Spark (batch) |
| partitioned| | daily ETL |
| by date | +-------+-------+
+-------------+ |
v
+---------+--------+
| Warehouse |
| (fact_trips, |
| dim_drivers, |
| dim_riders, |
| dim_locations) |
+---------+--------+
|
+---------+--------+
| BI / Finance |
| Reports |
+------------------+

A rider requests a trip. The rider app emits a trip_requested event to Kafka. The driver app emits trip_accepted, trip_started, and trip_completed events as the trip progresses. Each event contains a trip ID, timestamps, GPS coordinates, and user IDs.
The streaming path (Flink) consumes these events in real time to compute surge pricing signals (trip request density per geo-cell in the last 5 minutes) and feed the operations dashboard. The surge pricing output goes to Redis for sub-millisecond reads by the pricing service.
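A plain-Python sketch of the surge signal (the real job would be a Flink windowed aggregation); the grid-cell function and window length are illustrative:

```python
import time
from collections import defaultdict, deque

# Trip requests per geo-cell over a sliding 5-minute window.
WINDOW_SECONDS = 300
events_by_cell = defaultdict(deque)   # cell_id -> timestamps of recent requests

def geo_cell(lat, lon, precision=2):
    """Hypothetical coarse grid cell: round coordinates to roughly 1 km buckets."""
    return (round(lat, precision), round(lon, precision))

def record_request(lat, lon, ts=None):
    ts = ts or time.time()
    cell = geo_cell(lat, lon)
    q = events_by_cell[cell]
    q.append(ts)
    # Evict events that fell out of the 5-minute window.
    while q and q[0] < ts - WINDOW_SECONDS:
        q.popleft()
    return cell, len(q)   # (cell, current request density) -> written to Redis

cell, density = record_request(37.7749, -122.4194)
print(cell, density)
```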
In parallel, a Kafka Connect S3 sink writes the events to S3 as Parquet files partitioned by date and region. A daily Spark job reads the raw data, deduplicates by trip ID, joins with driver and rider dimension tables (using slowly changing dimension type 2 for profile changes), and loads the star schema into the warehouse. Finance queries the warehouse for daily revenue reconciliation.
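A minimal sketch of that daily job, assuming PySpark; the paths, table names, and columns (trip_id, event_ts, driver_sk, valid_from/valid_to) are hypothetical:

```python
from pyspark.sql import SparkSession, functions as F, Window

spark = SparkSession.builder.appName("daily_trip_etl").getOrCreate()
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

raw = (spark.read.parquet("s3://lake/raw/trips/dt=2024-06-01/")
            .withColumn("dt", F.lit("2024-06-01")))

# Deduplicate by trip ID, keeping the latest event per trip.
latest = Window.partitionBy("trip_id").orderBy(F.col("event_ts").desc())
trips = (raw.withColumn("rn", F.row_number().over(latest))
            .filter("rn = 1").drop("rn"))

# Join to the SCD2 driver dimension: pick the version valid at trip time.
dim_drivers = spark.read.table("warehouse.dim_drivers")   # has valid_from / valid_to
fact = (trips.join(
            dim_drivers,
            (trips.driver_id == dim_drivers.driver_id)
            & (trips.event_ts >= dim_drivers.valid_from)
            & (trips.event_ts < dim_drivers.valid_to),
            "left")
        .select(trips["*"], dim_drivers["driver_sk"]))

# Dynamic partition overwrite makes reruns of a single day idempotent.
fact.write.mode("overwrite").partitionBy("dt").parquet("s3://warehouse/fact_trips/")
```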
This design runs batch and streaming side by side (a Lambda-style split where streaming handles hot-path analytics and batch handles the analytical warehouse). The alternative is a pure streaming, Kappa-style path into the warehouse using a tool like Materialize or Snowflake Snowpipe. The trade-off: pure streaming simplifies the architecture to a single path but costs significantly more and makes backfills harder because you lose the batch reprocessing path.
Geo-partitioning the S3 raw data adds partition pruning benefits for regional queries but increases the number of small files. A compaction job (daily, after the main ETL) merges small Parquet files into larger ones to keep query performance high.
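A minimal compaction pass, assuming PySpark; the paths and the output file count are illustrative:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("compact_raw_trips").getOrCreate()

# Rewrite one day/region partition into a handful of large Parquet files.
src = "s3://lake/raw/trips/dt=2024-06-01/region=us-west/"
dst = "s3://lake/raw_compacted/trips/dt=2024-06-01/region=us-west/"

df = spark.read.parquet(src)
# 8 output files is an assumed target for this volume (~128-512 MB each);
# a follow-up step would swap the compacted location into place.
df.coalesce(8).write.mode("overwrite").parquet(dst)
```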
+----------------+
| OLTP Database | CDC (Debezium)
| (orders, |----------------------+
| users, | |
| products, | v
| reviews) | +-------+-------+
+----------------+ | Kafka Topics |
| (per table) |
+-------+-------+
|
+---------------+---------------+
v v
+-------+-------+ +---------+---------+
| S3 Raw Lake | | S3 Raw Lake |
| /orders/ | | /reviews/ |
| dt=YYYY-MM-DD | | dt=YYYY-MM-DD |
+-------+-------+ +---------+---------+
| |
+---------------+---------------+
|
v
+-------+--------+
| dbt (transform)|
| staging > |
| intermediate > |
| marts |
+-------+--------+
|
v
+-------+--------+
| Warehouse |
| fact_order_items|
| dim_buyers |
| dim_sellers |
| dim_products |
| dim_dates |
| bridge_reviews |
+----------------+

Debezium captures row-level changes from the OLTP database and publishes them to Kafka topics (one per source table). A Kafka Connect S3 sink writes these events to the raw lake as Parquet files, partitioned by date.
The fact table grain is one row per order line item. Each row contains the order ID, line item sequence, product ID, buyer ID, seller ID, quantity, unit price, currency code, and the converted USD amount (using a currency rate snapshot table). This grain supports both "total GMV by seller" and "average order value by category" queries without additional joins.
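A sketch of building that grain with the USD conversion, assuming PySpark; the table names, column names, and the rate_imputed flag are illustrative:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("build_fact_order_items").getOrCreate()

order_items = spark.read.table("staging.order_items")     # one row per line item
fx_rates = spark.read.table("reference.currency_rates")   # rate_date, currency_code, usd_rate

fact = (order_items
        .join(fx_rates,
              (order_items.currency_code == fx_rates.currency_code)
              & (F.to_date(order_items.order_ts) == fx_rates.rate_date),
              "left")
        .drop(fx_rates.currency_code)
        .withColumn("amount_usd",
                    F.col("quantity") * F.col("unit_price") * F.col("usd_rate"))
        # Flag rows where no snapshot rate was found; a production job would
        # fall back to the most recent prior rate instead of leaving nulls.
        .withColumn("rate_imputed", F.col("usd_rate").isNull()))

fact.write.mode("append").saveAsTable("warehouse.fact_order_items")
```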
Reviews use a bridge table because one order can have multiple reviews (one per item) and one product can have reviews from multiple orders. The bridge table connects fact_order_items to dim_reviews and enables queries like "average review score for products sold by seller X." Seller dimensions use SCD type 2 to track changes in seller tier, verification status, and store name over time.
When the snapshot table has no rate for a given day, flag the row with a rate_imputed boolean so downstream consumers can tell which converted amounts used a fallback rate.

Star schema vs fully normalized: a star schema denormalizes for read performance at the cost of storage and update complexity. For an analytics warehouse, this is the right choice because queries run 5-20x faster with fewer joins. The alternative (normalized 3NF) is better suited for OLTP workloads where write efficiency matters. State this trade-off explicitly. Also consider whether to use Kimball-style dimensional modeling or a Data Vault approach. For marketplaces with rapidly changing business rules, Data Vault provides more flexibility but adds complexity in the transformation layer.
+-----------+ +-----------+ +------------------+
| Payment |---->| Kafka |---->| Flink |
| Gateway | | (txn | | Feature Engine |
+-----------+ | events) | | (velocity, geo, |
+-----+-----+ | merchant risk) |
| +--------+---------+
| |
| v
| +--------+---------+
| | Feature Store |
| | (Redis, <5ms) |
| +--------+---------+
| |
| v
| +--------+---------+
| | Model Service |
| | (fraud score) |----> BLOCK / ALLOW
| | A/B router | (< 200ms)
| +--------+---------+
| |
v v
+-----+-----+ +--------+---------+
| S3 Raw | | Labeled Outcomes |
| (all txn) | | (chargebacks, |
+-----+-----+ | disputes) |
| +--------+---------+
v |
+-----+-----+ v
| Batch | +--------+---------+
| Retrain |<----| Training Dataset |
| Pipeline | | (features + |
+-----+-----+ | labels joined) |
| +------------------+
v
+-----+-----+
| Model |
| Registry |
+-----------+

A transaction event arrives from the payment gateway into Kafka. The Flink feature engine consumes it and computes real-time features: number of transactions from this card in the last 10 minutes (velocity), geo-distance from the previous transaction, and merchant category risk score (looked up from a static table in Redis).
These features are written to the feature store (Redis) and simultaneously passed to the model service. The model service runs the fraud scoring model (a gradient-boosted tree or a neural network, depending on the company) and returns a score between 0 and 1. Transactions scoring above 0.85 are blocked. Between 0.6 and 0.85, they are flagged for manual review. Below 0.6, they pass through. The total end-to-end latency budget is 200ms.
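A sketch of the velocity feature and the threshold decision, assuming the redis-py client; the key names and thresholds are illustrative:

```python
import time
import uuid
import redis  # assumes the redis-py client

r = redis.Redis(host="localhost", port=6379)

def card_velocity(card_id: str, window_seconds: int = 600) -> int:
    """Transactions seen for this card in the last 10 minutes."""
    key = f"velocity:{card_id}"
    now = time.time()
    r.zadd(key, {str(uuid.uuid4()): now})              # record this transaction
    r.zremrangebyscore(key, 0, now - window_seconds)   # drop events outside the window
    r.expire(key, window_seconds)                      # let idle keys expire on their own
    return r.zcard(key)

def decide(score: float) -> str:
    """Map the model score to the action described above."""
    if score > 0.85:
        return "BLOCK"
    if score >= 0.6:
        return "MANUAL_REVIEW"
    return "ALLOW"
```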
For A/B testing new models, the model service uses a routing layer that assigns each transaction to a model variant based on a hash of the card ID. Both models score the transaction, but only the assigned variant's decision is enforced. The other score is logged for offline comparison. This prevents peeking bias because the assignment is deterministic and the non-enforced model cannot influence the outcome.
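The routing itself is a few lines; a sketch with hypothetical variant names:

```python
import hashlib

# Deterministic A/B routing by card ID: the same card always hits the same
# variant, so repeated transactions cannot flip between models mid-experiment.
def assigned_variant(card_id: str,
                     variants=("control_model", "candidate_model")) -> str:
    digest = hashlib.sha256(card_id.encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# Both models score every transaction; only the assigned one is enforced,
# and the shadow score is logged for offline comparison.
variant = assigned_variant("card_1234")
print(variant)
```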
The fundamental trade-off is false positives vs false negatives. Blocking a legitimate $500 transaction costs the company revenue and customer trust. Missing a $500 fraudulent transaction costs a chargeback fee plus the transaction amount. The threshold between BLOCK and ALLOW must be tuned to the business's risk appetite, and different merchant categories may warrant different thresholds. This is a conversation the interviewer wants to hear you initiate, not wait to be asked about.
+------------+ +----------+ +----------+ +----------+
| Salesforce | | Zendesk | | Stripe | | 47 more |
+-----+------+ +----+-----+ +----+-----+ +----+-----+
| | | |
v v v v
+-----+--------------+------------+------------+-----+
| Ingestion Layer (Fivetran / Airbyte) |
| Per-source connectors, rate limit handling, |
| incremental sync where API supports cursors |
+----------------------------+-----------------------+
|
v
+----------+----------+
| S3 Raw Lake |
| /source=salesforce/ |
| /source=zendesk/ |
| /source=stripe/ |
| partitioned by |
| source + sync_date |
+----------+----------+
|
v
+----------+----------+
| dbt (staging) |
| Rename, cast, |
| deduplicate, |
| PII mask |
+----------+----------+
|
v
+----------+----------+
| dbt (marts) |
| Unified customer |
| entity, revenue |
| metrics, support |
| metrics |
+----------+----------+
|
+----------+----------+
| Warehouse |
| (Snowflake / |
| BigQuery) |
+----------+----------+
|
+-----+----+----+-----+
| | | |
v v v v
Looker Notebooks Alerts Reverse
       ETL

Each SaaS source has a connector (Fivetran, Airbyte, or a custom Python script for sources without off-the-shelf connectors). Connectors sync incrementally where the API supports cursor-based pagination (Salesforce, Stripe) and do full syncs where it does not (some legacy SFTP sources). Each connector writes raw JSON or CSV to S3, partitioned by source name and sync date.
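A sketch of what one of those custom connectors looks like; the endpoint, the cursor and record fields, and the bucket layout are hypothetical, and it assumes the requests and boto3 libraries:

```python
import json
import datetime

import boto3      # AWS SDK, used here for the S3 write
import requests   # REST client for the source API

def sync_source(api_url: str, last_cursor: str, bucket: str, source: str) -> str:
    """Incrementally pull records since last_cursor and land them in the raw lake."""
    s3 = boto3.client("s3")
    sync_date = datetime.date.today().isoformat()
    cursor, page = last_cursor, 0
    while True:
        resp = requests.get(api_url, params={"cursor": cursor, "limit": 1000}, timeout=30)
        resp.raise_for_status()
        payload = resp.json()
        if not payload["records"]:
            break
        key = f"raw/source={source}/sync_date={sync_date}/page_{page:05d}.json"
        s3.put_object(Bucket=bucket, Key=key, Body=json.dumps(payload["records"]))
        cursor, page = payload["next_cursor"], page + 1
    return cursor   # persist this so the next run resumes where this one stopped
```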
The dbt staging layer normalizes column names (e.g., AccountId from Salesforce becomes account_id), casts data types, deduplicates records (using the source system's primary key + a sync timestamp tiebreaker), and applies PII masking rules (hashing email addresses, redacting phone numbers).
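A staging-model sketch of those steps, assuming PySpark; the raw column names (Id, AccountName, Email, _synced_at) are hypothetical:

```python
from pyspark.sql import SparkSession, functions as F, Window

spark = SparkSession.builder.appName("stg_salesforce_accounts").getOrCreate()

raw = spark.read.json("s3://lake/raw/source=salesforce/sync_date=2024-06-01/")

# Deduplicate on the source primary key, keeping the most recently synced row.
latest = Window.partitionBy("Id").orderBy(F.col("_synced_at").desc())
staged = (raw
          .withColumn("rn", F.row_number().over(latest))
          .filter("rn = 1")
          .select(
              F.col("Id").alias("account_id"),
              F.col("AccountName").alias("account_name"),
              F.sha2(F.lower(F.col("Email")), 256).alias("email_hash"),  # PII mask
              F.col("_synced_at").cast("timestamp").alias("synced_at")))

staged.write.mode("overwrite").saveAsTable("staging.salesforce_accounts")
```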
The dbt marts layer creates unified entities. The most important is the unified customer entity, which joins the Salesforce account ID, the Stripe customer ID, and the Zendesk requester email into a single customer_id using a mapping table. This entity resolution is the hardest engineering problem in this design and should receive the most discussion time. For more on transformation patterns, see our ETL vs ELT guide.
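A sketch of the resolution join, assuming PySpark and a hypothetical curated mapping table:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("unified_customer").getOrCreate()

# customer_id, salesforce_account_id, stripe_customer_id, zendesk_email
mapping = spark.read.table("reference.customer_id_map")
sf = spark.read.table("staging.salesforce_accounts")
stripe = spark.read.table("staging.stripe_customers")

unified = (mapping
           .join(sf, mapping.salesforce_account_id == sf.account_id, "left")
           .join(stripe, mapping.stripe_customer_id == stripe.customer_id, "left")
           .select(mapping.customer_id, sf.account_name, stripe.lifetime_value))

unified.write.mode("overwrite").saveAsTable("marts.dim_customers")
```

The hard part is building and maintaining the mapping table itself (exact-match joins on email, then fuzzy or manual resolution for the remainder), which is why this deserves the most discussion time.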
Build vs buy for connectors. Fivetran/Airbyte handle 80% of sources out of the box, but at scale the bill often lands around $1-3K per connector per year (50 connectors ≈ $50-150K/year). Custom connectors are cheaper but require engineering maintenance. The hybrid approach (managed connectors for complex APIs like Salesforce, custom scripts for simple SFTP/CSV sources) is usually optimal. State this trade-off and let the interviewer push back.
+------------+ +----------+ +-----------------+
| Client SDK | | | | |
| (assigns |--->| Kafka |--->| S3 Raw Lake |
| variant, | | (events) | | /experiment_id/ |
| emits | | | | /dt=YYYY-MM-DD/ |
| events) | +----+-----+ +--------+--------+
+------------+ | |
| v
| +---------+---------+
| | Spark (daily) |
| | - join events to |
| | assignments |
| | - aggregate per |
| | experiment + |
| | variant + metric|
| +---------+---------+
| |
| v
| +---------+---------+
| | Stats Engine |
| | - sample sizes |
| | - confidence |
| | intervals |
| | - p-values |
| | - sequential |
| | testing bounds |
| +---------+---------+
| |
v v
+-----+-----+ +---------+---------+
| Assignment | | Experiment |
| Log Table | | Results Table |
| (user_id, | | (experiment_id, |
| exp_id, | | variant, |
| variant, | | metric, |
| ts) | | sample_size, |
+------------+ | mean, ci_lower, |
| ci_upper, pval) |
+----------+--------+
|
v
+----------+--------+
| Experiment |
| Dashboard |
+-------------------+

The client SDK assigns users to experiment variants using a deterministic hash of user ID + experiment ID. This ensures consistent assignment across sessions without a server round-trip. The SDK emits events (page views, clicks, conversions) to Kafka, each tagged with the experiment ID and assigned variant.
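A sketch of that assignment function; the variant names and weights are illustrative:

```python
import hashlib

def assign_variant(user_id: str, experiment_id: str,
                   variants=(("control", 0.5), ("treatment", 0.5))) -> str:
    """Deterministically map (user, experiment) to a weighted variant bucket."""
    digest = hashlib.sha256(f"{user_id}:{experiment_id}".encode()).hexdigest()
    point = int(digest[:15], 16) / 16**15   # uniform value in [0, 1)
    cumulative = 0.0
    for name, weight in variants:
        cumulative += weight
        if point < cumulative:
            return name
    return variants[-1][0]

# The same user always lands in the same variant for a given experiment.
assert assign_variant("user_42", "exp_checkout") == assign_variant("user_42", "exp_checkout")
```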
The daily Spark job reads events from S3, joins them with the assignment log (to handle cases where events arrive without variant tags), and computes per-experiment, per-variant, per-metric aggregates: count, sum, mean, and variance. These aggregates feed into the statistics engine, which computes confidence intervals, p-values, and minimum detectable effect sizes.
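A sketch of the aggregation step, assuming PySpark and hypothetical table and column names; the 1.96 factor gives an approximate 95% confidence interval half-width:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("experiment_aggregates").getOrCreate()

# experiment_id, variant, metric_name, metric_value (one row per user-metric)
events = spark.read.table("staging.experiment_events")

aggregates = (events.groupBy("experiment_id", "variant", "metric_name")
              .agg(F.count("*").alias("sample_size"),
                   F.sum("metric_value").alias("metric_sum"),
                   F.avg("metric_value").alias("metric_mean"),
                   F.var_samp("metric_value").alias("metric_variance"))
              # Normal-approximation 95% CI half-width for the mean.
              .withColumn("ci_halfwidth",
                          F.lit(1.96) * F.sqrt(F.col("metric_variance") / F.col("sample_size"))))

aggregates.write.mode("overwrite").saveAsTable("warehouse.experiment_results")
```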
To prevent peeking bias (experiment owners checking results too early and calling winners prematurely), the platform uses sequential testing bounds. The dashboard displays results but disables the "declare winner" button until the pre-registered sample size is reached or the sequential test crosses its boundary. This is a critical design decision that interviewers look for.
Pre-computed vs on-demand results. Pre-computing experiment results daily (materialize all metrics for all experiments into a results table) means the dashboard loads instantly but results are up to 24 hours stale. On-demand computation (query raw events at dashboard load time) gives fresh results but takes 30-60 seconds per experiment with 500M events/day. The hybrid approach pre-computes for active experiments and archives completed experiments. State this trade-off and note that most production A/B testing platforms (Statsig, Optimizely, Eppo) use pre-computation.
These mistakes come up repeatedly across hundreds of system design interviews. Each one is a pattern that interviewers notice immediately and that costs candidates points. Study them before your next round.
Jumping to architecture without understanding the problem. The interviewer intentionally leaves the prompt vague. If you do not ask about data volume, latency requirements, and consumer needs, you are designing blind. Spend 3-5 minutes on this. It is never wasted time.
"We will use Kafka, Spark, and Snowflake" is not an architecture. It is a shopping list. Instead: "We need a durable message queue because the source emits 50K events/sec and the consumer processes at 10K/sec, so we need buffering with replay. Kafka fits because..." The requirement justifies the tool, not the other way around.
The happy path is 30% of the evaluation. The interviewer cares more about what happens when Kafka is down, when a Spark job fails at hour 3 of a 4-hour run, when data arrives 6 hours late. If you only discuss the happy path, you cap your rating at "meets expectations." Proactively raise failure modes before the interviewer asks.
Running batch and streaming in parallel for the same data doubles the complexity, doubles the cost, and creates a consistency problem between the two paths. Unless the business genuinely requires both sub-second and daily-aggregated views of the same data, pick one. Mention the Lambda trade-off and explain why you are choosing not to use it. That shows more maturity than defaulting to it. See our batch vs streaming guide for the full analysis.
Talking without drawing forces the interviewer to build the architecture in their head. Draw boxes. Draw arrows. Label them. A visual reference keeps the conversation grounded and prevents misunderstandings. Even in a virtual interview, use the shared whiteboard or screen-share a drawing tool.
Spending 15 minutes on Kafka partition strategy while never mentioning the storage or serving layer is a signal that you cannot scope a system. Cover the full architecture at medium depth first. Then ask the interviewer which area to drill into. They will tell you. This keeps you from over-investing in one area that the interviewer may not even be evaluating.
An architecture that processes 1M events/day with a $50K/month cloud bill when a $2K/month solution exists is not a good design. Senior-level candidates discuss cost alongside performance. Mention spot instances for batch workloads, reserved capacity for streaming, and storage tiering for cold data. The interviewer wants to see that you can build something the company can afford to operate.
In distributed systems, exactly-once delivery is extremely difficult to achieve end-to-end. If you claim it, the interviewer will drill into how you guarantee it across every component boundary. The practical pattern is at-least-once delivery with idempotent consumers. Deduplication keys (event ID + timestamp) at the consumer level give you effectively-once semantics without the complexity of distributed transactions.
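A minimal sketch of the idempotent-consumer pattern; the in-memory set stands in for whatever durable key store the real pipeline would use (a warehouse MERGE key, a Redis set, a database table):

```python
processed = set()   # durable in production; in-memory here for illustration

def apply_side_effects(event: dict) -> None:
    """Hypothetical downstream write that must happen once per dedup key."""
    print("loading", event["event_id"])

def handle(event: dict) -> None:
    # Dedup key: event ID plus event timestamp, as described above.
    key = (event["event_id"], event["event_ts"])
    if key in processed:
        return                    # duplicate redelivery: safe to skip
    apply_side_effects(event)
    processed.add(key)

# Redelivering the same event is a no-op the second time.
evt = {"event_id": "e-1", "event_ts": "2024-06-01T00:00:00Z", "amount": 42}
handle(evt)
handle(evt)
```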
Every design choice excludes alternatives. If you choose batch processing, state what you give up (freshness) and what you gain (simplicity, cost). If you choose streaming, state what you give up (cost, complexity) and what you gain (low latency). The interviewer is not looking for the "right" answer. They are looking for structured reasoning about trade-offs. Candidates who state trade-offs unprompted consistently score higher.
A pipeline that moves data without validating it is a liability. Mention schema validation at ingestion, null checks and range checks in the transformation layer, row count reconciliation between source and destination, and data freshness monitoring. These are operational concerns that separate production-ready designs from whiteboard exercises. For more on pipeline reliability patterns, see our pipeline interview questions guide.
System design interviews test fundamentals. Practice SQL, Python, and data engineering concepts so your design answers are grounded in real implementation experience. Strong data modeling skills make every system design answer more credible.
Lambda, Kappa, event-driven, and request-driven architectures
MERGE, partition overwrite, and tombstone patterns for safe reruns
Transformation timing decisions and when to use each
When to use each with cost and complexity trade-offs
Star schemas, normalization, slowly changing dimensions, and warehouse modeling patterns
Storage architecture choices that shape system design decisions
50+ SQL questions with worked solutions for DE interviews
Complete preparation guide covering all interview rounds
Continue your prep
50+ guides covering every round, company, role, and technology in the data engineer interview loop. Grounded in 2,817 verified interview reports across 929 companies, collected from real candidates.