The Whiteboard Design Round
The Visual Language Conventions
Interviewers do not grade the prettiness of your diagram, but they do grade legibility and consistency. Use the conventions below and your diagram will read like an architecture, not a sketch.
| Element | Shape | Common Examples |
|---|---|---|
| Stateless compute | Rectangle | API service, Spark job, Lambda |
| Stateful compute | Rectangle with thicker border | Flink job, stream processor |
| Object storage | Cylinder | S3, GCS, Azure Blob |
| Database / warehouse | Cylinder with horizontal lines | Postgres, Snowflake, BigQuery, Redshift |
| Message queue / stream | Long thin rectangle | Kafka topic, Kinesis stream, SQS queue |
| Cache | Small cylinder | Redis, Memcached |
| External system | Cloud shape or dashed rectangle | Stripe API, third-party data feed |
| Data flow | Solid arrow with format label | JSON, Parquet, Avro, Protobuf |
| Control flow | Dashed arrow | Trigger, schedule, callback |
| Failure path | Red or labeled arrow | DLQ, retry, alert |
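For practice on a virtual whiteboard, the conventions in the table can be encoded once and reused. The sketch below is one possible way to do that: it emits Graphviz DOT text from a small node and edge list. The `NODE_STYLE` and `EDGE_STYLE` mappings, the `to_dot` helper, and the example pipeline are all hypothetical, and the thick border and long thin rectangle are only approximated with `penwidth` and `width`/`height`.

```python
# Hypothetical mapping from the table's element types to Graphviz attributes.
NODE_STYLE = {
    "stateless": {"shape": "box"},
    "stateful": {"shape": "box", "penwidth": "3"},           # thicker border
    "object_storage": {"shape": "cylinder"},
    "queue": {"shape": "box", "width": "3", "height": "0.3"},  # long, thin
    "external": {"shape": "box", "style": "dashed"},
}
EDGE_STYLE = {
    "data": {"style": "solid"},
    "control": {"style": "dashed"},
    "failure": {"color": "red"},
}


def attrs(d):
    """Render a dict as a DOT attribute list, e.g. shape="box", style="dashed"."""
    return ", ".join(f'{k}="{v}"' for k, v in d.items())


def to_dot(nodes, edges):
    """Build DOT source from (name, kind) nodes and (src, dst, kind, label) edges."""
    lines = ["digraph pipeline {", "  rankdir=LR;"]
    for name, kind in nodes:
        lines.append(f'  "{name}" [{attrs(NODE_STYLE[kind])}];')
    for src, dst, kind, label in edges:
        style = dict(EDGE_STYLE[kind], label=label)
        lines.append(f'  "{src}" -> "{dst}" [{attrs(style)}];')
    lines.append("}")
    return "\n".join(lines)


# Example pipeline (illustrative only).
nodes = [
    ("API service", "stateless"),
    ("Kafka topic", "queue"),
    ("Flink job", "stateful"),
    ("S3 raw", "object_storage"),
    ("DLQ", "queue"),
]
edges = [
    ("API service", "Kafka topic", "data", "JSON"),
    ("Kafka topic", "Flink job", "data", "JSON"),
    ("Flink job", "S3 raw", "data", "Parquet"),
    ("Flink job", "DLQ", "failure", "DLQ"),
]
dot = to_dot(nodes, edges)
print(dot)
```

Pasting the printed text into any Graphviz renderer gives a diagram that holds the same symbol for the same element type everywhere, which is the habit the table is meant to build.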
The 7-Component Template
Every data architecture has 7 component slots. Not every problem needs all 7, but listing them helps you remember what to consider before drawing.
- 01 Source: Where does the data come from? Production OLTP, third-party API, application event firehose, IoT device, manual upload. State the volume and the format.
- 02 Ingest layer: How does data enter the platform? Kafka, Kinesis, Debezium CDC, REST POST endpoint, scheduled S3 sync. State the delivery guarantee (at-least-once vs exactly-once).
- 03 Raw / bronze storage: Immutable raw landing zone. Almost always object storage (S3) with date partitioning. Format: typically the source format (JSON, Avro) for fidelity.
- 04 Transformation layer: Spark, Flink, dbt, Snowpark, BigQuery scheduled queries, Glue. Where raw becomes clean. State the trigger (cron, event-driven, streaming).
- 05 Curated / silver / gold storage: Where the data lives for consumption. Snowflake, BigQuery, Redshift, Iceberg or Delta on S3. Modeled as star schema, conformed dimensions, or wide tables depending on workload.
- 06 Serving layer: How consumers read data. BI dashboard via Looker or Tableau, ML model serving, real-time API backed by Redis, semantic layer like dbt Semantic Layer or Cube.
- 07 Monitoring and orchestration: Airflow / Dagster / Prefect for scheduling. Datadog or Grafana for observability. Great Expectations or Monte Carlo for data quality. PagerDuty for alerts. Often drawn as a sidebar around the main flow.
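The template works as a checklist, and a checklist is easy to rehearse in code. The sketch below is a minimal illustration, not a prescribed tool: the slot names, the `missing_slots` helper, and the example design are all hypothetical. It reports which of the seven slots a draft design has not addressed yet.

```python
# The seven slots from the template, as hypothetical short keys.
SLOTS = [
    "source",
    "ingest",
    "raw_storage",
    "transformation",
    "curated_storage",
    "serving",
    "monitoring_orchestration",
]


def missing_slots(design: dict[str, str]) -> list[str]:
    """Return the template slots that are absent or empty in a draft design."""
    return [slot for slot in SLOTS if not design.get(slot)]


# Illustrative draft: every value here is made up for the example.
design = {
    "source": "Postgres OLTP, row-level changes",
    "ingest": "Debezium CDC into Kafka, at-least-once",
    "raw_storage": "S3, date-partitioned Avro",
    "transformation": "Spark, hourly cron",
    "curated_storage": "Snowflake star schema",
}

print(missing_slots(design))  # ['serving', 'monitoring_orchestration']
```

An empty slot is not automatically a gap, since not every problem needs all seven, but each one left empty should be a deliberate choice you can explain.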
How to Handle the Marker
The physical mechanics matter. Candidates who fumble with the marker, write upside down, or run out of space halfway through lose points even when their architecture is right. Three tactics that prevent this:
Plan the layout first. Spend 30 seconds touching the corners of the board with the marker cap, deciding where the source goes (top-left), where the consumer goes (bottom-right), and roughly where the 5 to 7 components in the middle will sit. This prevents the “ran out of space” failure mode.
Write component names first, then draw boxes around them. This forces each box to use only as much space as its name needs and prevents the “huge box, tiny label” mess that makes a whiteboard hard to read.
Color code if multiple markers are available. Black for the main flow, blue for monitoring or control flow, red for failure paths. Even on a virtual whiteboard, a 3-color palette dramatically improves legibility.
Worked Whiteboard Walkthrough
The sequence below is how a real candidate drew an attribution pipeline at a major ad tech company in 2025. It is what got them the L5 offer.
Clarify (45 seconds, no marker yet)
Layout (30 seconds, light marker)
Draw the real-time path (3 minutes)
Draw the batch path (3 minutes)
Narrate the data flow (10 minutes)
Failure modes (15 minutes)
Operational details (10 minutes)
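Adding up the step durations shows how the time is distributed. The short script below is only arithmetic on the figures listed above; the `STEPS` list, `mmss`, and `schedule` are hypothetical names introduced for illustration.

```python
# Step durations from the walkthrough, in seconds.
STEPS = [
    ("Clarify", 45),
    ("Layout", 30),
    ("Draw the real-time path", 180),
    ("Draw the batch path", 180),
    ("Narrate the data flow", 600),
    ("Failure modes", 900),
    ("Operational details", 600),
]


def mmss(seconds):
    """Format a second count as minutes:seconds."""
    return f"{seconds // 60}:{seconds % 60:02d}"


def schedule(steps):
    """Return (start time, step name) pairs and the total elapsed seconds."""
    starts, elapsed = [], 0
    for name, duration in steps:
        starts.append((mmss(elapsed), name))
        elapsed += duration
    return starts, elapsed


starts, total = schedule(STEPS)
for start, name in starts:
    print(f"{start:>5}  {name}")
print(f"total: {mmss(total)}")  # total: 42:15
```

In this sequence the marker is moving on new components for only about six and a half minutes (layout plus the two paths), failure modes begin at 17:15, and the listed steps end at 42:15. If the round is assumed to run 60 minutes, that leaves roughly 18 minutes for interviewer questions and redraws.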
What Interviewers Watch For
- 01 Diagram clarity: Can the interviewer take a photo of the whiteboard and have it make sense without you in the room? If not, the diagram is failing its job.
- 02 Pacing: 20 minutes drawing, 40 minutes narrating. Spending 50 minutes drawing means you ran out of failure-mode time, which is the L5 signal slot.
- 03 Annotated arrows: Every arrow has a data format and an approximate volume. “JSON, 200K/sec” on an arrow tells the interviewer 10 things at once.
- 04 Conventions held throughout: If you use cylinders for storage early, use them for storage everywhere. Inconsistent symbols make the interviewer mentally retranslate every component, which slows their grading and lowers their confidence.
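An arrow label such as “JSON, 200K/sec” carries weight because it lets both sides do quick sizing math. The sketch below works one such estimate. The 200K events/sec figure comes from the label above; the 1,000-byte average event size is an assumption made purely for illustration, so swap in whatever the interviewer gives you.

```python
EVENTS_PER_SEC = 200_000   # from the arrow label "JSON, 200K/sec"
AVG_EVENT_BYTES = 1_000    # ASSUMPTION: average JSON event size, illustrative only
SECONDS_PER_DAY = 86_400

bytes_per_sec = EVENTS_PER_SEC * AVG_EVENT_BYTES
events_per_day = EVENTS_PER_SEC * SECONDS_PER_DAY

mb_per_sec = bytes_per_sec / 1e6                       # decimal megabytes
tb_per_day = bytes_per_sec * SECONDS_PER_DAY / 1e12    # decimal terabytes

print(f"{mb_per_sec:.0f} MB/s")            # 200 MB/s
print(f"{events_per_day:,} events/day")    # 17,280,000,000 events/day
print(f"{tb_per_day:.2f} TB/day raw")      # 17.28 TB/day raw
```

Under that assumption, one label already implies sustained ingest of about 200 MB/s and roughly 17 TB of raw data per day, which is the kind of number that shapes partition counts, storage format, and retention on the rest of the diagram.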
How Whiteboard Design Connects to the Rest of the Loop
Whiteboard design is the in-person format of the data pipeline system design round. The reasoning is identical; the medium is the constraint. The schema sketches you draw here borrow from the schema design interview walkthrough, and the cloud-service references go deeper if you've worked through the AWS Data Engineer or Google Cloud Data Engineer interview prep.
Companies most likely to use a whiteboard, physical or virtual: Netflix's onsite design rounds are still whiteboard-first, Airbnb uses Excalidraw for virtual loops, and most FAANG onsites at L5+ include one. If you're preparing for an L6 / staff Data Engineer loop, expect at least one whiteboard round.
Data engineer interview prep FAQ
What if the round is virtual? Do whiteboard rules still apply?
Should I bring my own marker to a physical whiteboard interview?
What if I run out of space on the whiteboard?
How detailed should component labels be?
Should I draw a database schema in this round?
What if the interviewer asks me to redraw a component?
Are virtual whiteboard rounds easier or harder than physical?
Do I need to know the exact AWS / GCP / Azure service names?