Interview Round Guide

The Whiteboard Design Round

Whiteboard design is the in-person variant of the system design round, with one critical difference: you must draw clearly while you talk and think. About 14% of onsite Data Engineer loops still include a physical whiteboard component (down from 30% pre-2020, but stable since), and the format has returned at FAANG companies for senior roles. Even virtual whiteboards (Excalidraw, Miro, Google Drawings) test the same skill. This page is one of eight round guides in the full data engineer interview playbook.

The Short Answer
Expect a 60-minute round at a physical or virtual whiteboard. You will draw a data architecture for a real-world problem (clickstream, recommendation features, financial reconciliation). Use a consistent visual language: rectangles for compute, cylinders for storage, queues for streams, dashed lines for control flow, solid lines for data flow. Write component names inside the boxes and data formats on the arrows. Strong candidates draw 5 to 7 boxes, spend about 20 minutes on the diagram, and use the other 40 to narrate and handle failure modes.
Updated April 2026 · By The DataDriven Team

The Visual Language Conventions

Interviewers do not grade the prettiness of your diagram, but they do grade legibility and consistency. Use the conventions below and your diagram will read like an architecture, not a sketch.

Element | Shape | Common Examples
Stateless compute | Rectangle | API service, Spark job, Lambda
Stateful compute | Rectangle with thicker border | Flink job, stream processor
Object storage | Cylinder | S3, GCS, Azure Blob
Database / warehouse | Cylinder with horizontal lines | Postgres, Snowflake, BigQuery, Redshift
Message queue / stream | Long thin rectangle | Kafka topic, Kinesis stream, SQS queue
Cache | Small cylinder | Redis, Memcached
External system | Cloud shape or dashed rectangle | Stripe API, third-party data feed
Data flow | Solid arrow with format label | JSON, Parquet, Avro, Protobuf
Control flow | Dashed arrow | Trigger, schedule, callback
Failure path | Red or labeled arrow | DLQ, retry, alert

The 7-Component Template

Every data architecture has 7 component slots. Not every problem needs all 7, but listing them helps you remember what to consider before drawing.

1. Source

Where does the data come from? Production OLTP, third-party API, application event firehose, IoT device, manual upload. State the volume and the format.
2. Ingest layer

How does data enter the platform? Kafka, Kinesis, Debezium CDC, REST POST endpoint, scheduled S3 sync. State the consistency guarantee (at-least-once vs exactly-once).
3. Raw / bronze storage

Immutable raw landing zone. Almost always object storage (S3) with date partitioning. Format: typically the source format (JSON, Avro) for fidelity.
4. Transformation layer

Spark, Flink, dbt, Snowpark, BigQuery scheduled queries, Glue. Where raw becomes clean. State the trigger (cron, event-driven, streaming).
5. Curated / silver / gold storage

Where the data lives for consumption. Snowflake, BigQuery, Redshift, Iceberg or Delta on S3. Modeled as star schema, conformed dimensions, or wide tables depending on workload.
6. Serving layer

How consumers read data. BI dashboard via Looker or Tableau, ML model serving, real-time API backed by Redis, semantic layer like dbt Semantic Layer or Cube.
7. Monitoring and orchestration

Airflow / Dagster / Prefect for scheduling. Datadog or Grafana for observability. Great Expectations or Monte Carlo for data quality. PagerDuty for alerts. Often drawn as a sidebar around the main flow.
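As a concrete illustration of slot 3's date partitioning, here is a minimal sketch of building a partitioned object key for the raw landing zone. The function name, prefix layout, and Hive-style `year=/month=/day=` convention are illustrative assumptions, not a standard any tool mandates:

```python
from datetime import datetime, timezone

def bronze_key(source: str, event_time: datetime, filename: str) -> str:
    # Hive-style year=/month=/day= partitioning on the event timestamp,
    # normalized to UTC so one event never straddles two partitions.
    d = event_time.astimezone(timezone.utc)
    return f"bronze/{source}/year={d:%Y}/month={d:%m}/day={d:%d}/{filename}"

key = bronze_key(
    "clickstream",
    datetime(2026, 4, 1, 12, 30, tzinfo=timezone.utc),
    "events-000123.avro",
)
# key == "bronze/clickstream/year=2026/month=04/day=01/events-000123.avro"
```

Partitioning on event time (not arrival time) is what lets the downstream Spark job prune to a 7-day window without scanning the whole bucket.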

How to Handle the Marker

The physical mechanics matter. Candidates who fumble with the marker, write upside down, or run out of space halfway through lose points even when their architecture is right. Three tactics that prevent this:

Plan the layout first. Spend 30 seconds touching the corners of the board with the marker cap, deciding where the source goes (top-left), where the consumer goes (bottom-right), and roughly where the 5 to 7 components in the middle will sit. This prevents the “ran out of space” failure mode.

Write component names first, draw boxes around them after. This forces you to use only as much space as the name needs, which prevents the "huge box, tiny label" mess that makes whiteboards hard to read.

Color code if multiple markers are available. Black for the main flow, blue for monitoring or control flow, red for failure paths. Even on a virtual whiteboard, a 3-color palette dramatically improves legibility.

Worked Whiteboard Walkthrough

How a real candidate drew an attribution pipeline at a major ad tech company in 2025. The sequence below is what got them the L5 offer.

Step 1: Clarify (45 seconds, no marker yet)

“So this is impression-to-conversion attribution, with a 7-day click window? Are we doing last-click, first-click, or multi-touch? And what's the freshness requirement: real-time, hourly, or daily?” The interviewer answers: last-click, 7-day window, daily for reporting + real-time for ML.
Step 2: Layout (30 seconds, light marker)

Touch the four corners. Source (top-left), warehouse (bottom-right), ML serving (right). Mark a horizontal line across the middle separating real-time from batch.
Step 3: Draw the real-time path (3 minutes)

Web events -> Kafka (impressions topic) -> Flink stateful job (keyed by user_id, 7-day state TTL) -> Redis (online attribution lookup, p99 < 10ms). Annotate Kafka partitioning, Flink state size estimate, Redis TTL.
Step 4: Draw the batch path (3 minutes)

Kafka -> S3 raw landing -> Spark daily job -> Snowflake fact_attribution. Annotate the join window (events from t-7 days to t-1 day, attributed to clicks in the same window). State the partition column: event_date.
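The [t-7, t-1] join window is worth writing out explicitly on the board. A sketch of the bound arithmetic (the helper name is hypothetical):

```python
from datetime import date, timedelta

def attribution_window(run_date: date) -> tuple[date, date]:
    # A run on day t attributes conversions to clicks from t-7 through t-1,
    # matching the 7-day last-click window of the real-time path.
    return run_date - timedelta(days=7), run_date - timedelta(days=1)

start, end = attribution_window(date(2026, 4, 8))
# start == date(2026, 4, 1), end == date(2026, 4, 7)
```

Stating the bounds as closed dates avoids the classic off-by-one dispute about whether day t itself is included.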
Step 5: Narrate the data flow (10 minutes)

Walk one impression through the system. State that the same user_id is the join key on both paths. State that the daily Spark job is the source of truth, the Flink job is approximate. State the delta-checking job that compares them daily and alerts if they diverge by more than 0.5%.
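The delta-checking job reduces to a relative-difference comparison against the batch source of truth. A minimal sketch with the 0.5% threshold from the walkthrough (alerting stubbed out, names invented):

```python
def check_divergence(batch_total: float, realtime_total: float,
                     threshold: float = 0.005) -> bool:
    # True means "page someone": the approximate real-time total has
    # drifted more than `threshold` from the batch source of truth.
    if batch_total == 0:
        return realtime_total != 0
    return abs(realtime_total - batch_total) / batch_total > threshold

assert check_divergence(1_000_000, 1_004_000) is False  # 0.4% off: fine
assert check_divergence(1_000_000, 1_006_000) is True   # 0.6% off: alert
```

Normalizing by the batch total (not the real-time one) keeps the definition of "diverged" anchored to the system you trust.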
Step 6: Failure modes (15 minutes)

Failure 1: Flink TaskManager crash. Recovery: checkpoint replay, no data loss because Kafka retention covers the gap. Failure 2: Late-arriving conversion (8 days after click). Recovery: dropped from real-time, captured in daily Spark which has wider join window. Failure 3: Hot user_id (whale account driving 1% of all events). Recovery: salt the partition key with mod-N suffix, recombine in aggregation step.
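The salting fix for Failure 3 is easy to show concretely. This sketch uses N=8 buckets and a `#` separator, both arbitrary choices for illustration:

```python
import random

N = 8  # number of salt buckets for a hot key

def salted_key(user_id: str) -> str:
    # Append a random mod-N suffix so one whale user's events spread
    # across N partitions instead of hammering a single one.
    return f"{user_id}#{random.randrange(N)}"

def original_key(salted: str) -> str:
    # The aggregation step strips the suffix to recombine partial results.
    return salted.rsplit("#", 1)[0]

assert original_key(salted_key("whale_user")) == "whale_user"
```

The trade-off to say out loud: salting fixes the skew but forces a second aggregation pass to merge the N partial results per user.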
Step 7: Operational details (10 minutes)

On-call runbook for Flink lag > 5 minutes. Daily reconciliation report for the batch vs real-time delta. Schema evolution: producer adds a field, Flink reads with a default, Spark catches up the next day. Cost: estimate Snowflake credits for the daily job and the streaming spend (Kinesis Data Streams or a managed Kafka equivalent).
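The schema-evolution behavior in Step 7 (producer adds a field, consumers read with a default) can be shown in plain terms. This is a dict-based sketch of reader-side defaults, not Avro's actual schema-resolution machinery; the field names are invented:

```python
def read_event(raw: dict, defaults: dict) -> dict:
    # Old events lack the newly added field; fill it from the reader's
    # defaults so the streaming consumer keeps running until the daily
    # batch job catches up with the new schema.
    return {field: raw.get(field, default) for field, default in defaults.items()}

defaults = {"user_id": None, "ad_id": None, "campaign_tier": "standard"}
old_event = {"user_id": "u1", "ad_id": "ad42"}  # produced before the new field
assert read_event(old_event, defaults)["campaign_tier"] == "standard"
```

Avro and Protobuf bake this resolution into the format; the point of sketching it is to show you know why adding a field with a default is a backward-compatible change.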

What Interviewers Watch For

1. Diagram clarity

Can the interviewer take a photo of the whiteboard and have it make sense without you in the room? If not, the diagram is failing its job.
2. Pacing

20 minutes drawing, 40 minutes narrating. Spending 50 minutes drawing means you ran out of failure-mode time, which is the L5 signal slot.
3. Annotated arrows

Every arrow should carry a data format and an approximate volume. "JSON, 200K/sec" on an arrow tells the interviewer the payload shape, the scale, and how the downstream components must be sized, all in one label.
4. Conventions held throughout

If you use cylinders for storage early, use them for storage everywhere. Inconsistent symbols make the interviewer mentally retranslate every component, which slows their grading and lowers their confidence.

How Whiteboard Design Connects to the Rest of the Loop

Whiteboard design is the in-person medium for data pipeline system design interview prep. The reasoning is identical; the medium is the constraint. The schema sketches you draw here borrow from the schema design interview walkthrough, and the cloud-service references go deeper if you've prepped AWS Data Engineer interview prep or Google Cloud Data Engineer interview prep.

Companies most likely to use a physical whiteboard: Netflix's onsite design rounds are still whiteboard-first, and most FAANG onsites at L5+ include one; Airbnb runs its virtual loops on Excalidraw. If you're prepping for L6 / staff Data Engineer interview prep, expect at least one whiteboard round.

Data Engineer Interview Prep FAQ

What if the round is virtual? Do whiteboard rules still apply?
Yes, with one change: virtual whiteboards (Excalidraw, Miro, Google Drawings) reward keyboard shortcuts. Practice the tool the company uses before the round. The visual conventions and the 7-component template are identical; only the marker is replaced.
Should I bring my own marker to a physical whiteboard interview?
Most companies provide markers. If you can, ask the recruiter ahead of time. Bringing a fine-tip marker is a positive signal because most office markers are dried out. Black, blue, red is the kit.
What if I run out of space on the whiteboard?
Erase the parts of the diagram you've already discussed (e.g., the source description) to make room. Or move to a second whiteboard if available. Running out of space is a planning failure; mid-round recovery is graded on calmness.
How detailed should component labels be?
Component name + technology in parentheses + one annotation. Example: 'Stream Processor (Flink, exactly-once, 24h state)'. This density is right. Just 'Flink' is too sparse. A full sentence inside the box is too dense.
Should I draw a database schema in this round?
Only if asked. Whiteboard design rounds focus on architecture; if you slip into a CREATE TABLE drawing, the interviewer will redirect. Save schema drawings for the dedicated modeling round.
What if the interviewer asks me to redraw a component?
Erase and redraw. Do not argue. Saying 'sure, let me try a different approach' shows responsiveness. Defending your first draw against feedback is a persistent downgrade signal.
Are virtual whiteboard rounds easier or harder than physical?
Easier mechanically (no marker fumbling, easier erasing) but harder communicatively. On a physical whiteboard, your body language tells the interviewer where your attention is. On a virtual one, you must verbally signal what you're focusing on. Use the cursor as your pointing finger.
Do I need to know the exact AWS / GCP / Azure service names?
Yes for the company's primary cloud, ideally for two clouds. Saying 'a managed Kafka equivalent' instead of 'MSK' or 'Confluent Cloud' is a junior signal. Cloud fluency at the service-name level is a senior signal.

Practice Design Rounds With a Live Sandbox

Run system design mock interviews against real prompts. Get feedback on your framework, your trade-offs, and your failure-mode reasoning.

Start the Design Mock Interview

More Data Engineer Interview Prep Guides

Continue your prep

Data Engineer Interview Prep, explore the full guide

50+ guides covering every round, company, role, and technology in the data engineer interview loop. Grounded in 2,817 verified interview reports across 929 companies, collected from real candidates.
