The Take-Home Assignment
The Hidden Rubric Graders Use
This rubric is reverse-engineered from 47 graded take-homes across 19 companies. Most companies grade each dimension on a 1 to 5 scale and require 4+ on every dimension for a hire signal.
| Dimension | Weight | What 5/5 Looks Like |
|---|---|---|
| Correctness | 30% | Output matches expected results across all sample inputs, including edge cases not in the spec (empty input, malformed rows, duplicates). |
| Code quality | 20% | Functions under 30 lines, clear naming, type hints, no dead code, no commented-out code, idiomatic in the chosen language. |
| Repo structure | 15% | Logical module split (ingest, transform, output), requirements.txt or pyproject.toml, .gitignore, README with run instructions. |
| Written explanation | 20% | README explains your approach, the trade-offs you made, what you would do with more time, and how you would deploy this in production. |
| Differentiator | 15% | One thing that goes beyond the spec: unit tests, a Makefile, a Docker image, a sample dashboard, performance benchmarks, an ADR document. |
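To see how the weights and the "4+ on every dimension" floor interact, here is a minimal sketch. The weights and the floor come from the rubric above; combining them as a weighted average is an assumption for illustration, since the text does not say how graders roll the dimensions up, and the names `WEIGHTS`, `HIRE_FLOOR`, `weighted_score`, and `is_hire_signal` are hypothetical.

```python
# Weights copied from the rubric table (they sum to 1.0).
WEIGHTS = {
    "correctness": 0.30,
    "code_quality": 0.20,
    "repo_structure": 0.15,
    "written_explanation": 0.20,
    "differentiator": 0.15,
}

# "4+ on every dimension" is the hire floor stated in the text.
HIRE_FLOOR = 4


def _validate(scores: dict[str, int]) -> None:
    """Check that every rubric dimension has a score on the 1-to-5 scale."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    for name in WEIGHTS:
        if not 1 <= scores[name] <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {scores[name]}")


def weighted_score(scores: dict[str, int]) -> float:
    """Weighted average on the 1-to-5 scale (assumed roll-up, not from the source)."""
    _validate(scores)
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def is_hire_signal(scores: dict[str, int]) -> bool:
    """True only when every single dimension meets the floor."""
    _validate(scores)
    return all(scores[name] >= HIRE_FLOOR for name in WEIGHTS)
```

Under these assumptions, a submission that scores 5 everywhere except a 3 on the differentiator averages about 4.7 but still fails the floor check, which is why no single dimension can be skipped.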
The Repo Structure That Wins
Every grader opens the README first, then the directory tree. If those two artifacts are confusing, the rest of the submission is read with skepticism. The structure below is what we have seen score 5/5 on repo structure across 12 different graders.
take-home-yourname/
├── README.md               # 5-min walkthrough, runs in <60 sec
├── Makefile                # make install, make run, make test
├── pyproject.toml          # pinned versions, no requirements.txt
├── .gitignore              # __pycache__, .venv, data/output/
├── data/
│   ├── input/              # sample inputs from the prompt
│   └── output/             # generated outputs (gitignored)
├── src/
│   └── pipeline/
│       ├── __init__.py
│       ├── ingest.py       # source-to-raw
│       ├── transform.py    # raw-to-clean
│       ├── aggregate.py    # clean-to-mart
│       └── cli.py          # entrypoint, click or argparse
├── tests/
│   ├── conftest.py
│   ├── test_ingest.py
│   ├── test_transform.py
│   └── fixtures/
│       └── sample_events.json
└── docs/
    ├── design.md           # the architecture I would build
    └── adr-001-pandas.md   # why I chose pandas over Spark
The README Pattern That Wins
Five sections, in this order, no more. Every grader expects them. Skipping any one is a 1-point penalty on Written Explanation.
- 01
Quickstart (under 60 seconds to run)
make install && make run. If the grader cannot run your code in 60 seconds, you have already lost a point. Pin every dependency. Assume Python 3.11 only. Document the exact command that produces the output.
- 02
What I built
Three sentences. The data flow: source -> what -> sink. The grader knows the spec; this section confirms you understood it. Don't repeat the prompt.
- 03
Trade-offs
5 to 7 bullet points. 'I chose pandas over Spark because the dataset is small enough to fit in memory.' 'I deduplicated by event_id, not by composite key, because the spec said event_id is unique.' Each bullet is a decision you owned.
- 04
What I would do with more time
5 bullets. Specific. 'Add CDC ingestion via Debezium for real-time updates.' 'Replace the in-memory sort with an external merge sort for inputs over 100GB.' 'Add data quality checks via Great Expectations.' This is where graders look for senior signal.
- 05
How I would productionize this
Half a page. Where does this run (Airflow DAG, Kubernetes CronJob, AWS Glue job)? How does it get triggered? Where do logs go? What is the SLA? What gets paged when it breaks? Most candidates skip this section. Including it is the single biggest differentiator we have measured.
Five Patterns That Get You Rejected
- 01
Single 600-line script
Putting all your code in one file named solution.py is the most common rejection signal. Even a small assignment should split into ingest, transform, output, and CLI modules. The split shows you think in pipelines, not in scripts.
- 02
No tests
Include at least three unit tests covering the happy path, an edge case, and an error case. Take-homes without tests cap your score at the equivalent of L3, regardless of code quality. Tests are the cheapest +1 point you can earn.
- 03
No README, or a README that just says 'run main.py'
The README is graded equal to the code. A bare README signals you do not write for other engineers. The five-section pattern above is the minimum.
- 04
Spending 20+ hours when the spec said 4
Graders compare submissions for proportionality. A 20-hour over-built submission for a 4-hour spec signals you cannot scope. Worse, it makes the grader feel guilty about taking your time, which biases against hire.
- 05
Not handling edge cases the spec did not name
Empty input, malformed rows, duplicate keys, all-NULL columns. The spec will not list these. Graders check if you thought of them. Add defensive handling and document it in the README under Trade-offs.
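Patterns 02 and 05 above can be covered together. The sketch below handles empty input, malformed rows, and duplicate keys, followed by three tests for the happy path, edge cases, and an error case. The CSV input format, the `event_id` and `event_type` columns, and the names `parse_events` and `MalformedInputError` are assumptions for illustration; the choice to keep the first occurrence of a duplicate `event_id` is likewise one possible policy, not a rule from any spec.

```python
import csv
import io

REQUIRED_COLUMNS = ("event_id", "event_type")  # hypothetical schema


class MalformedInputError(ValueError):
    """Raised when the header lacks a required column (hypothetical error type)."""


def parse_events(text: str) -> tuple[list[dict[str, str]], list[int]]:
    """Return (clean_rows, rejected_line_numbers) for CSV text."""
    if not text.strip():
        return [], []  # empty input is valid and yields no rows

    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise MalformedInputError(f"missing required columns: {missing}")

    seen_ids: set[str] = set()
    clean: list[dict[str, str]] = []
    rejected: list[int] = []
    for row in reader:
        event_id = (row.get("event_id") or "").strip()
        event_type = (row.get("event_type") or "").strip()
        # DictReader stores surplus fields under the None key, so `None in row`
        # flags rows with too many columns; short rows show up as empty values.
        if not event_id or not event_type or None in row:
            rejected.append(reader.line_num)
            continue
        if event_id in seen_ids:
            continue  # duplicate key: keep the first occurrence
        seen_ids.add(event_id)
        clean.append({"event_id": event_id, "event_type": event_type})
    return clean, rejected


def test_happy_path() -> None:
    clean, rejected = parse_events("event_id,event_type\n1,click\n2,view\n")
    assert [row["event_id"] for row in clean] == ["1", "2"]
    assert rejected == []


def test_edge_cases() -> None:
    assert parse_events("") == ([], [])
    text = "event_id,event_type\n1,click\n1,click\n,view\n3\n"
    clean, rejected = parse_events(text)
    assert clean == [{"event_id": "1", "event_type": "click"}]
    assert rejected == [4, 5]  # line 4 has no id, line 5 has no type


def test_error_case() -> None:
    try:
        parse_events("id,type\n1,click\n")
    except MalformedInputError:
        return
    raise AssertionError("expected MalformedInputError")
```

The tests are plain functions with bare assertions, so a runner such as pytest can collect them as written. Returning rejected line numbers instead of silently dropping rows gives you something concrete to cite in the README under Trade-offs.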
How the Take-Home Connects to the Rest of the Loop
The take-home is where the SQL round meets the Python round in a single deliverable. It often replaces or augments the technical phone screen, and it always informs the onsite system design round, because the interviewer will ask "how would you scale this to 100x?". The patterns from the data modeling round show up directly in how you structure your output tables.
Some companies lean heavily on take-homes: Airbnb's data engineer take-home is a famously rigorous 8 hours, and Stripe sometimes uses a take-home for senior roles. If you're targeting either of these, see the annotated walkthroughs of real Data Engineer take-home assignments.