About 28% of data engineer interview loops include a take-home assignment, usually after the recruiter screen and before the onsite. The assignment is scored on a hidden rubric most candidates never see. We have collected 47 graded take-home rubrics across 19 companies and reverse-engineered what wins. This page covers one of eight rounds in the complete data engineer interview preparation framework.
Most companies grade on a 1 to 5 scale per dimension, with 4+ on every dimension required for a hire signal.
| Dimension | Weight | What 5/5 Looks Like |
|---|---|---|
| Correctness | 30% | Output matches expected results across all sample inputs, including edge cases not in the spec (empty input, malformed rows, duplicates). |
| Code quality | 20% | Functions under 30 lines, clear naming, type hints, no dead code, no commented-out code, idiomatic in the chosen language. |
| Repo structure | 15% | Logical module split (ingest, transform, output), requirements.txt or pyproject.toml, .gitignore, README with run instructions. |
| Written explanation | 20% | README explains your approach, the trade-offs you made, what you would do with more time, and how you would deploy this in production. |
| Differentiator | 15% | One thing that goes beyond the spec: unit tests, a Makefile, a Docker image, a sample dashboard, performance benchmarks, an ADR document. |
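To make the arithmetic concrete, here is a minimal sketch of how the rubric could be applied to one submission. The weights and the "4+ on every dimension" rule come from the table and text above. Combining the weights into a single weighted average is an assumption for illustration, as are the names `WEIGHTS`, `weighted_score`, and `is_hire_signal`; individual companies may aggregate scores differently.

```python
# Weights copied from the rubric table (they sum to 1.0).
WEIGHTS: dict[str, float] = {
    "correctness": 0.30,
    "code_quality": 0.20,
    "repo_structure": 0.15,
    "written_explanation": 0.20,
    "differentiator": 0.15,
}

# "4+ on every dimension" is the hire bar described in the text.
HIRE_THRESHOLD = 4


def _validate(scores: dict[str, int]) -> None:
    """Reject score sheets with missing dimensions or values outside 1 to 5."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    for dim in WEIGHTS:
        if not 1 <= scores[dim] <= 5:
            raise ValueError(f"{dim} must be between 1 and 5")


def weighted_score(scores: dict[str, int]) -> float:
    """Weighted average on the 1 to 5 scale (assumed aggregation)."""
    _validate(scores)
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)


def is_hire_signal(scores: dict[str, int]) -> bool:
    """True only if every dimension meets the threshold."""
    _validate(scores)
    return all(scores[dim] >= HIRE_THRESHOLD for dim in WEIGHTS)


if __name__ == "__main__":
    example = {
        "correctness": 5,
        "code_quality": 4,
        "repo_structure": 5,
        "written_explanation": 3,  # a weak README sinks the submission
        "differentiator": 5,
    }
    print(round(weighted_score(example), 2), is_hire_signal(example))
```

The example shows why the per-dimension floor matters more than the average: a submission can have a strong weighted score and still miss the hire signal because one dimension falls below 4.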
Every grader opens the README first, then the directory tree. If those two artifacts are confusing, the rest of the submission is read with skepticism. The structure below is what we have seen score 5/5 on repo structure across 12 different graders.
take-home-yourname/
├── README.md              # 5-min walkthrough, runs in <60 sec
├── Makefile               # make install, make run, make test
├── pyproject.toml         # pinned versions, no requirements.txt
├── .gitignore             # __pycache__, .venv, data/
├── data/
│   ├── input/             # sample inputs from the prompt
│   └── output/            # expected outputs (gitignored)
├── src/
│   └── pipeline/
│       ├── __init__.py
│       ├── ingest.py      # source-to-raw
│       ├── transform.py   # raw-to-clean
│       ├── aggregate.py   # clean-to-mart
│       └── cli.py         # entrypoint, click or argparse
├── tests/
│   ├── conftest.py
│   ├── test_ingest.py
│   ├── test_transform.py
│   └── fixtures/
│       └── sample_events.json
└── docs/
    ├── design.md          # the architecture I would build
    └── adr-001-pandas.md  # why I chose pandas over Spark

Five sections, in this order, no more. Every grader expects them. Skipping any one is a 1-point penalty on Written Explanation.
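Returning to the layout above, here is a minimal sketch of what the raw-to-clean step in `transform.py` might look like when it handles the edge cases the Correctness row calls out: empty input, malformed rows, and duplicates. The function name `clean_events` and the fields `event_id`, `user_id`, and `ts` are hypothetical, since a real prompt defines its own schema. The sketch uses plain Python rather than pandas to stay dependency-free.

```python
from typing import Any, Iterable

# Hypothetical schema: a real take-home prompt defines its own required fields.
REQUIRED_FIELDS = ("event_id", "user_id", "ts")


def clean_events(
    rows: Iterable[Any],
) -> tuple[list[dict[str, Any]], list[Any]]:
    """Split raw rows into (clean, rejected).

    - Empty input yields two empty lists.
    - Rows that are not dicts, or that have a missing or empty required
      field, go to `rejected` so they can be reported instead of silently lost.
    - Duplicates are detected by event_id and only the first occurrence is kept.
    """
    seen_ids: set[str] = set()
    clean: list[dict[str, Any]] = []
    rejected: list[Any] = []

    for row in rows:
        is_malformed = not isinstance(row, dict) or any(
            row.get(field) in (None, "") for field in REQUIRED_FIELDS
        )
        if is_malformed:
            rejected.append(row)
            continue

        event_id = str(row["event_id"])
        if event_id in seen_ids:
            continue  # duplicate: first occurrence already kept
        seen_ids.add(event_id)
        clean.append(row)

    return clean, rejected
```

Returning the rejected rows rather than dropping them gives `tests/test_transform.py` something to assert on and gives the README a concrete number to report, for example how many rows were rejected and why.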
The take-home is where the SQL round meets the Python round in a single deliverable. It often replaces or augments the technical phone screen, and it always informs the onsite system design round, because the interviewer will ask "How would you scale this to 100x?" The patterns from the data modeling round show up directly in how you structure your output tables.
Companies with take-home-heavy loops: Airbnb's data engineer take-home is a famously rigorous 8 hours, and Stripe sometimes uses a take-home for senior roles. If you're targeting either of these, see real Data Engineer take-home assignment examples for annotated walkthroughs.