Data Engineering Tools Hub

Tutorials, interview questions, and runnable practice for the tools that come up in real data engineering interviews. Everything is free and framed around what interviewers actually test.

Apache Spark and PySpark

Transformation (dbt)

Orchestration (Airflow)

Streaming (Kafka)

Warehouse and Lakehouse

Why practice

Run a Real Interview

  1. Active recall beats re-reading by 50%

    A cognitive-science review of study techniques (Dunlosky et al., 2013) rates practice testing as high utility, while re-reading and highlighting are rated low utility.

  2. 76% of hiring managers reject on the coding task, not the resume

    From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice.

  3. Five problem shapes cover 80% of data engineer loops

    Deduplication, sessionization, top-N per group, slowly changing dimensions, and partition tricks. Writing each shape by hand turns an unfamiliar problem into pattern recognition.
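
To make three of those shapes concrete, here is a minimal sketch in plain Python. The sample events, the function names, and the 30-minute session gap are all assumptions chosen for illustration. In an interview you would more likely write the same logic in SQL or PySpark with window functions.

```python
from collections import defaultdict
from operator import itemgetter

# Hypothetical sample events: (user_id, event_ts_seconds, amount)
events = [
    ("u1", 100, 5.0),
    ("u1", 100, 5.0),    # exact duplicate of the previous row
    ("u1", 160, 7.5),
    ("u1", 4000, 2.0),   # more than 30 minutes after the previous u1 event
    ("u2", 50, 9.0),
    ("u2", 70, 1.0),
    ("u2", 90, 4.0),
]


def dedup(rows):
    """Drop exact duplicate rows, keeping first-occurrence order."""
    return list(dict.fromkeys(rows))


def top_n_per_group(rows, n):
    """Return the n largest-amount rows for each user_id."""
    by_user = defaultdict(list)
    for row in rows:
        by_user[row[0]].append(row)
    return {
        user: sorted(user_rows, key=itemgetter(2), reverse=True)[:n]
        for user, user_rows in by_user.items()
    }


def sessionize(rows, gap_seconds=1800):
    """Number each user's sessions; a gap above gap_seconds starts a new one."""
    out = []
    last_ts = {}
    session_no = {}
    for user, ts, _amount in sorted(rows, key=itemgetter(0, 1)):
        if user not in last_ts or ts - last_ts[user] > gap_seconds:
            session_no[user] = session_no.get(user, 0) + 1
        last_ts[user] = ts
        out.append((user, ts, session_no[user]))
    return out


unique_events = dedup(events)
top2 = top_n_per_group(unique_events, 2)
sessions = sessionize(unique_events)
```

Each function maps onto a familiar window-function pattern: dedup corresponds to keeping one row per key, top-N per group to ranking within a partition, and sessionization to comparing each row with the previous one in the same partition.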