Learn SQL, Python, and Data Modeling Interactively

Data Engineering Lessons

136+ interactive data engineering lessons with real code execution. Learn SQL, Python, data modeling, pipeline architecture, and Spark through hands-on practice. Every lesson includes challenges that you solve by writing and running real code against live databases.

Data Modeling Lessons (11)

  • Keys & Identity - 22 min

    Every record deserves a fingerprint

    Topics: The Problem of Identity, Primary Keys: Data Identity, Foreign Keys, Composite Keys, Key Generation Strategies

  • Schema Types - 22 min

    Choosing the right box for every value

    Topics: The FLOAT Money Bug, String Types & Platform Traps, Temporal Types & DST, ENUM Traps, Type Review Framework

  • Relationships - 18 min

    How tables talk to each other

    Topics: What Are Relationships?, Cardinality Explained, Required vs Optional, Self-Referential Tables, Complex Patterns

  • Normalization - 15 min

    Why copying data breaks everything

    Topics: Data Gets Out of Sync, First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF), Identifying Normal Form

  • Beyond 3NF - 19 min

    Beyond third normal form

    Topics: Boyce-Codd Normal Form, Fourth Normal Form (4NF), Fifth Normal Form (5NF), Strategic Denormalization, Denormalization Patterns

  • Star Schemas - 30 min

    Stars, snowflakes, and facts between

    Topics: The Star Schema, Types of Fact Tables, Types of Dimensions, Defining the Grain, Surrogate Keys

  • Nested Data - 15 min

    When flat tables meet nested reality

    Topics: The Nesting Decision, STRUCT: Embedded Objects, ARRAY: Ordered Collections, MAP: Dynamic Key-Value Pairs, Columnar Storage & Nesting

  • Event Streams - 27 min

    Data that never forgets

    Topics: Event-Driven Architecture, Immutable Append-Only Logs, Event Sourcing, Clickstream Modeling, Handling Late-Arriving Data

  • Pre-Aggregation - 23 min

    Pre-computing answers before anyone asks

    Topics: Why Pre-Aggregate?, Metric Types and Additivity, OLAP Cubes & Rollups, Granularity Design, Refresh Strategies & Materialized Views

  • Design Patterns - 21 min

    Blueprints for building data systems

    Topics: Medallion Architecture, Data Vault, One Big Table (OBT), Semantic Layers, Pipeline DAG Design

Pipeline Architecture Lessons (41)

  • How Data Moves: Beginner - beginner - 20 min

    Nail the batch vs streaming question and defend your choice

    Topics: Batch Processing, Stream Processing, File Ingestion, API Ingestion, Batch vs Streaming

  • How Data Moves: Intermediate - intermediate - 25 min

    Survive the follow-up probes on batch, streaming, and hybrid

    Topics: Batch Mechanics, Stream Guarantees, File Format Depth, API Patterns, Hybrid Architectures

  • How Data Moves: Advanced - advanced - 30 min

    Handle the depth probes: idempotency, backpressure, and cost

    Topics: Idempotent Pipelines, Backpressure, Late-Arriving Data, Dead Letter Queues, Cost of Freshness

  • Where Data Lives: Beginner - beginner - 20 min

    Answer the storage questions: Parquet, partitioning, lake vs warehouse

    Topics: Columnar vs Row, Compression, Partitioning, Lake vs Warehouse, Table Formats

  • Where Data Lives: Intermediate - intermediate - 25 min

    Survive the storage follow-ups: encoding, small files, schema evolution

    Topics: Encoding Types, The Small File Problem, Predicate Pushdown, Storage Tiering, Schema Evolution

  • Keeping Data Fresh: Beginner - beginner - 20 min

    Answer the incremental loading question that follows every pipeline design

    Topics: Full vs Incremental Loading, Change Data Capture, Slowly Changing Dimensions, Schema Evolution, Backfilling

  • Keeping Data Fresh: Intermediate - intermediate - 25 min

    Master the incremental loading patterns that interviewers probe hardest

    Topics: Merge Strategies, CDC Patterns, SCD in Pipelines, Schema Migration, Partition-Level Backfill

  • Distributed Compute: Beginner - beginner - 20 min

    Answer the Spark architecture question that appears in every technical screen

    Topics: Spark Execution Model, Distributed Primitives, Shuffle Operations, Memory Management, Small File Problem

  • Streaming Systems: Beginner - beginner - 20 min

    Answer the Kafka and streaming questions with confidence

    Topics: Event Platforms, Event-Driven Architecture, Late-Arriving Data, Dead Letter Queues, Micro-Batch vs True Streaming

  • Streaming Systems: Intermediate - intermediate - 25 min

    Master offset management, consumer groups, and streaming failure modes

    Topics: Consumer Groups and Offsets, Event Sourcing Patterns, Windowing and Watermarks, DLQ Patterns, Spark Streaming vs Flink

Python Lessons (42)

  • Python Foundations: Beginner - beginner - 18 min

    Your first lines of Python start here

    Topics: Variables and Assignment, Data Types, Print Statements, Basic Operators, Comments

  • Python Foundations: Intermediate - intermediate - 20 min

    Decisions, loops, and reusable logic

    Topics: Conditional Statements, Loops, Functions, Return Values, Variable Scope

  • Python Foundations: Advanced - advanced - 19 min

    Lambdas, comprehensions, and more

    Topics: Lambda Functions, List Comprehensions, Decorators, Generators, Context Managers

  • Python Expressions: Beginner - beginner - 27 min

    Where every Python journey begins

    Topics: How Computers Store Data, Variables and Naming, Assignment vs. Equality, Data Types and Strings, Operators and Readability

  • Python Expressions: Intermediate - intermediate - 38 min

    Making decisions with data

    Topics: Type Conversions, Comparison Operators, Logical Operators, Multiple Assignment, None and Identity

  • Python Expressions: Advanced - advanced - 40 min

    Patterns for technical interviews

    Topics: Multiple Assignment, Short-circuit Evaluation, Truthy and Falsy Values, Ternary Expressions, Walrus Operator (:=)

  • Control Flow: Beginner - beginner - 37 min

    Making decisions in code

    Topics: The if Statement, Branching with if-else, Chaining if-elif-else, Combining with and/or, Execution Flow

  • Control Flow: Intermediate - intermediate - 33 min

    Writing cleaner conditional logic

    Topics: Guard Clauses, Chained Comparisons, Pattern Matching with match-case, Conditional Assignment, Edge Case Handling

  • Control Flow: Advanced - advanced - 28 min

    Elegant patterns for complex decisions

    Topics: Boolean Simplification, De Morgan's Laws, State Machine Patterns, Dict-Based Dispatch, Decision Table Lookups

  • Loops: Beginner - beginner - 39 min

    Repeating actions efficiently

    Topics: Iterating with for Loops, range() Function, Loops with while, Using break and continue, Loop Variable Scope

Spark Lessons (12)

  • How a Spark Job Runs - beginner - 12 min

    Your query is a promise. Something has to keep it.

    Topics: The Cluster: Who Plans, Who Works; Partitions: The Unit of Parallelism; Transformations vs Actions; Cores and Slots; A Job's Life, End to End

  • How a Spark Job Runs: Stages and Plans - intermediate - 12 min

    The boundaries between stages are where the cost lives.

    Topics: Job, Stage, Task; Why Stages Exist At All; Reading Parallelism; Where the Driver Lives; spark-submit and the Config Surface

  • How a Spark Job Runs: Scheduler Internals - advanced - 14 min

    The failure edges separate tuning from understanding.

    Topics: DAGScheduler vs TaskScheduler, Task Failure and Retry, Speculative Execution, The Driver as Bottleneck, Locality and Scheduling Delay

  • Lazy Until You Ask - beginner - 13 min

    You wrote a recipe. Nothing cooks until you call an action.

    Topics: Nothing Runs Until an Action, Why Laziness Makes Spark Fast, The Action Catalog: What Actually Triggers a Run, The collect() Trap, Re-Execution: The Chain Runs Again Every Time

  • Reading the Plan: DAG, Stages, and explain() - intermediate - 14 min

    The shape of the graph is the map of where your time goes.

    Topics: The DAG: Your Plan as a Graph, Counting Stages Is Counting Shuffles, Logical vs Physical Plan: Reading explain(), Pipelining: Why Narrow Ops Are Nearly Free, DAG vs Lineage: The Plan and the Recovery History

  • Lineage as Fault Tolerance - advanced - 15 min

    A partition is never data Spark trusts to survive. It is a recipe Spark can rebuild.

    Topics: Lineage-Based Recovery: Rebuild, Don't Re-Read; The Recompute Cost: When Lineage Gets Expensive; Checkpointing: Cutting the Lineage; Cache vs Checkpoint vs Persist: Which Solves What; Determinism: Why Recompute-Based Recovery Can Break

  • Narrow, Wide, and the Shuffle - beginner - 13 min

    One category is free. The other can run your whole bill.

    Topics: Narrow Transformations: Each Piece Stays Home, Wide Transformations: When Rows Must Come Together, What 'Shuffle' Actually Means, Why Wide Is Expensive and Narrow Is Nearly Free, Spotting the Shuffle in Your Own Code

  • Inside the Shuffle - intermediate - 14 min

    Two halves, a write and a read, with disk and the network in between.

    Topics: The Shuffle Write: Staging Data by Key, The Shuffle Read: Fetching Across the Network, Spill: When the Shuffle Runs Out of Memory, The 200 Knob: spark.sql.shuffle.partitions, Why the Shuffle Dominates Runtime

  • Shuffle Internals and Elimination - advanced - 15 min

    The cheapest shuffle is the one you engineered away.

    Topics: Sort-Based Shuffle: One File, Not N Squared; The External Shuffle Service: Surviving a Dead Executor; Pricing a Shuffle: Bytes Moved to Wall-Clock; Eliminating a Shuffle; The Shuffle Tuning Knobs

  • The Optimizer Works For You - beginner - 13 min

    You stopped telling Spark how, and started telling it what.

    Topics: Declare What, Not How: Why DataFrames Beat RDDs; The Optimizer Exists: Your Query Is Rewritten; The RDD Escape Hatch and Its Cost; DataFrame, Dataset, RDD: Three APIs, Three Trade-offs; Seeing the Optimization Happen with explain()

SQL Lessons (30)

  • Query Structure: Beginner - beginner - 9 min

    Your first SQL query — demystified

    Topics: Tables, rows, and columns, SELECT and FROM basics, Selecting all columns (*), AS aliases for columns, Expressions in SELECT

  • Query Structure: Intermediate - intermediate - 28 min

    CTEs: subqueries are a cry for help

    Topics: CTEs (WITH clause), Query Execution Order, Subqueries for temp results, UNION and UNION ALL, ORDER BY and LIMIT

  • Query Structure: Advanced - advanced - 31 min

    SQL operators nobody warned you about

    Topics: Correlated subqueries, EXCEPT and EXCEPT ALL, INTERSECT and INTERSECT ALL, UNNEST for arrays, SELECT Without FROM

  • Data Types: Beginner - beginner - 19 min

    INT, VARCHAR, and the lies we tell

    Topics: Why data types matter, INTEGER for whole numbers, STRING types (VARCHAR), BOOLEAN for true/false, CAST for type conversion

  • Data Types: Intermediate - intermediate - 19 min

    Where pennies vanish and NULLs defy logic

    Topics: BOOLEAN and NULL logic, DECIMAL precision and scale, TIMESTAMP vs TIMESTAMP WITH TIME ZONE, Time zones and UTC handling, TRY_CAST and implicit casts

  • Data Types: Advanced - advanced - 21 min

    Arrays, maps, and type optimization at scale

    Topics: Type optimization at scale, Storage calculations and VARCHAR, MAP data type for key-value pairs, Accessing nested data (UNNEST), Compression and error handling

  • Filtering: Beginner - beginner - 35 min

    WHERE: your database bouncer

    Topics: WHERE clause for filtering rows, Equals and not equals (=, !=), Comparison operators (<, >), IN and NOT IN for list matching, AND for combining conditions

  • Filtering: Intermediate - intermediate - 24 min

    Boolean logic: it's complicated

    Topics: OR and CASE expressions, Operator precedence, LIKE for pattern matching, LIMIT and OFFSET for pagination, BETWEEN for range filtering

  • Filtering: Advanced - advanced - 19 min

    Subqueries, regex, and other crimes

    Topics: Correlated subqueries (EXISTS), NOT EXISTS for missing rows, NOT IN vs NULL gotchas, REGEXP_LIKE patterns, Regex operators and patterns

  • Aggregating: Beginner - beginner - 38 min

    A million rows walk into a SUM...

    Topics: GROUP BY for categorizing data, COUNT variations (*, col, DISTINCT), SUM and AVG calculations, MIN and MAX for extremes, HAVING for filtered groups
