Learn SQL, Python, and Data Modeling Interactively

Data Engineering Lessons

113+ interactive data engineering lessons with real code execution. Learn SQL queries, Python for data engineering, and data modeling through hands-on practice. Every lesson includes challenges you solve by writing and running real code against live databases.

Data Modeling Lessons (11)

  • Keys & Identity - 22 min

    Every record deserves a fingerprint

    Topics: The Problem of Identity, Primary Keys: Data Identity, Foreign Keys, Composite Keys, Key Generation Strategies

  • Schema Types - 22 min

    Choosing the right box for every value

    Topics: The FLOAT Money Bug, String Types & Platform Traps, Temporal Types & DST, ENUM Traps, Type Review Framework

  • Relationships - 18 min

    How tables talk to each other

    Topics: What Are Relationships?, Cardinality Explained, Required vs Optional, Self-Referential Tables, Complex Patterns

  • Normalization - 15 min

    Why copying data breaks everything

    Topics: Data Gets Out of Sync, First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF), Identifying Normal Form

  • Beyond 3NF - 19 min

    Beyond third normal form

    Topics: Boyce-Codd Normal Form, Fourth Normal Form (4NF), Fifth Normal Form (5NF), Strategic Denormalization, Denormalization Patterns

  • Star Schemas - 30 min

    Stars, snowflakes, and facts between

    Topics: The Star Schema, Types of Fact Tables, Types of Dimensions, Defining the Grain, Surrogate Keys

  • Nested Data - 15 min

    When flat tables meet nested reality

    Topics: The Nesting Decision, STRUCT: Embedded Objects, ARRAY: Ordered Collections, MAP: Dynamic Key-Value Pairs, Columnar Storage & Nesting

  • Event Streams - 27 min

    Data that never forgets

    Topics: Event-Driven Architecture, Immutable Append-Only Logs, Event Sourcing, Clickstream Modeling, Handling Late-Arriving Data

  • Pre-Aggregation - 23 min

    Pre-computing answers before anyone asks

    Topics: Why Pre-Aggregate?, Metric Types and Additivity, OLAP Cubes & Rollups, Granularity Design, Refresh Strategies & Materialized Views

  • Design Patterns - 21 min

    Blueprints for building data systems

    Topics: Medallion Architecture, Data Vault, One Big Table (OBT), Semantic Layers, Pipeline DAG Design

Pipeline Architecture Lessons (30)

  • What a Data Pipeline Is: Beginner - beginner - 25 min

    Data lives where it is created, not where it is needed; pipelines move and reshape it

    Topics: Why Pipelines Exist, The Four Roles in Any Pipeline, Reading a Pipeline Left to Right, A First End-to-End Pipeline, When a Pipeline Is Not Needed

  • What a Data Pipeline Is: Intermediate - intermediate - 30 min

    When one pipeline becomes many, the question is not what to build but how the pieces fit

    Topics: Many Sources, One Curated Layer, ETL vs ELT, The DAG: Why Dependencies Form, Reading a Real Pipeline Diagram, One Source, Two Different Consumers

  • What a Data Pipeline Is: Advanced - advanced - 35 min

    Pipelines are products with owners, contracts, and lifecycles, not scripts that move data

    Topics: Pipelines as Products, The Cross-Cutting Undercurrents, When to Split, When to Merge, Build vs Buy at Each Layer, Redesigning a Tangled Graph

  • Batch vs Streaming: Beginner - beginner - 25 min

    Data moves in scheduled chunks or in a continuous flow; the choice changes everything downstream

    Topics: Two Ways Data Can Move, Batch: Picture, Rhythm, Example, Streaming: Picture, Rhythm, Example, What Real-Time Actually Means, Picking Batch or Streaming

  • Batch vs Streaming: Intermediate - intermediate - 30 min

    Latency, throughput, state, and cost are the dimensions; pick deliberately, not by default

    Topics: Latency vs Throughput Tradeoff, Micro-Batch: The Middle Ground, Why Streaming Costs More, Stateful vs Stateless Transforms, When Batch Outgrows Itself

  • Batch vs Streaming: Advanced - advanced - 35 min

    Lambda, Kappa, and unified engines: architectures live or die on freshness tier discipline

    Topics: Lambda Architecture, Kappa: Stream Only, Batch Replay, Unified Engines: Where Lines Blur, Per-Node Freshness Tier Analysis, Lambda to Kappa Worked Example

  • Storage Layers and Table Formats: Beginner - beginner - 25 min

    Different shapes of storage exist because different jobs need different physics

    Topics: Storage Is Not Just the Database, The Data Warehouse, The Data Lake, The Operational Database, Picking the Right Storage Shape

  • Storage Layers and Table Formats: Intermediate - intermediate - 32 min

    Columns, partitions, compression, and pushdown turn a 10TB scan into a 100GB scan

    Topics: Columnar Versus Row Storage, Partitioning and File Pruning, Compression: Bytes Versus CPU, Predicate Pushdown, 10TB Versus 100GB: A Worked Example

  • Storage Layers and Table Formats: Advanced - advanced - 38 min

    Open table formats turn object storage into a transactional database without giving up the lake

    Topics: The Lakehouse: ACID on Object, Snapshot Isolation and Time Travel, Schema Evolution Without Rewrites, The Small Files Problem, Choosing Storage Across Workloads

  • Orchestration and Dependencies: Beginner - beginner - 25 min

    What runs, when, in what order, and what happens when something fails

    Topics: Why Cron Is Not an Orchestrator, The DAG: Tasks, Edges, No Cycles, What an Orchestrator Does, The Major Orchestrators by Name, First DAG: 3 Tasks, 1 Schedule

Python Lessons (42)

  • Python Foundations: Beginner - beginner - 18 min

    Your first lines of Python start here

    Topics: Variables and Assignment, Data Types, Print Statements, Basic Operators, Comments

  • Python Foundations: Intermediate - intermediate - 20 min

    Decisions, loops, and reusable logic

    Topics: Conditional Statements, Loops, Functions, Return Values, Variable Scope

  • Python Foundations: Advanced - advanced - 19 min

    Lambdas, comprehensions, and more

    Topics: Lambda Functions, List Comprehensions, Decorators, Generators, Context Managers

  • Python Expressions: Beginner - beginner - 27 min

    Where every Python journey begins

    Topics: How Computers Store Data, Variables and Naming, Assignment vs. Equality, Data Types and Strings, Operators and Readability

  • Python Expressions: Intermediate - intermediate - 38 min

    Making decisions with data

    Topics: Type Conversions, Comparison Operators, Logical Operators, Multiple Assignment, None and Identity

  • Python Expressions: Advanced - advanced - 40 min

    Patterns for technical interviews

    Topics: Multiple Assignment, Short-circuit Evaluation, Truthy and Falsy Values, Ternary Expressions, Walrus Operator (:=)

  • Control Flow: Beginner - beginner - 37 min

    Making decisions in code

    Topics: The if Statement, Branching with if-else, Chaining if-elif-else, Combining with and/or, Execution Flow

  • Control Flow: Intermediate - intermediate - 33 min

    Writing cleaner conditional logic

    Topics: Guard Clauses, Chained Comparisons, Pattern Matching with match-case, Conditional Assignment, Edge Case Handling

  • Control Flow: Advanced - advanced - 28 min

    Elegant patterns for complex decisions

    Topics: Boolean Simplification, De Morgan's Laws, State Machine Patterns, Dict-Based Dispatch, Decision Table Lookups

  • Loops: Beginner - beginner - 39 min

    Repeating actions efficiently

    Topics: Iterating with for Loops, range() Function, Loops with while, Using break and continue, Loop Variable Scope

SQL Lessons (30)

  • Query Structure: Beginner - beginner - 9 min

    Your first SQL query — demystified

    Topics: Tables, rows, and columns, SELECT and FROM basics, Selecting all columns (*), AS aliases for columns, Expressions in SELECT

  • Query Structure: Intermediate - intermediate - 28 min

    CTEs: subqueries are a cry for help

    Topics: CTEs (WITH clause), Query Execution Order, Subqueries for temp results, UNION and UNION ALL, ORDER BY and LIMIT

  • Query Structure: Advanced - advanced - 31 min

    SQL operators nobody warned you about

    Topics: Correlated subqueries, EXCEPT and EXCEPT ALL, INTERSECT and INTERSECT ALL, UNNEST for arrays, SELECT Without FROM

  • Data Types: Beginner - beginner - 19 min

    INT, VARCHAR, and the lies we tell

    Topics: Why data types matter, INTEGER for whole numbers, STRING types (VARCHAR), BOOLEAN for true/false, CAST for type conversion

  • Data Types: Intermediate - intermediate - 19 min

    Where pennies vanish and NULLs defy

    Topics: BOOLEAN and NULL logic, DECIMAL precision and scale, TIMESTAMP vs TIMESTAMP WITH TIME ZONE, Time zones and UTC handling, TRY_CAST and implicit casts

  • Data Types: Advanced - advanced - 21 min

    Arrays, maps, and type optimization at scale

    Topics: Type optimization at scale, Storage calculations and VARCHAR, MAP data type for key-value pairs, Accessing nested data (UNNEST), Compression and error handling

  • Filtering: Beginner - beginner - 35 min

    WHERE: your database bouncer

    Topics: WHERE clause for filtering rows, Equals and not equals (=, !=), Comparison operators (<, >), IN and NOT IN for list matching, AND for combining conditions

  • Filtering: Intermediate - intermediate - 24 min

    Boolean logic: it's complicated

    Topics: OR and CASE expressions, Operator precedence, LIKE for pattern matching, LIMIT and OFFSET for pagination, BETWEEN for range filtering

  • Filtering: Advanced - advanced - 19 min

    Subqueries, regex, and other crimes

    Topics: Correlated subqueries (EXISTS), NOT EXISTS for missing rows, NOT IN vs NULL gotchas, REGEXP_LIKE patterns, Regex operators and patterns

  • Aggregating: Beginner - beginner - 38 min

    A million rows walk into a SUM...

    Topics: GROUP BY for categorizing data, COUNT variations (*,col,DISTINCT), SUM and AVG calculations, MIN and MAX for extremes, HAVING for filtered groups

Related Resources