Learn SQL, Python, and Data Modeling Interactively
Data Engineering Lessons
113+ interactive data engineering lessons with real code execution. Learn SQL queries, Python for data engineering, and data modeling through hands-on practice. Every lesson includes challenges you solve by writing and running real code against live databases.
Data Modeling Lessons (11)
- Keys & Identity - 22 min
Every record deserves a fingerprint
Topics: The Problem of Identity, Primary Keys: Data Identity, Foreign Keys, Composite Keys, Key Generation Strategies
- Schema Types - 22 min
Choosing the right box for every value
Topics: The FLOAT Money Bug, String Types & Platform Traps, Temporal Types & DST, ENUM Traps, Type Review Framework
- Relationships - 18 min
How tables talk to each other
Topics: What Are Relationships?, Cardinality Explained, Required vs Optional, Self-Referential Tables, Complex Patterns
- Normalization - 15 min
Why copying data breaks everything
Topics: Data Gets Out of Sync, First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF), Identifying Normal Form
- Beyond 3NF - 19 min
Beyond third normal form
Topics: Boyce-Codd Normal Form, Fourth Normal Form (4NF), Fifth Normal Form (5NF), Strategic Denormalization, Denormalization Patterns
- Star Schemas - 30 min
Stars, snowflakes, and facts between
Topics: The Star Schema, Types of Fact Tables, Types of Dimensions, Defining the Grain, Surrogate Keys
- Nested Data - 15 min
When flat tables meet nested reality
Topics: The Nesting Decision, STRUCT: Embedded Objects, ARRAY: Ordered Collections, MAP: Dynamic Key-Value Pairs, Columnar Storage & Nesting
- Event Streams - 27 min
Data that never forgets
Topics: Event-Driven Architecture, Immutable Append-Only Logs, Event Sourcing, Clickstream Modeling, Handling Late-Arriving Data
- Pre-Aggregation - 23 min
Pre-computing answers before anyone asks
Topics: Why Pre-Aggregate?, Metric Types and Additivity, OLAP Cubes & Rollups, Granularity Design, Refresh Strategies & Materialized Views
- Design Patterns - 21 min
Blueprints for building data systems
Topics: Medallion Architecture, Data Vault, One Big Table (OBT), Semantic Layers, Pipeline DAG Design
Pipeline Architecture Lessons (30)
- What a Data Pipeline Is: Beginner - beginner - 25 min
Data lives where it is created, not where it is needed; pipelines move and reshape it
Topics: Why Pipelines Exist, The Four Roles in Any Pipeline, Reading a Pipeline Left to Right, A First End-to-End Pipeline, When a Pipeline Is Not Needed
- What a Data Pipeline Is: Intermediate - intermediate - 30 min
When one pipeline becomes many, the question is not what to build but how the pieces fit
Topics: Many Sources, One Curated Layer, ETL vs ELT, The DAG: Why Dependencies Form, Reading a Real Pipeline Diagram, One Source, Two Different Consumers
- What a Data Pipeline Is: Advanced - advanced - 35 min
Pipelines are products with owners, contracts, and lifecycles, not scripts that move data
Topics: Pipelines as Products, The Cross-Cutting Undercurrents, When to Split, When to Merge, Build vs Buy at Each Layer, Redesigning a Tangled Graph
- Batch vs Streaming: Beginner - beginner - 25 min
Data moves in scheduled chunks or in a continuous flow; the choice changes everything downstream
Topics: Two Ways Data Can Move, Batch: Picture, Rhythm, Example, Streaming: Picture, Rhythm, Example, What Real-Time Actually Means, Picking Batch or Streaming
- Batch vs Streaming: Intermediate - intermediate - 30 min
Latency, throughput, state, and cost are the dimensions; pick deliberately, not by default
Topics: Latency vs Throughput Tradeoff, Micro-Batch: The Middle Ground, Why Streaming Costs More, Stateful vs Stateless Transforms, When Batch Outgrows Itself
- Batch vs Streaming: Advanced - advanced - 35 min
Lambda, Kappa, and unified engines: architectures live or die on freshness tier discipline
Topics: Lambda Architecture, Kappa: Stream Only, Batch Replay, Unified Engines: Where Lines Blur, Per-Node Freshness Tier Analysis, Lambda to Kappa Worked Example
- Storage Layers and Table Formats: Beginner - beginner - 25 min
Different shapes of storage exist because different jobs need different physics
Topics: Storage Is Not Just the Database, The Data Warehouse, The Data Lake, The Operational Database, Picking the Right Storage Shape
- Storage Layers and Table Formats: Intermediate - intermediate - 32 min
Columns, partitions, compression, and pushdown turn a 10TB scan into a 100GB scan
Topics: Columnar Versus Row Storage, Partitioning and File Pruning, Compression: Bytes Versus CPU, Predicate Pushdown, 10TB Versus 100GB: A Worked Example
- Storage Layers and Table Formats: Advanced - advanced - 38 min
Open table formats turn object storage into a transactional database without giving up the lake
Topics: The Lakehouse: ACID on Object, Snapshot Isolation and Time Travel, Schema Evolution Without Rewrites, The Small Files Problem, Choosing Storage Across Workloads
- Orchestration and Dependencies: Beginner - beginner - 25 min
What runs, when, in what order, and what happens when something fails
Topics: Why Cron Is Not an Orchestrator, The DAG: Tasks, Edges, No Cycles, What an Orchestrator Does, The Major Orchestrators by Name, First DAG: 3 Tasks, 1 Schedule
Python Lessons (42)
- Python Foundations: Beginner - beginner - 18 min
Your first lines of Python start here
Topics: Variables and Assignment, Data Types, Print Statements, Basic Operators, Comments
- Python Foundations: Intermediate - intermediate - 20 min
Decisions, loops, and reusable logic
Topics: Conditional Statements, Loops, Functions, Return Values, Variable Scope
- Python Foundations: Advanced - advanced - 19 min
Lambdas, comprehensions, and more
Topics: Lambda Functions, List Comprehensions, Decorators, Generators, Context Managers
- Python Expressions: Beginner - beginner - 27 min
Where every Python journey begins
Topics: How Computers Store Data, Variables and Naming, Assignment vs. Equality, Data Types and Strings, Operators and Readability
- Python Expressions: Intermediate - intermediate - 38 min
Making decisions with data
Topics: Type Conversions, Comparison Operators, Logical Operators, Multiple Assignment, None and Identity
- Python Expressions: Advanced - advanced - 40 min
Patterns for technical interviews
Topics: Multiple Assignment, Short-circuit Evaluation, Truthy and Falsy Values, Ternary Expressions, Walrus Operator (:=)
- Control Flow: Beginner - beginner - 37 min
Making decisions in code
Topics: The if Statement, Branching with if-else, Chaining if-elif-else, Combining with and/or, Execution Flow
- Control Flow: Intermediate - intermediate - 33 min
Writing cleaner conditional logic
Topics: Guard Clauses, Chained Comparisons, Pattern Matching with match-case, Conditional Assignment, Edge Case Handling
- Control Flow: Advanced - advanced - 28 min
Elegant patterns for complex decisions
Topics: Boolean Simplification, De Morgan's Laws, State Machine Patterns, Dict-Based Dispatch, Decision Table Lookups
- Loops: Beginner - beginner - 39 min
Repeating actions efficiently
Topics: Iterating with for Loops, range() Function, Loops with while, Using break and continue, Loop Variable Scope
SQL Lessons (30)
- Query Structure: Beginner - beginner - 9 min
Your first SQL query — demystified
Topics: Tables, rows, and columns, SELECT and FROM basics, Selecting all columns (*), AS aliases for columns, Expressions in SELECT
- Query Structure: Intermediate - intermediate - 28 min
CTEs: subqueries are a cry for help
Topics: CTEs (WITH clause), Query Execution Order, Subqueries for temp results, UNION and UNION ALL, ORDER BY and LIMIT
- Query Structure: Advanced - advanced - 31 min
SQL operators nobody warned you about
Topics: Correlated subqueries, EXCEPT and EXCEPT ALL, INTERSECT and INTERSECT ALL, UNNEST for arrays, SELECT Without FROM
- Data Types: Beginner - beginner - 19 min
INT, VARCHAR, and the lies we tell
Topics: Why data types matter, INTEGER for whole numbers, STRING types (VARCHAR), BOOLEAN for true/false, CAST for type conversion
- Data Types: Intermediate - intermediate - 19 min
Where pennies vanish and NULLs defy
Topics: BOOLEAN and NULL logic, DECIMAL precision and scale, TIMESTAMP vs TIMESTAMP WITH TIME ZONE, Time zones and UTC handling, TRY_CAST and implicit casts
- Data Types: Advanced - advanced - 21 min
Arrays, maps, and type optimization at scale
Topics: Type optimization at scale, Storage calculations and VARCHAR, MAP data type for key-value pairs, Accessing nested data (UNNEST), Compression and error handling
- Filtering: Beginner - beginner - 35 min
WHERE: your database bouncer
Topics: WHERE clause for filtering rows, Equals and not equals (=, !=), Comparison operators (<, >), IN and NOT IN for list matching, AND for combining conditions
- Filtering: Intermediate - intermediate - 24 min
Boolean logic: it's complicated
Topics: OR and CASE expressions, Operator precedence, LIKE for pattern matching, LIMIT and OFFSET for pagination, BETWEEN for range filtering
- Filtering: Advanced - advanced - 19 min
Subqueries, regex, and other crimes
Topics: Correlated subqueries (EXISTS), NOT EXISTS for missing rows, NOT IN vs NULL gotchas, REGEXP_LIKE patterns, Regex operators and patterns
- Aggregating: Beginner - beginner - 38 min
A million rows walk into a SUM...
Topics: GROUP BY for categorizing data, COUNT variations (*,col,DISTINCT), SUM and AVG calculations, MIN and MAX for extremes, HAVING for filtered groups