Tables do not exist in isolation. An order belongs to a customer. A customer has an address. An address is in a city. These connections are relationships, and modeling them correctly determines whether your queries return the right results or silently produce wrong ones. This lesson teaches you how to express real-world connections in your schema: one-to-one, one-to-many, many-to-many, required vs optional, and the tricky patterns like self-referential hierarchies.
What Are Relationships?
Daily Life
Interviews
Express real-world connections in schemas
Why Cardinality Matters More Than You Think
When you look at a schema, the first thing to understand is not the column names. It is the cardinality of the relationships between tables. Cardinality tells you whether a join will produce the same number of rows, more rows, or fewer rows than the input. Getting this wrong is the single most common source of incorrect query results in production systems.
Relationships are implemented using foreign keys, but the foreign key is just the mechanism. The real content is the business rule: can one customer have many orders? Can one order belong to many customers? The answer to these questions determines the shape of every downstream query.
1:11:NM:N
1:1
One-to-One
Each row in table A maps to exactly one row in table B. Joining them never changes the row count. Example: users and user_profiles.
1:N
One-to-Many
Each row in table A maps to zero or more rows in table B. Joining from the 'one' side to the 'many' side multiplies rows. Example: customers and orders.
M:N
Many-to-Many
Each row in A can map to many rows in B, and vice versa. Requires a junction table. Example: students and courses linked by enrollments.
The line connecting these tables represents a one-to-many relationship. customer_id in orders is a foreign key referencing customers. One customer can have many orders. Each order belongs to exactly one customer. When you join orders to customers, each order row gets the customer's data attached. The row count stays the same.
Why This Matters for Queries
If you join in the wrong direction without aggregating, your metrics inflate. Joining customers to orders produces one row per order. But if you then join to shipments (also many-per-order), you get a cartesian product: each order row is duplicated for every shipment. SUM(amount) is now inflated. Understanding cardinality prevents this class of bug entirely.
TIP
Before writing any JOIN, ask yourself: will this join multiply my rows? If the table you are joining to has multiple rows per join key, the answer is yes, and you need to either aggregate first or accept the fan-out intentionally.
Cardinality Explained
Daily Life
Interviews
Define one-to-one, one-to-many, many-to-many
Cardinality answers the question: for one row in table A, how many rows in table B can it relate to? Always read cardinality from both directions. The relationship between customers and orders is 1:N from the customer side, but 1:1 from the order side (each order belongs to exactly one customer).
One-to-One (1:1)
Each row in A maps to exactly one row in B. In practice, this usually means the two tables could be merged into one, but are kept separate for performance, security, or organizational reasons. A users table and a user_profiles table: every user has exactly one profile. Joining them never changes the row count.
TIP
A 1:1 join is the only join that is guaranteed not to multiply your rows. If you are ever unsure whether a join is safe, check: is it 1:1? If yes, it cannot inflate your aggregates.
One-to-Many (1:N)
One-to-many is the most common relationship in data modeling. One customer has many orders. One department has many employees. One product has many reviews. The foreign key always lives on the 'many' side: orders has customer_id, not customers has order_id.
The critical implication: joining FROM the many side TO the one side is safe (row count unchanged). Joining FROM the one side TO the many side produces one output row per 'many' record. If you aggregate without accounting for this, your metrics are wrong.
FK lives on the many side: employees.dept_id, not departments.emp_id.
Join from employees to departments: safe, row count unchanged.
Join from departments to employees: produces one row per employee. Must aggregate for department-level metrics.
Many-to-Many (M:N)
Many-to-many relationships cannot be represented with a single foreign key. A student can enroll in many courses. A course can have many students. Neither table can hold a single FK to the other. The solution is a junction table that holds one row per relationship instance.
The junction table has a composite primary key of both foreign keys: (student_id, course_id). Each combination appears at most once. If the junction table gains its own attributes (enrolled_at, grade), it becomes an entity in its own right and may warrant a surrogate key.
✗Direct M:N (Wrong)
Comma-separated IDs in one column
Cannot enforce referential integrity
Cannot attach attributes to the relationship
Queries require string parsing
✓Junction Table (Correct)
One row per relationship
FK constraints enforce integrity
Can add attributes (date, grade, weight)
Standard JOIN queries work naturally
A junction table that records only that a relationship exists, with no measures, is called a factless fact table in dimensional modeling. It is the correct way to answer 'which students are enrolled in which courses?' without any numeric aggregation.
Required vs Optional
Daily Life
Interviews
Decide which relationships must exist
Optionality: What Happens When the FK Is NULL?
A required relationship means the FK cannot be NULL: every order MUST have a customer. An optional relationship means the FK can be NULL: an employee might not have a department yet. This distinction directly affects your queries. An INNER JOIN drops rows with NULL FKs. A LEFT JOIN preserves them but produces NULLs in the joined columns.
In analytical models, the standard practice for optional FKs is to create an 'Unknown' member in the dimension table (e.g., customer_id = -1, name = 'Unknown'). Then the FK is never NULL. Every fact row joins successfully, and reports show 'Unknown' instead of silently dropping rows.
The Unknown member is not a cosmetic convenience. It is a data integrity mechanism. Without it, an INNER JOIN to the dimension silently drops every fact row where the FK is NULL. Your revenue totals will be lower than reality.
Required relationship: NOT NULL constraint on the FK. Every row must reference a valid parent.
Optional relationship: FK is nullable. Use a LEFT JOIN or an Unknown member to preserve all rows.
An INNER JOIN on an optional FK silently drops rows. This is the most common source of 'missing data' bugs.
In data warehouses (BigQuery, Snowflake), FK constraints are not enforced. You must validate NULLs in your pipeline.
Reading ERD Notation
Entity-Relationship Diagrams use crow's foot notation to show cardinality and optionality at each end of a relationship line.
Single vertical bar = exactly one (required)
Circle + bar = zero or one (optional)
Crow's foot + bar = one or many (required)
Circle + crow's foot = zero or many (optional)
Reading a relationship line: look at both ends. 'customers ||----O< orders' means a customer has zero or many orders (crow's foot with circle on the orders side), and each order belongs to exactly one customer (double bar on the customers side).
NOT NULL FKNullable FKUnknown Member
NOT NULL FK
Required Relationship
The FK column has a NOT NULL constraint. Every child row must reference a valid parent. INNER JOIN is safe.
Nullable FK
Optional Relationship
The FK column allows NULL. Some child rows have no parent. Use LEFT JOIN or an Unknown dimension member.
Unknown Member
The Analytical Workaround
A special row in the dimension (ID = -1, name = 'Unknown') that acts as a default for NULL FKs. Eliminates NULLs from queries.
Self-Referential Tables
Daily Life
Interviews
Model hierarchies within a single table
Modeling Hierarchies Within a Single Table
Some relationships are between rows in the same table. An employee reports to a manager who is also an employee. A category has a parent category. A comment is a reply to another comment. These are self-referential relationships, and they model hierarchies.
The most common approach is the adjacency list: a parent_id column that references the same table's primary key. It is simple to implement but hard to query beyond one level. Getting all descendants of a node requires a recursive query (WITH RECURSIVE in SQL).
emp_id
name
manager_id
1
CEO (Alice)
NULL
2
VP Engineering (Bob)
1
3
VP Sales (Carol)
1
4
Senior Engineer (Dave)
2
5
Engineer (Eve)
4
Querying one level deep is easy: WHERE manager_id = 2 gives Bob's direct reports. Querying all of Alice's reports at every level requires a recursive CTE. On deep or wide hierarchies, recursive queries can be slow.
Adjacency List: simple parent_id column. Easy to insert, hard to query depth > 1.
Materialized Path: store the full path ('/1/2/4/5'). Easy subtree queries with LIKE.
Nested Sets: left/right boundary numbers. Fast subtree queries, very expensive updates.
Closure Table: separate table with all ancestor-descendant pairs. Fast queries, easy updates, more storage.
For most data engineering use cases, the adjacency list is the right starting point. If you need to query multiple levels frequently, consider materializing the hierarchy into a closure table during ETL.
TIP
Before choosing a hierarchy pattern, ask: how deep is the hierarchy, how often does it change, and what queries run against it? A 3-level category tree is fine as an adjacency list. A 15-level org chart queried for 'all reports under VP X' wants a closure table.
Complex Patterns
Daily Life
Interviews
Handle polymorphic and circular relationships
The Many-to-Many Fan-Out Problem
When you join through a many-to-many relationship, your row count explodes. If a product belongs to 3 categories and has 10 orders, joining products to categories to orders produces 30 rows instead of 10. This is the fan-out trap, and it silently inflates every SUM, COUNT, and AVG downstream.
The fix is to pre-aggregate before joining, or to use a bridge table with weighting factors that sum to 1.0 per entity. This is covered in depth in the Bridge Tables lesson.
Ternary Relationships
Some relationships involve three entities simultaneously. A supplier provides a specific part to a specific project. The relationship is not between any two entities alone. It is between all three at once. Modeling this requires a three-column junction table.
The junction table has three foreign keys: supplier_id, part_id, project_id. The grain is one row per supplier-part-project combination. Each FK pair alone may have duplicates. Only the full triple is unique.
Common Relationship Mistakes
Treating M:N as 1:N. An order can belong to multiple promotions. A single promotion_id FK loses the multi-promotion case.
Storing multiple values in one field: comma-separated tags, pipe-delimited IDs. Breaks queries, prevents FK enforcement, impossible to index.
Missing the fan-out: joining two 1:N relationships through the same parent without aggregating.
Always draw the relationships before writing the schema. A 2-minute diagram prevents hours of debugging.
✗Common Mistake
Comma-separated values in one column
Single FK for M:N
Joining without checking cardinality
✓Correct Approach
Junction table with proper FKs
Composite PK on the junction
Pre-aggregate or use bridge weights
❯❯❯PUTTING IT ALL TOGETHER
> You are designing the schema for a project management tool. Users belong to multiple teams, and tasks can be assigned to multiple users.
Users-to-teams is M:N. You create a team_members junction table with (user_id, team_id) as the composite PK, plus a role column.
Tasks-to-users is M:N. You create a task_assignments junction table with (task_id, user_id), plus assigned_at.
Tasks belong to one project (1:N). project_id lives on tasks. Before querying 'total hours by team,' you check the join path for fan-out through two M:N relationships.
KEY TAKEAWAYS
Cardinality determines join behavior: 1:1 is safe, 1:N multiplies from one side, M:N requires a junction table
FK lives on the many side: orders.customer_id, not customers.order_id
Optional FKs need Unknown members: prevents INNER JOIN from silently dropping rows
Self-referential hierarchies: adjacency list for simple cases, closure table for deep queries
Never comma-separate IDs: use a junction table for M:N relationships, always
Relationships
How tables talk to each other
Category
Data Modeling
Duration
18 minutes
Challenges
12 hands-on challenges
Topics covered: What Are Relationships?, Cardinality Explained, Required vs Optional, Self-Referential Tables, Complex Patterns
Why Cardinality Matters More Than You Think When you look at a schema, the first thing to understand is not the column names. It is the cardinality of the relationships between tables. Cardinality tells you whether a join will produce the same number of rows, more rows, or fewer rows than the input. Getting this wrong is the single most common source of incorrect query results in production systems. Relationships are implemented using foreign keys, but the foreign key is just the mechanism. The
Cardinality answers the question: for one row in table A, how many rows in table B can it relate to? Always read cardinality from both directions. The relationship between customers and orders is 1:N from the customer side, but 1:1 from the order side (each order belongs to exactly one customer). One-to-One (1:1) Each row in A maps to exactly one row in B. In practice, this usually means the two tables could be merged into one, but are kept separate for performance, security, or organizational r
Optionality: What Happens When the FK Is NULL? A required relationship means the FK cannot be NULL: every order MUST have a customer. An optional relationship means the FK can be NULL: an employee might not have a department yet. This distinction directly affects your queries. An INNER JOIN drops rows with NULL FKs. A LEFT JOIN preserves them but produces NULLs in the joined columns. In analytical models, the standard practice for optional FKs is to create an 'Unknown' member in the dimension ta
Modeling Hierarchies Within a Single Table Some relationships are between rows in the same table. An employee reports to a manager who is also an employee. A category has a parent category. A comment is a reply to another comment. These are self-referential relationships, and they model hierarchies. The most common approach is the adjacency list: a parent_id column that references the same table's primary key. It is simple to implement but hard to query beyond one level. Getting all descendants
The Many-to-Many Fan-Out Problem When you join through a many-to-many relationship, your row count explodes. If a product belongs to 3 categories and has 10 orders, joining products to categories to orders produces 30 rows instead of 10. This is the fan-out trap, and it silently inflates every SUM, COUNT, and AVG downstream. The fix is to pre-aggregate before joining, or to use a bridge table with weighting factors that sum to 1.0 per entity. This is covered in depth in the Bridge Tables lesson.