Relationships

Tables do not exist in isolation. An order belongs to a customer. A customer has an address. An address is in a city. These connections are relationships, and modeling them correctly determines whether your queries return the right results or silently produce wrong ones. This lesson teaches you how to express real-world connections in your schema: one-to-one, one-to-many, many-to-many, required vs optional, and the tricky patterns like self-referential hierarchies.

What Are Relationships?

Daily Life

Interviews

Express real-world connections in schemas

Why Cardinality Matters More Than You Think

When you look at a schema, the first thing to understand is not the column names. It is the cardinality of the relationships between tables. Cardinality tells you whether a join will produce the same number of rows, more rows, or fewer rows than the input. Getting this wrong is the single most common source of incorrect query results in production systems.

Relationships are implemented using foreign keys, but the foreign key is just the mechanism. The real content is the business rule: can one customer have many orders? Can one order belong to many customers? The answer to these questions determines the shape of every downstream query.

1:11:NM:N

1:1

One-to-One

Each row in table A maps to exactly one row in table B. Joining them never changes the row count. Example: users and user_profiles.

1:N

One-to-Many

Each row in table A maps to zero or more rows in table B. Joining from the 'one' side to the 'many' side multiplies rows. Example: customers and orders.

M:N

Many-to-Many

Each row in A can map to many rows in B, and vice versa. Requires a junction table. Example: students and courses linked by enrollments.

customers

PKcustomer_id

name

orders

PKorder_id

FKcustomer_id

amount

order_date

The line connecting these tables represents a one-to-many relationship. customer_id in orders is a foreign key referencing customers. One customer can have many orders. Each order belongs to exactly one customer. When you join orders to customers, each order row gets the customer's data attached. The row count stays the same.

Why This Matters for Queries

If you join in the wrong direction without aggregating, your metrics inflate. Joining customers to orders produces one row per order. But if you then join to shipments (also many-per-order), you get a cartesian product: each order row is duplicated for every shipment. SUM(amount) is now inflated. Understanding cardinality prevents this class of bug entirely.

TIP

Before writing any JOIN, ask yourself: will this join multiply my rows? If the table you are joining to has multiple rows per join key, the answer is yes, and you need to either aggregate first or accept the fan-out intentionally.

Cardinality Explained

Daily Life

Interviews

Define one-to-one, one-to-many, many-to-many

Cardinality answers the question: for one row in table A, how many rows in table B can it relate to? Always read cardinality from both directions. The relationship between customers and orders is 1:N from the customer side, but 1:1 from the order side (each order belongs to exactly one customer).

One-to-One (1:1)

Each row in A maps to exactly one row in B. In practice, this usually means the two tables could be merged into one, but are kept separate for performance, security, or organizational reasons. A users table and a user_profiles table: every user has exactly one profile. Joining them never changes the row count.

users

PKuser_id

created_at

user_profiles

PKuser_id

bio

avatar_url

timezone

TIP

A 1:1 join is the only join that is guaranteed not to multiply your rows. If you are ever unsure whether a join is safe, check: is it 1:1? If yes, it cannot inflate your aggregates.

One-to-Many (1:N)

One-to-many is the most common relationship in data modeling. One customer has many orders. One department has many employees. One product has many reviews. The foreign key always lives on the 'many' side: orders has customer_id, not customers has order_id.

The critical implication: joining FROM the many side TO the one side is safe (row count unchanged). Joining FROM the one side TO the many side produces one output row per 'many' record. If you aggregate without accounting for this, your metrics are wrong.

departments

PKdept_id

dept_name

location

employees

PKemp_id

FKdept_id

name

salary

FK lives on the many side: employees.dept_id, not departments.emp_id.

Join from employees to departments: safe, row count unchanged.

Join from departments to employees: produces one row per employee. Must aggregate for department-level metrics.

Many-to-Many (M:N)

Many-to-many relationships cannot be represented with a single foreign key. A student can enroll in many courses. A course can have many students. Neither table can hold a single FK to the other. The solution is a junction table that holds one row per relationship instance.

The junction table has a composite primary key of both foreign keys: (student_id, course_id). Each combination appears at most once. If the junction table gains its own attributes (enrolled_at, grade), it becomes an entity in its own right and may warrant a surrogate key.

students

PKstudent_id

name

major

courses

PKcourse_id

title

credits

enrollments

PKstudent_id

FKcourse_id

enrolled_at

grade

✗Direct M:N (Wrong)

Comma-separated IDs in one column
Cannot enforce referential integrity
Cannot attach attributes to the relationship
Queries require string parsing

✓Junction Table (Correct)

One row per relationship
FK constraints enforce integrity
Can add attributes (date, grade, weight)
Standard JOIN queries work naturally

A junction table that records only that a relationship exists, with no measures, is called a factless fact table in dimensional modeling. It is the correct way to answer 'which students are enrolled in which courses?' without any numeric aggregation.

Required vs Optional

Daily Life

Interviews

Decide which relationships must exist

Optionality: What Happens When the FK Is NULL?

A required relationship means the FK cannot be NULL: every order MUST have a customer. An optional relationship means the FK can be NULL: an employee might not have a department yet. This distinction directly affects your queries. An INNER JOIN drops rows with NULL FKs. A LEFT JOIN preserves them but produces NULLs in the joined columns.

In analytical models, the standard practice for optional FKs is to create an 'Unknown' member in the dimension table (e.g., customer_id = -1, name = 'Unknown'). Then the FK is never NULL. Every fact row joins successfully, and reports show 'Unknown' instead of silently dropping rows.

The Unknown member is not a cosmetic convenience. It is a data integrity mechanism. Without it, an INNER JOIN to the dimension silently drops every fact row where the FK is NULL. Your revenue totals will be lower than reality.

Required relationship: NOT NULL constraint on the FK. Every row must reference a valid parent.

Optional relationship: FK is nullable. Use a LEFT JOIN or an Unknown member to preserve all rows.

An INNER JOIN on an optional FK silently drops rows. This is the most common source of 'missing data' bugs.

In data warehouses (BigQuery, Snowflake), FK constraints are not enforced. You must validate NULLs in your pipeline.

Reading ERD Notation

Entity-Relationship Diagrams use crow's foot notation to show cardinality and optionality at each end of a relationship line.

Single vertical bar = exactly one (required)

Circle + bar = zero or one (optional)

Crow's foot + bar = one or many (required)

Circle + crow's foot = zero or many (optional)

Reading a relationship line: look at both ends. 'customers ||----O< orders' means a customer has zero or many orders (crow's foot with circle on the orders side), and each order belongs to exactly one customer (double bar on the customers side).

NOT NULL FKNullable FKUnknown Member

NOT NULL FK

Required Relationship

The FK column has a NOT NULL constraint. Every child row must reference a valid parent. INNER JOIN is safe.

Nullable FK

Optional Relationship

The FK column allows NULL. Some child rows have no parent. Use LEFT JOIN or an Unknown dimension member.

Unknown Member

The Analytical Workaround

A special row in the dimension (ID = -1, name = 'Unknown') that acts as a default for NULL FKs. Eliminates NULLs from queries.

Self-Referential Tables

Daily Life

Interviews

Model hierarchies within a single table

Modeling Hierarchies Within a Single Table

Some relationships are between rows in the same table. An employee reports to a manager who is also an employee. A category has a parent category. A comment is a reply to another comment. These are self-referential relationships, and they model hierarchies.

The most common approach is the adjacency list: a parent_id column that references the same table's primary key. It is simple to implement but hard to query beyond one level. Getting all descendants of a node requires a recursive query (WITH RECURSIVE in SQL).

emp_id	name	manager_id
1	CEO (Alice)	NULL
2	VP Engineering (Bob)	1
3	VP Sales (Carol)	1
4	Senior Engineer (Dave)	2
5	Engineer (Eve)	4

Querying one level deep is easy: WHERE manager_id = 2 gives Bob's direct reports. Querying all of Alice's reports at every level requires a recursive CTE. On deep or wide hierarchies, recursive queries can be slow.

Adjacency List: simple parent_id column. Easy to insert, hard to query depth > 1.

Materialized Path: store the full path ('/1/2/4/5'). Easy subtree queries with LIKE.

Nested Sets: left/right boundary numbers. Fast subtree queries, very expensive updates.

Closure Table: separate table with all ancestor-descendant pairs. Fast queries, easy updates, more storage.

For most data engineering use cases, the adjacency list is the right starting point. If you need to query multiple levels frequently, consider materializing the hierarchy into a closure table during ETL.

TIP

Before choosing a hierarchy pattern, ask: how deep is the hierarchy, how often does it change, and what queries run against it? A 3-level category tree is fine as an adjacency list. A 15-level org chart queried for 'all reports under VP X' wants a closure table.

Complex Patterns

Daily Life

Interviews

Handle polymorphic and circular relationships

The Many-to-Many Fan-Out Problem

When you join through a many-to-many relationship, your row count explodes. If a product belongs to 3 categories and has 10 orders, joining products to categories to orders produces 30 rows instead of 10. This is the fan-out trap, and it silently inflates every SUM, COUNT, and AVG downstream.

The fix is to pre-aggregate before joining, or to use a bridge table with weighting factors that sum to 1.0 per entity. This is covered in depth in the Bridge Tables lesson.

Ternary Relationships

Some relationships involve three entities simultaneously. A supplier provides a specific part to a specific project. The relationship is not between any two entities alone. It is between all three at once. Modeling this requires a three-column junction table.

The junction table has three foreign keys: supplier_id, part_id, project_id. The grain is one row per supplier-part-project combination. Each FK pair alone may have duplicates. Only the full triple is unique.

suppliers

PKsupplier_id

name

parts

PKpart_id

description

supply_agreements

FKsupplier_id

FKpart_id

FKproject_id

price

Common Relationship Mistakes

Treating M:N as 1:N. An order can belong to multiple promotions. A single promotion_id FK loses the multi-promotion case.

Storing multiple values in one field: comma-separated tags, pipe-delimited IDs. Breaks queries, prevents FK enforcement, impossible to index.

Missing the fan-out: joining two 1:N relationships through the same parent without aggregating.

Always draw the relationships before writing the schema. A 2-minute diagram prevents hours of debugging.

✗Common Mistake

Comma-separated values in one column
Single FK for M:N
Joining without checking cardinality

✓Correct Approach

Junction table with proper FKs
Composite PK on the junction
Pre-aggregate or use bridge weights

❯❯❯PUTTING IT ALL TOGETHER

> You are designing the schema for a project management tool. Users belong to multiple teams, and tasks can be assigned to multiple users.

Users-to-teams is M:N. You create a team_members junction table with (user_id, team_id) as the composite PK, plus a role column.

Tasks-to-users is M:N. You create a task_assignments junction table with (task_id, user_id), plus assigned_at.

Tasks belong to one project (1:N). project_id lives on tasks. Before querying 'total hours by team,' you check the join path for fan-out through two M:N relationships.

KEY TAKEAWAYS

Cardinality determines join behavior: 1:1 is safe, 1:N multiplies from one side, M:N requires a junction table

FK lives on the many side: orders.customer_id, not customers.order_id

Optional FKs need Unknown members: prevents INNER JOIN from silently dropping rows

Self-referential hierarchies: adjacency list for simple cases, closure table for deep queries

Never comma-separate IDs: use a junction table for M:N relationships, always

How tables talk to each other

Category: Data Modeling
Duration: 18 minutes
Challenges: 12 hands-on challenges

Topics covered: What Are Relationships?, Cardinality Explained, Required vs Optional, Self-Referential Tables, Complex Patterns

Lesson Sections

What Are Relationships? (concepts: dmOneToMany)
Why Cardinality Matters More Than You Think When you look at a schema, the first thing to understand is not the column names. It is the cardinality of the relationships between tables. Cardinality tells you whether a join will produce the same number of rows, more rows, or fewer rows than the input. Getting this wrong is the single most common source of incorrect query results in production systems. Relationships are implemented using foreign keys, but the foreign key is just the mechanism. The
Cardinality Explained (concepts: dmManyToMany)
Cardinality answers the question: for one row in table A, how many rows in table B can it relate to? Always read cardinality from both directions. The relationship between customers and orders is 1:N from the customer side, but 1:1 from the order side (each order belongs to exactly one customer). One-to-One (1:1) Each row in A maps to exactly one row in B. In practice, this usually means the two tables could be merged into one, but are kept separate for performance, security, or organizational r
Required vs Optional (concepts: dmCardinalityRequired)
Optionality: What Happens When the FK Is NULL? A required relationship means the FK cannot be NULL: every order MUST have a customer. An optional relationship means the FK can be NULL: an employee might not have a department yet. This distinction directly affects your queries. An INNER JOIN drops rows with NULL FKs. A LEFT JOIN preserves them but produces NULLs in the joined columns. In analytical models, the standard practice for optional FKs is to create an 'Unknown' member in the dimension ta
Self-Referential Tables (concepts: dmSelfReferential)
Modeling Hierarchies Within a Single Table Some relationships are between rows in the same table. An employee reports to a manager who is also an employee. A category has a parent category. A comment is a reply to another comment. These are self-referential relationships, and they model hierarchies. The most common approach is the adjacency list: a parent_id column that references the same table's primary key. It is simple to implement but hard to query beyond one level. Getting all descendants
Complex Patterns (concepts: dmManyToMany)
The Many-to-Many Fan-Out Problem When you join through a many-to-many relationship, your row count explodes. If a product belongs to 3 categories and has 10 orders, joining products to categories to orders produces 30 rows instead of 10. This is the fan-out trap, and it silently inflates every SUM, COUNT, and AVG downstream. The fix is to pre-aggregate before joining, or to use a bridge table with weighting factors that sum to 1.0 per entity. This is covered in depth in the Bridge Tables lesson.