Joins: Beginner
Shopify powers over two million merchants, and every time a store owner opens their dashboard they see a complete picture of orders, customers, and products assembled in real time. That view is possible because Shopify's database joins an orders table to a customers table to a products table, weaving three separate data sources into a single coherent result. Without joins, a merchant would see only fragmented lists with no way to connect who bought what. The skill you are about to learn is the same one that makes dashboards like that possible.
The Row Pairing Model
Understand how databases match rows
Visual: How Pairs Form
| id | name |
|---|---|
| 1 | Alice |
| 2 | Bob |
| order_id | cust_id | total |
|---|---|---|
| 101 | 1 | $50 |
| 102 | 1 | $30 |
| 103 | 2 | $75 |
| id | name | order_id | total |
|---|---|---|---|
| 1 | Alice | 101 | $50 |
| 1 | Alice | 102 | $30 |
| 2 | Bob | 103 | $75 |
> Complete this query to select the status from both tables without ambiguity.
SELECT , FROM ci_builds INNER JOIN code_reviews ON ci_builds.repo_name = code_reviews.repo_name
Once you internalize this model, every join type from INNER to FULL OUTER JOIN becomes just a variation on the same question: which unmatched rows, if any, should also appear in the output?
Join keys and data types
Pick the right columns to join on
Type Matching Matters
- Use identical types for join keys
- Match INTEGER to INTEGER
- Match VARCHAR to VARCHAR
- Join INTEGER to VARCHAR
- Rely on implicit type conversion
- Ignore type warnings
> Complete this join query by choosing the correct key columns to match customers with their orders.
SELECT c.name, o.total FROM customers AS c INNER JOIN orders AS o ON =
Primary and foreign keys
Identify table relationships through keys
Primary Key
- customer_id in a customers table
- order_id in an orders table
- product_id in a products table
Foreign Key
- customer_id is the PRIMARY KEY
- Uniquely identifies each customer
- order_id is the PRIMARY KEY
- customer_id is a FOREIGN KEY
> Complete this query to join users with their orders using the correct key columns.
SELECT u.name, o.total FROM users AS u INNER JOIN orders AS o ON =
Cardinality
Predict how many rows a join produces
One-to-One (1:1)
| user_id | |
|---|---|
| 1 | alice@email.com |
| 2 | bob@email.com |
| 3 | carol@email.com |
| user_id | bio |
|---|---|
| 1 | Engineer |
| 2 | Designer |
| 3 | Manager |
One-to-Many (1:N)
| cust_id | name |
|---|---|
| 1 | Alice |
| 2 | Bob |
| 3 | Carol |
| order_id | cust_id |
|---|---|
| 101 | 1 |
| 102 | 1 |
| 103 | 1 |
| 104 | 2 |
| 105 | 2 |
| 106 | 3 |
Many-to-Many (N:N)
| student_id | name |
|---|---|
| 1 | Alice |
| 2 | Bob |
| 3 | Carol |
| course_id | title |
|---|---|
| CS101 | Intro SQL |
| CS102 | Advanced |
| CS103 | Data Viz |
| student_id | course_id |
|---|---|
| 1 | CS101 |
| 1 | CS102 |
| 2 | CS101 |
| 2 | CS102 |
| 2 | CS103 |
| 3 | CS101 |
> Complete this query by choosing the correct key columns to chain customers to orders to products.
SELECT c.name, p.product_name FROM customers AS c INNER JOIN orders AS o ON = INNER JOIN products AS p ON =
INNER JOIN
Combine only rows that match in both tables
INNER JOIN is the workhorse of SQL joins. It returns only rows where both tables have matching values.
INNER JOIN is the most common join type. It returns only rows that have matches in both tables. If a row has no match, it does not appear in the output.
Syntax
The basic INNER JOIN pattern connects two tables on matching columns.
Example
| customer_id | name |
|---|---|
| C1 | Alice |
| C2 | Bob |
| C3 | Carol |
| order_id | customer_id | total |
|---|---|---|
| 101 | C1 | $50 |
| 102 | C1 | $30 |
| 103 | C2 | $75 |
| col_0 | col_1 | col_2 | ||
|---|---|---|---|---|
| C1 | Alice | 101 | C1 | $50 |
The ON clause specifies the join condition, matching customer_id values across both tables. When values match, a result row is emitted. Customers with no matching orders are excluded.
What INNER JOIN Excludes
INNER JOIN drops rows that have no match. If a customer has never ordered anything, they won't appear. If an order somehow has an invalid customer_id, it won't appear either.
- Rows from table1 that match table2
- Rows from table2 that match table1
- Only matching pairs (unmatched rows are excluded)
Writing Clean Joins
Column Qualification
The Problem
The Solution
Name collisions appear in over 90% of joins on real data models. Always qualify column names when joining tables using ON conditions.
Table Aliases
- Use meaningful short aliases (c for customers)
- Be consistent across your query
- Use aliases in all column references
- Use random letters (x, y, z)
- Mix aliased and non-aliased references
- Make aliases longer than table names
> Complete this query to retrieve customer names alongside their order totals.
c.name, o.total customers AS c orders AS o c.id = o.customer_id
NULL in Join Keys
NULL values in join columns cause silent row loss. This is because NULL never equals anything, not even another NULL.
The Rule
NULL = NULLisFALSE(notTRUE)NULL= anything isFALSE- Rows with
NULLjoin keys never match
Example
| id | cust_id |
|---|---|
| 1 | 100 |
| 2 | NULL |
| 3 | 101 |
| cust_id | name |
|---|---|
| 100 | Alice |
| 101 | Bob |
Order #2 has cust_id = NULL. When we INNER JOIN, order #2 will be silently dropped because NULL cannot match any customer_id.
NULL in join keys causes silent row loss in 15-25% of student queries. Always check if your join keys can contain NULL!Query Execution Order
- 1.
FROM: identify the tables - 2.
JOIN: combine rows from tables - 3.
WHERE: filter the combined rows - 4.
SELECT: choose which columns to return
This means the join happens BEFORE the WHERE clause filters rows. If you want to filter data before joining, you need different techniques (covered in intermediate joins).
Example
> You are a data analyst at Spotify building a headcount and team assignment report for the HR leadership team. Employee records and department assignments live in separate tables, and the report must show only employees who are currently assigned to a team.
ON condition filters matches.employee_metrics.dept_id and cost_allocs.dept_id must share the same data type to avoid silent type-cast performance penalties.department_id matches at most one department name in the result.INNER JOIN returns only employees who have a matching dept_id in the cost_allocs table, excluding anyone unassigned.INNER JOIN returns only rows where both tables have matching valuesON clause specifies which columns to match; always qualify column names with aliasesINNER JOIN and JOIN are equivalent; INNER is explicit but optionalGetting tables to talk to each other
- Category
- SQL
- Difficulty
- beginner
- Duration
- 19 minutes
- Challenges
- 0 hands-on challenges
Topics covered: The Row Pairing Model, Join keys and data types, Primary and foreign keys, Cardinality, INNER JOIN
Lesson Sections
- The Row Pairing Model (concepts: sqlSelfJoin)
Before learning specific join syntax, you need to understand the fundamental concept: how SQL matches rows from different tables. A join creates pairs of rows from two tables. Each row from the first table is tested against each row from the second table. When rows match a condition, they form a pair in the output. This is the core mental model for understanding joins. SQL does not copy data or modify the original tables. It produces a new result set made of matched row pairs. Visual: How Pairs
- Join keys and data types (concepts: sqlJoinCardinality)
The columns you choose to match on determine which rows get paired together. Selecting the right join keys is essential for correct results. The columns you use to match rows are called join keys. In the example above, customers.id and orders.cust_id are join keys. They contain the same kind of data: customer identifiers. For a join to work correctly, join keys must be comparable. This means they should have the same (or compatible) data types. Type Matching Matters If one column stores IDs as n
- Primary and foreign keys
Professional databases follow consistent patterns for connecting tables. Learning these patterns helps you identify join columns quickly. Databases use a pattern called primary key / foreign key (PK/FK) to organize relationships between tables. Understanding this pattern helps you know which columns to join on. Primary Key A primary key uniquely identifies each row in a table. No two rows can have the same primary key value. Common examples: Foreign Key A foreign key is a column that references
- Cardinality
How many rows match? This determines whether your result grows, shrinks, or stays the same size. Cardinality describes how many rows in one table match rows in another. Understanding cardinality helps you predict what your join results will look like. One-to-One (1:1) Each row in table A matches exactly one row in table B, and vice versa. Example: users and user_profiles tables Result: Same number of rows as each input table. One-to-Many (1:N) Each row in table A can match multiple rows in table
- INNER JOIN (concepts: sqlInnerJoin)
Syntax Example What INNER JOIN Excludes Writing Clean Joins Professional SQL requires clear, unambiguous code. Column qualification and table aliases make your joins readable and maintainable. Column Qualification When joining tables, both tables might have columns with the same name. SQL needs to know which table's column you mean. You specify this using table.column notation. The Problem This query is AMBIGUOUS if both tables have a "name" column: Error: Column "name" is ambiguous. SQL does no