Joins: Beginner

Shopify powers over two million merchants, and every time a store owner opens their dashboard they see a complete picture of orders, customers, and products assembled in real time. That view is possible because Shopify's database joins an orders table to a customers table to a products table, weaving three separate data sources into a single coherent result. Without joins, a merchant would see only fragmented lists with no way to connect who bought what. The skill you are about to learn is the same one that makes dashboards like that possible.

The Row Pairing Model

Daily Life
Interviews

Understand how databases match rows

Before learning specific join syntax, you need to understand the fundamental concept: how SQL matches rows from different tables.
A join creates pairs of rows from two tables. Each row from the first table is tested against each row from the second table. When rows match a condition, they form a pair in the output.
This is the core mental model for understanding joins. SQL does not copy data or modify the original tables. It produces a new result set made of matched row pairs.
//

Visual: How Pairs Form

Consider two small tables:
customers
idname
1Alice
2Bob
orders
order_idcust_idtotal
1011$50
1021$30
1032$75
When we join on customers.id = orders.cust_id, SQL pairs each customer with their matching orders:
idnameorder_idtotal
1Alice101$50
1Alice102$30
2Bob103$75
Notice that Alice appears twice because she has two orders. The join created one row for each matching pair: Alice-Order101 and Alice-Order102.
TIP
A join tests every combination of rows, but only outputs combinations where the join condition is true. Misunderstanding this accounts for 40-50% of early SQL mistakes.
The join concept itself has a long and interesting history in computer science.

> Complete this query to select the status from both tables without ambiguity.

SELECT
  ___,
  ___
FROM ci_builds
INNER JOIN code_reviews
  ON ci_builds.repo_name = code_reviews.repo_name
ci_builds.status
code_reviews.status
status
The row pairing model is the foundation of relational database theory because it gives you a precise, predictable way to reason about how data from multiple tables combines.
Understanding that SQL produces one output row per matched pair, not one per input table row, prevents most of the "why do I have duplicate rows?" confusion that beginners encounter.

Once you internalize this model, every join type from INNER to FULL OUTER JOIN becomes just a variation on the same question: which unmatched rows, if any, should also appear in the output?

Join keys and data types

Daily Life
Interviews

Pick the right columns to join on

The columns you choose to match on determine which rows get paired together. Selecting the right join keys is essential for correct results.
The columns you use to match rows are called join keys. In the example above, customers.id and orders.cust_id are join keys. They contain the same kind of data: customer identifiers.
For a join to work correctly, join keys must be comparable. This means they should have the same (or compatible) data types.
//

Type Matching Matters

If one column stores IDs as numbers and another stores them as text, the database may need to convert types before comparing. This implicit conversion:
Slower queries
Slower queries
Type conversion adds 20-40% overhead on large tables
Wrong matches
Wrong matches
Implicit conversion can cause unexpected matches or silent misses
Harder to read
Harder to read
Mixed types make queries confusing for other developers
To avoid these issues, ensure your join keys share the same data type on both sides.
Do
  • Use identical types for join keys
  • Match INTEGER to INTEGER
  • Match VARCHAR to VARCHAR
Don't
  • Join INTEGER to VARCHAR
  • Rely on implicit type conversion
  • Ignore type warnings

> Complete this join query by choosing the correct key columns to match customers with their orders.

SELECT
  c.name,
  o.total
FROM customers AS c
INNER JOIN orders AS o
  ON ___ = ___
o.customer_id
c.customer_id
c.order_id
Data type consistency in join keys is not just a best practice for performance; it is a correctness requirement because implicit conversions can cause values that look identical to fail to match.
When you define database schemas, storing all identifiers in the same type across every table that shares them is one of the highest-leverage decisions you can make for long-term query reliability.
Getting into the habit of qualifying every column with its table alias whenever you write a join makes your queries self-documenting and eliminates an entire class of ambiguity errors before they can occur.

Primary and foreign keys

Daily Life
Interviews

Identify table relationships through keys

Professional databases follow consistent patterns for connecting tables. Learning these patterns helps you identify join columns quickly.
Databases use a pattern called primary key / foreign key (PK/FK) to organize relationships between tables. Understanding this pattern helps you know which columns to join on.
//

Primary Key

A primary key uniquely identifies each row in a table. No two rows can have the same primary key value. Common examples:
PRIMARY KEY EXAMPLES
  • customer_id in a customers table
  • order_id in an orders table
  • product_id in a products table
//

Foreign Key

A foreign key is a column that references the primary key of another table. It creates the link between tables:
customers table
  • customer_id is the PRIMARY KEY
  • Uniquely identifies each customer
orders table
  • order_id is the PRIMARY KEY
  • customer_id is a FOREIGN KEY
When you join tables, you typically connect a foreign key to its referenced primary key. This pattern appears in 70-80% of joins in real databases.

> Complete this query to join users with their orders using the correct key columns.

SELECT
  u.name,
  o.total
FROM users AS u
INNER JOIN orders AS o
  ON ___ = ___
o.order_id
u.user_id
o.user_id
The primary key and foreign key pattern is so universal in relational databases that learning to recognize it instantly is one of the most valuable skills for reading any schema you encounter for the first time.
When you see a column named something_id in a table that is not the main entity of that table, it is almost always a foreign key pointing to a primary key in another table.
Joining on the wrong key is a silent error that can return results that look plausible but are factually incorrect, which is why confirming the primary-to-foreign key relationship before writing each join is worth the extra second of thought.

Cardinality

Daily Life
Interviews

Predict how many rows a join produces

How many rows match? This determines whether your result grows, shrinks, or stays the same size.
Cardinality describes how many rows in one table match rows in another. Understanding cardinality helps you predict what your join results will look like.
//

One-to-One (1:1)

Each row in table A matches exactly one row in table B, and vice versa.
Example: users and user_profiles tables
users
user_idemail
1alice@email.com
2bob@email.com
3carol@email.com
profiles
user_idbio
1Engineer
2Designer
3Manager
Result: Same number of rows as each input table.
usersPKuser_idemailprofilesPKuser_idbio
//

One-to-Many (1:N)

Each row in table A can match multiple rows in table B. This is the most common relationship.
Example: One customer has many orders
customers
cust_idname
1Alice
2Bob
3Carol
orders
order_idcust_id
1011
1021
1031
1042
1052
1063
Result: More rows than the "one" side table. Alice appears 3 times, Bob 2 times, Carol once.
customersPKcust_idnameordersorder_idFKcust_id
//

Many-to-Many (N:N)

Multiple rows in table A can match multiple rows in table B. This often requires a "bridge" table.
Example: Students enrolled in courses
students
student_idname
1Alice
2Bob
3Carol
courses
course_idtitle
CS101Intro SQL
CS102Advanced
CS103Data Viz
The bridge table connects them:
student_idcourse_id
1CS101
1CS102
2CS101
2CS102
2CS103
3CS101
Result: Can have many more rows than either input table.
studentsPKstudent_idnameenrollmentsFKstudent_idFKcourse_idcoursesPKcourse_idtitle
TIP
Most beginners assume joins are 1:1. In practice, about 60% of analytical joins are 1:N or N:N. Always check if your results have more rows than expected!
Many-to-many relationships require a bridge table between the two entities, and this pattern appears everywhere in real databases.

> Complete this query by choosing the correct key columns to chain customers to orders to products.

SELECT
  c.name,
  p.product_name
FROM customers AS c
INNER JOIN orders AS o
  ON ___ = ___
INNER JOIN products AS p
  ON ___ = ___
o.product_id
o.customer_id
o.order_id
c.id
p.id
Cardinality is one of the most important concepts to verify before running a join in production because a misunderstanding of the relationship type can silently multiply your row count by thousands.
The many-to-many pattern is especially common in analytics because business events like purchases, clicks, and enrollments naturally form relationships where both sides have multiple counterparts.
Chaining multiple joins together is the standard technique for traversing a data model, and the key to doing it correctly is always tracking which table provides the foreign key at each step in the chain.

INNER JOIN

Daily Life
Interviews

Combine only rows that match in both tables

INNER JOIN is the workhorse of SQL joins. It returns only rows where both tables have matching values.

INNER JOIN is the most common join type. It returns only rows that have matches in both tables. If a row has no match, it does not appear in the output.

//

Syntax

The basic INNER JOIN pattern connects two tables on matching columns.

1SELECT
2 columns
3FROM table1
4INNER JOIN table2
5 ON table1.column = table2.column
Result
columns
value1
value2
value3
//

Example

1SELECT
2 customers.name,
3 orders.order_id,
4 orders.total
5FROM customers
6INNER JOIN orders
7 ON customers.customer_id = orders.customer_id
Join Animation[object Object]
customers
customer_idname
C1Alice
C2Bob
C3Carol
orders
order_idcustomer_idtotal
101C1$50
102C1$30
103C2$75
result
col_0col_1col_2
C1Alice101C1$50
Row 1/3

The ON clause specifies the join condition, matching customer_id values across both tables. When values match, a result row is emitted. Customers with no matching orders are excluded.

//

What INNER JOIN Excludes

INNER JOIN drops rows that have no match. If a customer has never ordered anything, they won't appear. If an order somehow has an invalid customer_id, it won't appear either.

INNER JOIN RETURNS
  • Rows from table1 that match table2
  • Rows from table2 that match table1
  • Only matching pairs (unmatched rows are excluded)

Writing Clean Joins

Professional SQL requires clear, unambiguous code. Column qualification and table aliases make your joins readable and maintainable.
//

Column Qualification

When joining tables, both tables might have columns with the same name. SQL needs to know which table's column you mean. You specify this using table.column notation.
//

The Problem

This query is AMBIGUOUS if both tables have a "name" column:
1SELECT
2 name
3FROM employee_metrics
4INNER JOIN cost_allocs
5 ON employee_metrics.dept_id = cost_allocs.id
Result
ERROR
Column "name" is ambiguous
Error: Column "name" is ambiguous. SQL does not know which table's "name" column you want.
//

The Solution

Use table.column notation to be specific:
1SELECT
2 employee_metrics.name,
3 cost_allocs.name
4FROM employee_metrics
5INNER JOIN cost_allocs
6 ON employee_metrics.dept_id = cost_allocs.id
Result
employee_metrics.namecost_allocs.name
AliceEngineering
BobSales
CarolEngineering

Name collisions appear in over 90% of joins on real data models. Always qualify column names when joining tables using ON conditions.

//

Table Aliases

Table aliases give tables short nicknames. This makes queries easier to read and write, especially when table names are long.
1SELECT
2 customers.customer_id,
3 customers.name,
4 orders.order_id,
5 orders.total
6FROM customers
7INNER JOIN orders
8 ON customers.customer_id = orders.customer_id
In production codebases, over 95% of join queries use aliases. They make code cleaner and reduce typing.
Do
  • Use meaningful short aliases (c for customers)
  • Be consistent across your query
  • Use aliases in all column references
Don't
  • Use random letters (x, y, z)
  • Mix aliased and non-aliased references
  • Make aliases longer than table names

> Complete this query to retrieve customer names alongside their order totals.

___
  c.name,
  o.total
___ customers AS c
___ orders AS o
  ___ c.id = o.customer_id
ON
WHERE
SELECT
INNER JOIN
FROM
//

NULL in Join Keys

NULL values in join columns cause silent row loss. This is because NULL never equals anything, not even another NULL.

//

The Rule

NULL IN JOIN CONDITIONS
  • NULL = NULL is FALSE (not TRUE)
  • NULL = anything is FALSE
  • Rows with NULL join keys never match
//

Example

orders
idcust_id
1100
2NULL
3101
customers
cust_idname
100Alice
101Bob

Order #2 has cust_id = NULL. When we INNER JOIN, order #2 will be silently dropped because NULL cannot match any customer_id.

1SELECT
2 o.id,
3 c.name
4FROM orders AS o
5INNER JOIN customers AS c
6 ON o.cust_id = c.cust_id
Result
idname
1Alice
3Bob
TIP
In beginner datasets, NULL in join keys causes silent row loss in 15-25% of student queries. Always check if your join keys can contain NULL!
//

Query Execution Order

SQL's logical execution order matters for understanding when joins happen relative to filters:
EXECUTION ORDER
  • 1. FROM: identify the tables
  • 2. JOIN: combine rows from tables
  • 3. WHERE: filter the combined rows
  • 4. SELECT: choose which columns to return

This means the join happens BEFORE the WHERE clause filters rows. If you want to filter data before joining, you need different techniques (covered in intermediate joins).

//

Example

This query joins customers to orders, then filters to only large orders.
1SELECT
2 c.name,
3 o.total
4FROM customers AS c
5INNER JOIN orders AS o
6 ON c.id = o.customer_id
7WHERE o.total > 100
Result
nametotal
Alice150.00
Bob275.50
Carol120.00
FROM customers
FROM customers
Start with the customers table
JOIN with orders
JOIN with orders
Pair customer rows with matching order rows
WHERE total > 100
WHERE total > 100
Filter combined rows to large orders only
SELECT name, total
SELECT name, total
Return only the columns you need
PUTTING IT ALL TOGETHER

> You are a data analyst at Spotify building a headcount and team assignment report for the HR leadership team. Employee records and department assignments live in separate tables, and the report must show only employees who are currently assigned to a team.

The row pairing model explains how every employee_metrics row is tested against every cost_allocs row before the ON condition filters matches.
Join keys on employee_metrics.dept_id and cost_allocs.dept_id must share the same data type to avoid silent type-cast performance penalties.
Primary keys on the cost_allocs table guarantee each department_id matches at most one department name in the result.
INNER JOIN returns only employees who have a matching dept_id in the cost_allocs table, excluding anyone unassigned.
KEY TAKEAWAYS
Joins combine rows from two tables by testing each pair against a condition
INNER JOIN returns only rows where both tables have matching values
Join keys must have compatible data types; mismatched types hurt performance
Primary keys uniquely identify rows; foreign keys reference another table's primary key
One-to-many relationships cause row multiplication: one customer can match many orders
Use table aliases (c for customers, o for orders) to write cleaner queries
The ON clause specifies which columns to match; always qualify column names with aliases
INNER JOIN and JOIN are equivalent; INNER is explicit but optional

Getting tables to talk to each other

Category
SQL
Difficulty
beginner
Duration
19 minutes
Challenges
0 hands-on challenges

Topics covered: The Row Pairing Model, Join keys and data types, Primary and foreign keys, Cardinality, INNER JOIN

Lesson Sections

  1. The Row Pairing Model (concepts: sqlSelfJoin)

    Before learning specific join syntax, you need to understand the fundamental concept: how SQL matches rows from different tables. A join creates pairs of rows from two tables. Each row from the first table is tested against each row from the second table. When rows match a condition, they form a pair in the output. This is the core mental model for understanding joins. SQL does not copy data or modify the original tables. It produces a new result set made of matched row pairs. Visual: How Pairs

  2. Join keys and data types (concepts: sqlJoinCardinality)

    The columns you choose to match on determine which rows get paired together. Selecting the right join keys is essential for correct results. The columns you use to match rows are called join keys. In the example above, customers.id and orders.cust_id are join keys. They contain the same kind of data: customer identifiers. For a join to work correctly, join keys must be comparable. This means they should have the same (or compatible) data types. Type Matching Matters If one column stores IDs as n

  3. Primary and foreign keys

    Professional databases follow consistent patterns for connecting tables. Learning these patterns helps you identify join columns quickly. Databases use a pattern called primary key / foreign key (PK/FK) to organize relationships between tables. Understanding this pattern helps you know which columns to join on. Primary Key A primary key uniquely identifies each row in a table. No two rows can have the same primary key value. Common examples: Foreign Key A foreign key is a column that references

  4. Cardinality

    How many rows match? This determines whether your result grows, shrinks, or stays the same size. Cardinality describes how many rows in one table match rows in another. Understanding cardinality helps you predict what your join results will look like. One-to-One (1:1) Each row in table A matches exactly one row in table B, and vice versa. Example: users and user_profiles tables Result: Same number of rows as each input table. One-to-Many (1:N) Each row in table A can match multiple rows in table

  5. INNER JOIN (concepts: sqlInnerJoin)

    Syntax Example What INNER JOIN Excludes Writing Clean Joins Professional SQL requires clear, unambiguous code. Column qualification and table aliases make your joins readable and maintainable. Column Qualification When joining tables, both tables might have columns with the same name. SQL needs to know which table's column you mean. You specify this using table.column notation. The Problem This query is AMBIGUOUS if both tables have a "name" column: Error: Column "name" is ambiguous. SQL does no