# The Row Count Surprise

> Same tables. Different handshakes. Wildly different results.

Canonical URL: <https://datadriven.io/problems/join_type_row_counts>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The data quality team is auditing how a users-vs-ad_impressions join behaves under the three classic semantics. Some users were never served an impression, and some impressions don't tie back to a known user. In a single result set, surface three rows labeled inner_join, left_join, and full_outer_join, each with the total row count that combination produces.

## Worked solution and explanation

### Why this problem exists in real interviews

Same two tables, three different join semantics, three different row counts. The interviewer is checking that you can name what each join keeps and drops, and that you can emulate FULL OUTER on engines that don't support it (SQLite, MySQL pre-8) by stitching two LEFT JOINs together with UNION ALL.

> **Trick to Solving**
>
> **inner**: rows where the key matches on both sides. **left**: every row from `users`, with NULL on the right when no impression exists. **full outer**: union of left and right outer joins. On SQLite, build it as `LEFT JOIN` UNION ALL the right-only rows from `ad_impressions LEFT JOIN users WHERE u.user_id IS NULL`.

---

### Break down the requirements

#### Step 1: One row per join type, three branches

Each join type produces one row in the output: `'inner_join'`, `'left_join'`, `'full_outer_join'`. Stitch them with `UNION ALL` so all three counts appear in a single result set with two columns: `join_type, row_count`.

#### Step 2: Inner and left are direct

Inner: `INNER JOIN` keeps only matched pairs. Left: `LEFT JOIN` keeps every user, with one row per matched impression and exactly one row (with NULLs) for unmatched users.

#### Step 3: Emulate full outer with two LEFT JOINs

Full outer: SQLite has no FULL OUTER JOIN, so build it as the LEFT JOIN result `UNION ALL` the right-only rows. The right-only branch is `ad_impressions LEFT JOIN users` with `WHERE u.user_id IS NULL`, which keeps the impressions that had no matching user.

---

### The solution

**Three counts side by side, FULL OUTER emulated for SQLite**

```sql
SELECT 'inner_join' AS join_type, COUNT(*) AS row_count
FROM users u
INNER JOIN ad_impressions ai ON u.user_id = ai.user_id
UNION ALL
SELECT 'left_join', COUNT(*)
FROM users u
LEFT JOIN ad_impressions ai ON u.user_id = ai.user_id
UNION ALL
SELECT 'full_outer_join', COUNT(*)
FROM (
    SELECT u.user_id, ai.impression_id
    FROM users u
    LEFT JOIN ad_impressions ai ON u.user_id = ai.user_id
    UNION ALL
    SELECT u.user_id, ai.impression_id
    FROM ad_impressions ai
    LEFT JOIN users u ON ai.user_id = u.user_id
    WHERE u.user_id IS NULL
)
```

> **Time and Space Complexity**
>
> **Time:** Each branch is one hash join over `users` (3M) and `ad_impressions` (200M). The full-outer emulation does two LEFT JOIN scans plus a UNION ALL but only counts (no projection materialization).
> 
> **Space:** O(u + i) for the hash tables on the smaller side; the COUNT collapses each branch to a single integer.

> **Interviewers Watch For**
>
> Strong candidates state, before writing SQL, what each row count will look like: inner equals matched pairs (less than left), left equals inner plus unmatched users, full_outer equals left plus unmatched impressions (those with NULL `user_id` or with `user_id` not in `users`).

> **Common Pitfall**
>
> Reaching for `FULL OUTER JOIN` directly. Postgres supports it; SQLite (the grader engine here) and older MySQL do not. The two-LEFT-JOIN with WHERE-IS-NULL pattern is portable and what most production warehouses end up using anyway.

---

## Common follow-up questions

- `ad_impressions.user_id` is 15% NULL. Which of the three counts is affected by those NULLs and how? _(Tests grasp of NULL semantics on the join key.)_
- If a user has 80 impressions on average, what fraction of the inner_join row count comes from a single fan-out user? _(Tests cardinality intuition with one-to-many relationships.)_
- How would you also return `only_in_users` (users with no impressions) and `only_in_impressions` (impressions with no user) as two more rows? _(Tests using set operations to compute the symmetric difference.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/join_type_row_counts)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.