# The Three-Way Report

> Three tables. One summary. Every piece depends on the others.

Canonical URL: <https://datadriven.io/problems/multi_table_report>

Domain: SQL · Difficulty: medium · Seniority: L5

## Problem

Combine session engagement with transaction revenue into a single user report. For each user, show their name, total session count, and total transaction amount. Users with zero sessions or zero transactions must still appear; the report cannot silently drop inactive users.

## Worked solution and explanation

### Why this problem exists in real interviews

This probes whether you understand the **fan-out problem** when joining multiple one-to-many relationships through a shared key. The classic trap is joining sessions and transactions directly through users, which inflates counts. Interviewers use this to test your grasp of pre-aggregation and `LEFT JOIN` semantics.

> **Trick to Solving**
>
> When the prompt says "users with zero X must still appear," that signals `LEFT JOIN`. But joining users to both sessions and transactions directly causes a Cartesian explosion. Spot this by checking whether two child tables share the same parent key.
> 
> 1. Recognize the two one-to-many relationships converging on `user_id`
> 2. Pre-aggregate each child table into a CTE before joining
> 3. Use `LEFT JOIN` from users to each pre-aggregated CTE
> 4. Apply `COALESCE(..., 0)` to handle users with no matches

---

### Break down the requirements

#### Step 1: Pre-aggregate sessions per user

In a CTE, `SELECT user_id, COUNT(*) AS session_count FROM user_sessions GROUP BY user_id` reduces 30M session rows to 3M grouped rows. This prevents fan-out when joining.

#### Step 2: Pre-aggregate transactions per user

In a second CTE, `SELECT user_id, SUM(total_amount) AS total_amount FROM transactions GROUP BY user_id` reduces 60M transaction rows to 3M grouped rows.

#### Step 3: LEFT JOIN from users to both CTEs

`FROM users LEFT JOIN session_agg USING(user_id) LEFT JOIN txn_agg USING(user_id)` ensures every user appears even if they have zero sessions or zero transactions.

#### Step 4: COALESCE NULLs to zero

`COALESCE(session_count, 0)` and `COALESCE(total_amount, 0)` replace NULLs from non-matching LEFT JOINs with zeroes.

---

### The solution

**Pre-aggregate then LEFT JOIN to avoid fan-out**

```sql
WITH session_agg AS (
    SELECT user_id, COUNT(*) AS session_count
    FROM user_sessions
    GROUP BY user_id
),
txn_agg AS (
    SELECT user_id, SUM(total_amount) AS total_amount
    FROM transactions
    GROUP BY user_id
)
SELECT
    u.username,
    COALESCE(sa.session_count, 0) AS session_count,
    COALESCE(ta.total_amount, 0) AS total_amount
FROM users u
LEFT JOIN session_agg sa ON u.user_id = sa.user_id
LEFT JOIN txn_agg ta ON u.user_id = ta.user_id
```

> **Cost Analysis**
>
> With `users` (5,000,000 rows), `user_sessions` (30,000,000 rows), `transactions` (60,000,000 rows), the full scan reads significant data. A composite index on the filter columns pushes the predicate into the index layer. Pre-aggregation in a materialized view is worth considering at this scale.

> **Interviewers Watch For**
>
> The first thing interviewers look for is whether you recognize the fan-out risk. Candidates who jump straight to `users JOIN sessions JOIN transactions` will produce inflated counts. Pre-aggregation is the tell of an experienced engineer.

> **Common Pitfall**
>
> Joining sessions and transactions directly to users without pre-aggregation causes a Cartesian product: a user with 10 sessions and 20 transactions generates 200 rows. Session count inflates to 200 and transaction totals multiply by 10.

---

## Common follow-up questions

- What happens if the table volume increases by 10x? _(Tests whether the candidate considers scan cost and indexing strategies at scale.)_
- How would you modify this query to handle ties differently? _(Tests understanding of RANK vs DENSE_RANK vs ROW_NUMBER.)_
- Could you rewrite this using a CTE for readability? _(Tests CTE fluency and whether the candidate can restructure without changing semantics.)_
- What if the grouping column had very high cardinality (millions of distinct values)? _(Tests awareness of GROUP BY memory pressure and spill-to-disk behavior.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/multi_table_report)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.