# User 360

> One row per user. Everything they did, or didn't do.

Canonical URL: <https://datadriven.io/problems/user_360>

Domain: SQL · Difficulty: hard · Seniority: L6

## Problem

The product team wants a one-row-per-user activity summary, including users with zero activity. For every user, count their rows in transactions and their rows in user_sessions, substituting 0 when either side is empty. Return the username, transaction count, and session count.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests multi-table LEFT JOIN with the fan-out trap. Interviewers check whether you understand that joining a base table to two fact tables simultaneously creates a cross-product that inflates counts unless you pre-aggregate.

> **Trick to Solving**
>
> The trap is the double LEFT JOIN. If a user has 10 transactions and 5 sessions, joining both produces 50 rows, inflating both counts. The fix is to pre-aggregate each fact table separately.
> 
> 1. Aggregate transactions per user in a CTE or subquery
> 2. Aggregate sessions per user in a separate CTE or subquery
> 3. LEFT JOIN users to both pre-aggregated results
> 4. COALESCE nulls to 0

---

### Break down the requirements

#### Step 1: Pre-aggregate transactions

A subquery `SELECT user_id, COUNT(*) AS txn_count FROM transactions GROUP BY user_id` collapses 80M rows to 4M.

#### Step 2: Pre-aggregate sessions

A subquery `SELECT user_id, COUNT(*) AS session_count FROM user_sessions GROUP BY user_id` collapses 60M rows to 4M.

#### Step 3: Left join from users to both aggregates

LEFT JOIN ensures users with zero activity in either table still appear with a COALESCE to 0.

---

### The solution

**Pre-aggregate to avoid the fan-out trap**

```sql
SELECT
    u.username,
    COALESCE(t.txn_count, 0) AS transaction_count,
    COALESCE(s.session_count, 0) AS session_count
FROM users u
LEFT JOIN (
    SELECT user_id, COUNT(*) AS txn_count
    FROM transactions
    GROUP BY user_id
) t ON u.user_id = t.user_id
LEFT JOIN (
    SELECT user_id, COUNT(*) AS session_count
    FROM user_sessions
    GROUP BY user_id
) s ON u.user_id = s.user_id
```

> **Cost Analysis**
>
> Two full scans (80M transactions, 60M sessions) for the subqueries, each reducing to ~4M rows. The final LEFT JOINs are 10M users against 4M aggregate rows each. Pre-aggregation is the key to avoiding the cross-product explosion.

> **Interviewers Watch For**
>
> The fan-out trap is the #1 thing interviewers watch for here. Candidates who join users directly to both transactions and sessions without pre-aggregating will get inflated counts.

> **Common Pitfall**
>
> Using `COUNT(t.transaction_id)` after a double LEFT JOIN without pre-aggregation. This counts cross-product rows, not actual transactions. A user with 10 transactions and 5 sessions would show 50 transactions.

---

## Common follow-up questions

- Could you solve this with UNION ALL instead of pre-aggregation? _(Stack both fact tables, group by user and source, then pivot. Valid but more complex.)_
- What if you needed average transaction amount alongside the count? _(Add AVG(CAST(total_amount AS DOUBLE)) to the transactions subquery.)_
- How would you handle this at petabyte scale? _(Pre-aggregate into materialized views or summary tables; avoid runtime aggregation.)_
- What if the user_id types differ between tables? _(The schema shows TEXT in all three tables, but type mismatches would require casting.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/user_360)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.