# User Engagement Summary

> Sessions plus searches. The full engagement picture.

Canonical URL: <https://datadriven.io/problems/user_engagement_summary>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

Show total sessions and total search queries per user. Include users who may have activity in only one of those areas.

## Worked solution and explanation

### What this is really asking

Two 50M-row tables, no master users table. A user who only searched, or only had sessions, must still appear with a zero on the missing side. The universe of user_ids has to come from both tables.

---

### Break down the requirements

#### Step 1: Pre-aggregate each side

COUNT(*) GROUP BY user_id on user_sessions and on search_queries separately. Aggregating before joining collapses 50M rows to one row per user and avoids any join-side fan-out.

#### Step 2: Build the full user universe

UNION (not UNION ALL) the user_id columns from both aggregates. That gives one row per distinct user across both worlds, including users who appear in only one table.

#### Step 3: LEFT JOIN both aggregates and COALESCE

Join the universe to each aggregate, wrap counts in COALESCE(..., 0) so users missing on one side report 0 rather than NULL.

---

### The solution

**USER ENGAGEMENT SUMMARY**

```sql
WITH session_counts AS (
  SELECT user_id, COUNT(*) AS total_sessions
  FROM user_sessions
  GROUP BY user_id
),
query_counts AS (
  SELECT user_id, COUNT(*) AS total_queries
  FROM search_queries
  GROUP BY user_id
),
all_users AS (
  SELECT user_id FROM session_counts
  UNION
  SELECT user_id FROM query_counts
)
SELECT au.user_id,
       COALESCE(sc.total_sessions, 0) AS total_sessions,
       COALESCE(qc.total_queries,   0) AS total_queries
FROM all_users au
LEFT JOIN session_counts sc ON au.user_id = sc.user_id
LEFT JOIN query_counts   qc ON au.user_id = qc.user_id;
```

> **Cost Analysis**
>
> Two GROUP BY scans over partitioned 50M-row tables, both shuffle on user_id. UNION dedupes the user_id set, then two LEFT JOINs on the small aggregates. Total work is hash-agg twice plus a small join, not 50M by 50M.

> **Interviewers Watch For**
>
> Whether you reach for FULL OUTER JOIN on raw tables (correct but expensive) versus pre-aggregate then UNION the keys (cheap). Also whether you remember COALESCE; raw LEFT JOIN leaves NULL, which is technically wrong for a count.

> **Common Pitfall**
>
> Using UNION ALL when building the user universe. Duplicate user_ids will fan out the LEFT JOINs and inflate counts. UNION (set semantics) is required here; the dedupe is doing real work.

> **The False Start**
>
> First instinct is FROM user_sessions LEFT JOIN search_queries ON user_id, then COUNT each side. Symptom: search-only users disappear, and within-user join fan-out multiplies counts by the other table's row count. Pivot to pre-aggregate, UNION the keys, then LEFT JOIN.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- How would you restrict this to a single date range, say the last 30 days? _(Push date predicates into each pre-aggregate so the partition pruning fires before the GROUP BY.)_
- How would FULL OUTER JOIN compare in dialects that support it? _(Equivalent result, one fewer CTE, but engines without FULL OUTER (older MySQL) still need the UNION pattern.)_
- What if a user has sessions but you also want zero-session users from a dim_users table? _(Replace the UNION with SELECT user_id FROM dim_users so the universe is authoritative, then LEFT JOIN both aggregates as before.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/user_engagement_summary)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.