# The Dormant Accounts

> They are still paying. They stopped showing up.

Canonical URL: <https://datadriven.io/problems/the_dormant_accounts>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

Finance is looking at active accounts who haven't shown up in over 90 days, deciding whether to nudge them or shut them down. Surface each one's username, account status, the date of their most recent login, and how much they've spent over their lifetime.

## Worked solution and explanation

### Why this problem exists in real interviews

Three tables, two aggregates, one row-level filter, one aggregate filter. The interviewer is checking that you join with LEFT semantics so users without sessions or transactions still surface, that you compute the dormancy threshold against the latest session via `HAVING`, and that you guard against the fan-out trap when you double-join sessions and transactions.

> **Trick to Solving**
>
> Two LEFT JOINs from `users`, one GROUP BY `user_id`, plus the right filters in the right places.
> 
> 1. WHERE: `account_status = 'active'` (per-row predicate, runs before aggregation)
> 2. HAVING: `MAX(session_start) < today - 90 days` (per-group predicate)
> 3. COALESCE the spend so never-spenders appear with 0

---

### Break down the requirements

#### Step 1: Two LEFT JOINs from users

Start at `users`, LEFT JOIN `user_sessions` on `user_id` to bring in session timestamps, then LEFT JOIN `transactions` on `user_id` to bring in spend. LEFT JOIN keeps users with no rows on the right.

#### Step 2: Filter to active accounts

Apply `WHERE u.account_status = 'active'` so the cohort is locked in before aggregation. This filter is per-row, not per-group, so it belongs in WHERE.

#### Step 3: Aggregate per user

Group by `u.user_id, u.username, u.account_status`. Project `MAX(s.session_start) AS last_login` and `COALESCE(SUM(t.total_amount), 0) AS lifetime_spend`. The COALESCE handles users with no transactions.

#### Step 4: Apply the dormancy filter in HAVING

Use `HAVING MAX(s.session_start) < datetime('now', '-90 days')` to keep only the dormant ones. This is an aggregate predicate, so HAVING is the only valid place. Users who never logged in have `MAX(s.session_start) = NULL`, and `NULL < anything` is NULL (falsy), so they're excluded by this filter; if the business wants them included, change the predicate to `MAX(s.session_start) IS NULL OR MAX(...) < ...`.

---

### The solution

**Two LEFT JOINs, WHERE on status, HAVING on last login**

```sql
SELECT
    u.username,
    u.account_status,
    MAX(s.session_start) AS last_login,
    COALESCE(SUM(t.total_amount), 0) AS lifetime_spend
FROM users u
LEFT JOIN user_sessions s ON u.user_id = s.user_id
LEFT JOIN transactions t ON u.user_id = t.user_id
WHERE u.account_status = 'active'
GROUP BY u.user_id, u.username, u.account_status
HAVING MAX(s.session_start) < datetime('now', '-90 days')
```

> **Time and Space Complexity**
>
> **Time:** O(u + s + t) hash joins, where u is `users`, s is `user_sessions`, t is `transactions`. The double LEFT JOIN does fan out to ~13 sessions x 20 transactions per user, but `MAX` and `SUM` collapse correctly because both aggregates are commutative under that fan-out: SUM is amplified by the session fan-out, so production would prefer pre-aggregating in CTEs first. For the interview, the planner cleans most of it up.
> 
> **Space:** O(users) for the per-user aggregate hash table.

> **Interviewers Watch For**
>
> Strong candidates flag the fan-out: when you LEFT JOIN both `user_sessions` and `transactions` to `users`, each user row gets multiplied by both side counts. `MAX(session_start)` is fan-out safe, but `SUM(total_amount)` is not. They either pre-aggregate each side in a CTE or note the trap and move on with the single-pass shape.

> **Common Pitfall**
>
> Putting the dormancy threshold in WHERE instead of HAVING. WHERE runs before the GROUP BY and would filter individual session rows, missing users entirely if any one of their sessions is recent. HAVING runs after aggregation and operates on `MAX(session_start)`, which is what dormancy actually means.

---

## Common follow-up questions

- How would you rewrite this to avoid the fan-out interaction between sessions and transactions? _(Tests pre-aggregating each side in CTEs to neutralize the fan-out problem.)_
- How would the query change if finance also wanted users who have never logged in to appear in the dormant list? _(Tests handling NULL when a user never logged in.)_
- If this needs to run hourly on 10M users, what would you precompute or cache to keep it cheap? _(Tests incremental processing for hourly refresh.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_dormant_accounts)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.