# Average Sessions Per User

> How often do users come back?

Canonical URL: <https://datadriven.io/problems/average_sessions_per_user>

Domain: SQL · Difficulty: hard · Seniority: L3

## Problem

The board deck needs a single engagement benchmark: the average number of sessions per user across the entire platform.

## Worked solution and explanation

### Why this problem exists in real interviews

The denominator is the whole question. `user_sessions` only contains users who actually had a session, so any average computed off this table is the engagement of active users, not the platform. If the board sees "4.2 sessions per user" and the inactive half of the user base is missing, that number is wrong for the deck it's going into.

---

### Break down the requirements

#### Step 1: Count sessions per user

Inner query: `COUNT(*) GROUP BY user_id`. Every row in `user_sessions` is one session by construction (`session_id` is the grain), so `COUNT(*)` and `COUNT(DISTINCT session_id)` agree.

#### Step 2: Average the counts

Outer query averages those per-user counts. This is mathematically equivalent to `COUNT(*) * 1.0 / COUNT(DISTINCT user_id)` over the full table, and the one-pass form is dramatically cheaper at 200M rows.

#### Step 3: Confirm the denominator

If the deck wants average sessions across all registered users including zeros, you need a `LEFT JOIN` from `users` to `user_sessions`. The expected query treats inactive users as out of scope.

---

### The solution

**AVERAGE SESSIONS PER USER**

```sql
SELECT AVG(session_count) AS avg_sessions_per_user
FROM (
  SELECT user_id, COUNT(*) AS session_count
  FROM user_sessions
  GROUP BY user_id
)
```

> **Cost Analysis**
>
> No `WHERE` on `session_start` means every partition is read. At 200M rows the inner aggregate is the dominant cost; the outer `AVG` runs on one row per user. The single-pass `COUNT(*) / COUNT(DISTINCT user_id)` rewrite avoids materializing the per-user intermediate entirely.

> **Interviewers Watch For**
>
> Naming the denominator out loud. A board-level engagement number that silently excludes dormant users is a different KPI than one that includes them. Asking which definition the deck wants before writing SQL is the move.

> **Common Pitfall**
>
> Writing `AVG(COUNT(*))` as a flat query and hoping it parses. Aggregates don't nest in one `SELECT` without a subquery or window function. The fix is either the subquery shown or `COUNT(*) OVER (PARTITION BY user_id)` followed by an outer average over distinct users.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- How would you compute the same metric only for the trailing 30 days? _(Tests partition pruning via a `session_start >= CURRENT_DATE - INTERVAL '30 days'` filter on the partition key.)_
- How would you include registered users with zero sessions in the average? _(Probes `LEFT JOIN` from a `users` dimension and `COALESCE(session_count, 0)`.)_
- How would you report median sessions per user instead of the mean? _(Tests `PERCENTILE_CONT(0.5)` and an understanding of why median is the better summary for a long-tailed engagement distribution.)_

**Subquery form (as written)**

Two scans of intermediate rows: inner `GROUP BY user_id` builds a per-user count table, outer `AVG` reduces it. Readable, matches the spec exactly.

**One-pass equivalent**

`SELECT COUNT(*) * 1.0 / COUNT(DISTINCT user_id) FROM user_sessions`. Same answer, single aggregate, no intermediate materialization. At 200M rows the planner appreciates it.

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/average_sessions_per_user)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.