# Above Average Interactions

> The average user is boring. Who is above?

Canonical URL: <https://datadriven.io/problems/above_average_interactions>

Domain: SQL · Difficulty: easy · Seniority: L5

## Problem

A downstream report shows a small set of power users drives most of the platform's session volume. Pull every user whose total session count exceeds the average session count across all users, and show their user ID alongside that total.

## Worked solution and explanation

### Why this problem exists in real interviews

This is a textbook "compare each group to a global aggregate" pattern. It tests whether a candidate can compute a per-user count, then compare each user's count against the average count across all users. The trap is that the threshold is the average of counts, not the count of rows.

> **Trick to Solving**
>
> When the prompt says "users above the average," the threshold is **the average of per-user counts**, not the count of all rows divided by anything else.
> 
> 1. First aggregate to get one row per user with their session count
> 2. Compute the average of those per-user counts as a scalar subquery
> 3. Filter the per-user rows with `HAVING` against that scalar

---

### Break down the requirements

#### Step 1: Aggregate sessions to one row per user

`GROUP BY user_id` with `COUNT(*) AS total_sessions` gives one row per user with their session count.

#### Step 2: Compute the average of per-user counts

Wrap the per-user counts in a subquery and take `AVG(cnt)`. This is a single scalar that runs once. Computing `AVG(COUNT(*))` directly does not work in standard SQL.

#### Step 3: Filter to users above that average

`HAVING COUNT(*) > (scalar subquery)` keeps only users whose count exceeds the average. Use strict `>` so users sitting exactly at the average are excluded.

#### Step 4: Rank from most active down

`ORDER BY total_sessions DESC` puts the heaviest users first, matching the "ranked from most active down" requirement.

---

### The solution

**Group, then filter against the average of counts**

```sql
SELECT user_id, COUNT(*) AS total_sessions
FROM user_sessions
GROUP BY user_id
HAVING COUNT(*) > (
    SELECT AVG(cnt)
    FROM (
        SELECT COUNT(*) AS cnt
        FROM user_sessions
        GROUP BY user_id
    ) sub
)
ORDER BY total_sessions DESC
```

> **Time and Space Complexity**
>
> **Time:** Two passes over `user_sessions` (one for the inner per-user counts, one for the outer aggregate). Both are O(n) scans with a hash aggregate; the planner usually shares the GROUP BY work.
> 
> **Space:** O(u) where u is the number of distinct users (~2M here).

> **Interviewers Watch For**
>
> Strong candidates name the output grain ("one row per user") before writing any SQL. They also notice that `AVG(COUNT(*))` is illegal in standard SQL and reach for the wrapped subquery without prompting.

> **Common Pitfall**
>
> Comparing each user's count against `(SELECT COUNT(*) FROM user_sessions) / (SELECT COUNT(DISTINCT user_id) FROM user_sessions)`. Mathematically equivalent, but most candidates write `AVG(*)` over the raw rows by mistake, which is not the same as the average of per-user counts.

---

## Common follow-up questions

- How would you rewrite this using a window function instead of a scalar subquery? _(Tests whether the candidate knows window functions can replace correlated/scalar subqueries.)_
- Session counts are usually heavy-tailed. Would the median be a more useful threshold here, and how would you compute it? _(Tests awareness that the average is sensitive to outliers and the median may be more meaningful for skewed activity distributions.)_
- If the team only cared about the last 30 days, what changes about the query plan and the result? _(Tests partition-pruning awareness on a `session_start`-partitioned table.)_
- How would you find users above the average for their signup-month cohort instead of the global average? _(Tests whether the candidate can scale the pattern to per-cohort comparisons rather than a single global threshold.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/above_average_interactions)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.