# Power Users by Session Activity

> More sessions. More time. The power users.

Canonical URL: <https://datadriven.io/problems/power_users_by_session_activity>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

Marketing wants to feature power users in a case study. A power user is an active account holder with more than 3 sessions and over 100 total pages viewed across all of them. List each qualifying user's id, username, session count, and total pages viewed, with the heaviest users first.

## Worked solution and explanation

### Why this problem exists in real interviews

Two-table join, group, then filter on two aggregates at once. The interviewer is checking that you join `users` to `user_sessions`, restrict to the active cohort with a `WHERE`, and put both threshold checks in a single `HAVING`. The output grain is one row per power user.

> **Trick to Solving**
>
> Both thresholds are about **aggregates**, so they belong in `HAVING`, not `WHERE`. The status filter is a per-row predicate, so it belongs in `WHERE`. Mixing them up is the failure mode this question hunts for.

---

### Break down the requirements

#### Step 1: Join users to sessions

`JOIN user_sessions us ON u.user_id = us.user_id` pairs each active user with their session rows. An inner join is correct here because users with zero sessions cannot be power users.

#### Step 2: Filter the cohort

`WHERE u.account_status = 'active'` restricts to active accounts before aggregation, so inactive users never enter any group.

#### Step 3: Aggregate per user

`GROUP BY u.user_id, u.username` produces one row per user. Compute `COUNT(us.session_id) AS session_count` and `SUM(us.pages_viewed) AS total_pages` in the same SELECT.

#### Step 4: Apply dual HAVING and sort

`HAVING COUNT(us.session_id) > 3 AND SUM(us.pages_viewed) > 100` enforces both power-user thresholds. Order the result by `total_pages DESC` so the heaviest users come first.

---

### The solution

**Join, group, dual HAVING, ranked output**

```sql
SELECT
    u.user_id,
    u.username,
    COUNT(us.session_id) AS session_count,
    SUM(us.pages_viewed) AS total_pages
FROM users u
JOIN user_sessions us ON u.user_id = us.user_id
WHERE u.account_status = 'active'
GROUP BY u.user_id, u.username
HAVING COUNT(us.session_id) > 3 AND SUM(us.pages_viewed) > 100
ORDER BY total_pages DESC
```

> **Time and Space Complexity**
>
> **Time:** O(u + s) hash join + group, where u is `users` row count and s is `user_sessions` row count. The active-status filter is selective enough to push down before the join in most planners.
> 
> **Space:** O(g) for the per-user aggregate hash table where g is the number of qualifying active users.

> **Interviewers Watch For**
>
> Strong candidates put the row-level filter in WHERE and the aggregate filters in HAVING without prompting. They also include `user_id` in the SELECT so duplicate usernames don't collapse rows.

> **Common Pitfall**
>
> Putting `COUNT(*) > 3` in `WHERE` instead of `HAVING`. SQL evaluates `WHERE` before aggregation, so the engine errors with "aggregate function not allowed in WHERE" or, worse, the planner accepts a corrupted query that returns nothing.

---

## Common follow-up questions

- If the team wanted the top 1% of users by total pages viewed instead of a fixed >100 threshold, how would you restructure the query? _(Tests whether the candidate switches a hardcoded threshold to a percentile via window functions or a self-join against an aggregate.)_
- If the team only cared about power users in the last 30 days, what changes about the WHERE clause and the planner's behavior? _(Tests partition-pruning awareness on `session_start`.)_
- How would you produce a list that includes inactive users and users with 0 sessions, marked but not filtered out? _(Tests using a LEFT JOIN + COALESCE to include zero-session users in a separate cohort.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/power_users_by_session_activity)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.