# Power Users

> Engagement separates tourists from regulars.

Canonical URL: <https://datadriven.io/problems/power_users>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

The product team is selecting power users for a case-study writeup and needs every user with 5 or more sessions, showing their username and session count.

## Worked solution and explanation

### Why this problem exists in real interviews

This session analysis problem uses the `users` and `user_sessions` tables to evaluate self-join. Candidates must also demonstrate HAVING for post-aggregation filtering. Watch how the `user_id` and `username` columns interact in the grouping and filtering logic.

---

### Break down the requirements

#### Step 1: Join users to sessions

`JOIN user_sessions ON users.user_id = user_sessions.user_id` pairs each user with their sessions.

#### Step 2: Group by user and count sessions

`GROUP BY users.user_id, users.username` with `COUNT(*)` tallies sessions per user.

#### Step 3: Filter with HAVING

`HAVING COUNT(*) >= 5` keeps only power users with 5+ sessions.

---

### The solution

**Having filter for power users**

```sql
SELECT u.username, COUNT(*) AS session_count
FROM users u
JOIN user_sessions s ON u.user_id = s.user_id
GROUP BY u.user_id, u.username
HAVING COUNT(*) >= 5
```

> **Cost Analysis**
>
> With `users` (10,000,000 rows), `user_sessions` (60,000,000 rows), the full scan reads significant data. A composite index on the filter columns pushes the predicate into the index layer. Pre-aggregation in a materialized view is worth considering at this scale.

> **Interviewers Watch For**
>
> Interviewers evaluate whether you translate the English requirements into the correct SQL clauses on the first attempt. They watch for clean syntax, correct column references, and whether you verify edge cases before declaring the query complete.

> **Common Pitfall**
>
> The most common mistake is misreading the prompt's filtering or grouping requirements. Double-check which columns to group by, which to aggregate, and whether the output should be filtered with `WHERE` (before grouping) or `HAVING` (after grouping).

---

## Common follow-up questions

- What would happen to your result if `users.account_status` contained duplicate values that you did not expect? _(Tests whether the candidate considers data quality issues in `account_status` and uses DISTINCT or deduplication where needed.)_
- `user_sessions.user_id` has roughly 4,000,000 distinct values. What index strategy would you use to avoid a full scan on `user_sessions`? _(Tests indexing knowledge specific to the high-cardinality `user_id` column in `user_sessions`.)_
- If the HAVING threshold in your query changed from a fixed number to a percentile, how would you restructure the query? _(Tests ability to replace static HAVING filters with dynamic subquery-based thresholds.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/power_users)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.