# The Regulars

> Past a certain threshold, casual becomes committed.

Canonical URL: <https://datadriven.io/problems/power_users_by_session_count>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

The retention team considers anyone with more than 3 sessions an engaged user. Show each qualifying user ID and their session count.

## Worked solution and explanation

### Why this problem exists in real interviews

This challenge asks you to apply HAVING for post-aggregation filtering to the `user_sessions` and `users` tables, simulating a real session analysis workflow. Pay attention to the `user_id` column as they drive the aggregation and output.

---

### Break down the requirements

#### Step 1: Group by user_id

`GROUP BY user_id` collapses sessions into one row per user.

#### Step 2: Count sessions per user

`COUNT(*) AS session_count` tallies how many sessions each user has.

#### Step 3: Filter with HAVING

`HAVING COUNT(*) > 3` keeps only engaged users.

---

### The solution

**Having filter for power users by session count**

```sql
SELECT user_id, COUNT(*) AS session_count
FROM user_sessions
GROUP BY user_id
HAVING COUNT(*) > 3
```

> **Cost Analysis**
>
> With `user_sessions` (50,000,000 rows), `users` (8,000,000 rows), the full scan reads significant data. A composite index on the filter columns pushes the predicate into the index layer. Pre-aggregation in a materialized view is worth considering at this scale.

> **Interviewers Watch For**
>
> Interviewers evaluate whether you translate the English requirements into the correct SQL clauses on the first attempt. They watch for clean syntax, correct column references, and whether you verify edge cases before declaring the query complete.

> **Common Pitfall**
>
> The most common mistake is misreading the prompt's filtering or grouping requirements. Double-check which columns to group by, which to aggregate, and whether the output should be filtered with `WHERE` (before grouping) or `HAVING` (after grouping).

---

## Common follow-up questions

- What would happen to your result if `user_sessions.session_start` contained duplicate values that you did not expect? _(Tests whether the candidate considers data quality issues in `session_start` and uses DISTINCT or deduplication where needed.)_
- If `users` grew to contain billions of rows, which part of your query would become the bottleneck given the cardinality of `email`? _(Tests ability to identify performance hotspots related to `users.email` at scale.)_
- If the HAVING threshold in your query changed from a fixed number to a percentile, how would you restructure the query? _(Tests ability to replace static HAVING filters with dynamic subquery-based thresholds.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/power_users_by_session_count)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.