# On Their Way Out

> They signed up. They never really showed up.

Canonical URL: <https://datadriven.io/problems/low_engagement_user_count>

Domain: SQL · Difficulty: easy · Seniority: L4

## Problem

How many users have a total number of pages viewed, summed across all of their sessions, between 1 and 9 inclusive? Each row in `user_sessions` represents a unique session.

## Worked solution and explanation

### Why this problem exists in real interviews

Summing `pages_viewed` per user in `user_sessions` and then filtering on that total tests whether you can translate a business requirement into the right grouping and a post-aggregation filter. It shows up as a fundamentals check to verify practical fluency.

---

### Break down the requirements

#### Step 1: Aggregate with SUM

Group by `user_id`, the grain of the inner query, and apply `SUM(pages_viewed)` to compute each user's total across sessions. The `GROUP BY` must match that grain exactly: one row per user.
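
As a minimal sketch of this step, the query below computes per-user totals over a handful of made-up rows. The inline `VALUES` data and its column types are assumptions for illustration only; the real `user_sessions` table may have more columns.

```sql
-- Hypothetical sample data standing in for user_sessions.
WITH user_sessions (session_id, user_id, pages_viewed) AS (
    VALUES
        (1, 101, 3),
        (2, 101, 4),
        (3, 102, 12),
        (4, 103, 5),
        (5, 103, 6),
        (6, 104, 0),
        (7, 105, 9)
)
SELECT user_id,
       SUM(pages_viewed) AS total_pages  -- one total per user
FROM user_sessions
GROUP BY user_id
ORDER BY user_id;
-- Returns: (101, 7), (102, 12), (103, 11), (104, 0), (105, 9)
```

Seven session rows collapse to five user rows, which is the "one row per user" grain the next step filters on.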

#### Step 2: Filter groups with HAVING

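The `HAVING` clause filters after aggregation, unlike `WHERE`, which filters before. It is required here because the condition (a total between 1 and 9) depends on the aggregate `SUM(pages_viewed)`, which does not exist until the rows have been grouped.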
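
To make the group-level filter concrete, here is a sketch that applies `HAVING` to made-up data. The rows in the `VALUES` list are hypothetical and chosen to cover both boundaries of the range.

```sql
-- Hypothetical sample data standing in for user_sessions.
WITH user_sessions (session_id, user_id, pages_viewed) AS (
    VALUES
        (1, 101, 3),
        (2, 101, 4),
        (3, 102, 12),
        (4, 103, 5),
        (5, 103, 6),
        (6, 104, 0),
        (7, 105, 9)
)
SELECT user_id,
       SUM(pages_viewed) AS total_pages
FROM user_sessions
GROUP BY user_id
HAVING SUM(pages_viewed) BETWEEN 1 AND 9  -- evaluated once per group
ORDER BY user_id;
-- Returns: (101, 7), (105, 9)
```

User 104 (total 0) falls just below the range and user 105 (total 9) sits on the upper bound, which `BETWEEN` includes.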

---

### The solution

**Nested aggregate with BETWEEN filter**

```sql
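-- Outer query: count the users that survive the HAVING filter.
SELECT COUNT(*) AS low_engagement_users
FROM (
    -- Inner query: one row per user whose total pages viewed is 1 through 9.
    SELECT user_id
    FROM user_sessions
    GROUP BY user_id
    HAVING SUM(pages_viewed) BETWEEN 1 AND 9  -- BETWEEN is inclusive on both ends
) sub;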
```
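
An equivalent way to express the same logic, shown here only as an alternative sketch and not as the reference answer, is to name the per-user totals in a CTE and filter them with an ordinary `WHERE`. The `user_totals` name and the inline sample rows are assumptions made for this example.

```sql
-- Hypothetical sample data standing in for user_sessions.
WITH user_sessions (session_id, user_id, pages_viewed) AS (
    VALUES
        (1, 101, 3),
        (2, 101, 4),
        (3, 102, 12),
        (4, 103, 5),
        (5, 103, 6),
        (6, 104, 0),
        (7, 105, 9)
),
user_totals AS (
    -- One row per user with the summed page count.
    SELECT user_id,
           SUM(pages_viewed) AS total_pages
    FROM user_sessions
    GROUP BY user_id
)
SELECT COUNT(*) AS low_engagement_users
FROM user_totals
WHERE total_pages BETWEEN 1 AND 9;  -- WHERE is fine here: total_pages is already aggregated
-- Returns: 2
```

The `WHERE` in the final query plays the role of `HAVING` because it runs against rows that are already at the per-user grain.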

> **Cost Analysis**
>
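> The query scans 60M rows from `user_sessions`; with no row-level filter, every session has to be read because each one contributes to some user's total. The inner aggregation collapses those rows to one per user before the outer `COUNT(*)` runs, which is the key performance lever.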

> **Interviewers Watch For**
>
> Naming the output grain ("one row per X") before writing the GROUP BY shows you think about data shape, not just syntax. Explaining why the outer `COUNT(*)` is needed (the inner query returns one row per qualifying user, and the question asks how many such rows exist) shows you understand the difference between aggregating within groups and counting the groups themselves.

> **Common Pitfall**
>
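> Placing a filter in `WHERE` instead of `HAVING` (or vice versa) is a common mistake. `WHERE` filters rows before aggregation; `HAVING` filters groups after. In this problem, `WHERE pages_viewed BETWEEN 1 AND 9` would test each session on its own rather than each user's total.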
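
The pitfall is easy to reproduce. In the sketch below, which uses hypothetical sample rows, the filter is placed in `WHERE`, so it checks individual sessions instead of per-user totals and over-counts.

```sql
-- Hypothetical sample data standing in for user_sessions.
WITH user_sessions (session_id, user_id, pages_viewed) AS (
    VALUES
        (1, 101, 3),
        (2, 101, 4),
        (3, 102, 12),
        (4, 103, 5),
        (5, 103, 6),
        (6, 104, 0),
        (7, 105, 9)
)
-- INCORRECT for this problem: the range is tested per session, not per user.
SELECT COUNT(DISTINCT user_id) AS low_engagement_users
FROM user_sessions
WHERE pages_viewed BETWEEN 1 AND 9;
-- Returns: 3 (users 101, 103, 105), but only 101 and 105 have totals in range
```

User 103 slips through because each of their sessions (5 and 6 pages) is in range even though the total, 11, is not.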

---

## Common follow-up questions

- If user_sessions.session_id could contain unexpected NULL values, how would your query behave? _(Tests NULL awareness even when the schema does not currently allow NULLs in session_id.)_
- What is the difference between filtering in WHERE versus HAVING for this query against user_sessions? _(Tests whether the candidate understands pre-aggregation vs post-aggregation filtering.)_
- With millions of distinct values in user_sessions.session_id, what index strategy would you use to keep this query performant? _(Tests indexing knowledge specific to high-cardinality columns like session_id.)_
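
For the NULL question, a small experiment can help frame an answer. The sketch below uses hypothetical rows, including a NULL `session_id` and NULL `pages_viewed` values that the real schema may not permit.

```sql
-- Hypothetical sample data; the NULLs are deliberate.
WITH user_sessions (session_id, user_id, pages_viewed) AS (
    VALUES
        (1,    201, 4),
        (NULL, 201, NULL),  -- NULL session_id and NULL pages_viewed
        (3,    202, NULL)   -- user 202 has no non-NULL page counts
)
SELECT user_id,
       SUM(pages_viewed) AS total_pages  -- SUM skips NULL inputs
FROM user_sessions
GROUP BY user_id
HAVING SUM(pages_viewed) BETWEEN 1 AND 9
ORDER BY user_id;
-- Returns: (201, 4)
```

The query never reads `session_id`, so NULLs there cannot change the result. NULLs in `pages_viewed` are a different matter: `SUM` ignores them, and a user whose values are all NULL gets a NULL total, which fails the `BETWEEN` test and drops out. By the same reasoning, an index on `session_id` alone gives the planner nothing to use for this query; an index leading with `user_id` is the more plausible candidate, though whether it helps depends on the engine and the data.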

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/low_engagement_user_count)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.