# Low-Engagement Sessions

> Users whose average session duration is below the engagement threshold

Canonical URL: <https://datadriven.io/problems/low_engagement_sessions>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

The retention team is identifying users with weak engagement. Find every user whose average session duration falls below 1,000 seconds. Show the user ID and their average duration.

## Worked solution and explanation

### Why this problem exists in real interviews

Querying user_sessions for session_start data using HAVING filter and grouping tests whether you can translate a business requirement into the right column references and filter sequence. It shows up in mid-level screens to verify practical fluency.

> **Trick to Solving**
>
> Retention queries map each user to their cohort (first activity month) and check for subsequent activity.
> 
> 1. Derive each user's cohort from their first event timestamp
> 2. Self-join or use a CTE to check for activity N periods later
> 3. Count retained users per cohort per period offset

---

### Break down the requirements

#### Step 1: Aggregate with AVG

Group by the output grain and apply `AVG()` to compute the metric. The `GROUP BY` must match exactly what the output needs: one row per group key.

#### Step 2: Filter groups with HAVING

The `HAVING` clause filters after aggregation, unlike `WHERE` which filters before. This is necessary when the condition depends on an aggregate result.

---

### The solution

**HAVING clause for post-aggregation filter**

```sql
SELECT user_id, AVG(session_duration_sec) AS avg_duration
FROM user_sessions
GROUP BY user_id
HAVING AVG(session_duration_sec) < 1000
```

> **Cost Analysis**
>
> The query scans 80M rows from `user_sessions`. The aggregation reduces the row count before any downstream processing, which is the key performance lever. CTEs in most engines are optimization fences. For production workloads, consider inlining or materializing the intermediate results.

> **Interviewers Watch For**
>
> Naming the output grain ("one row per X") before writing the GROUP BY shows you think about data shape, not just syntax. Breaking complex logic into named CTEs shows the interviewer you prioritize readability and debuggability.

> **Common Pitfall**
>
> Placing a filter in `WHERE` instead of `HAVING` (or vice versa) is a common mistake. `WHERE` filters rows before aggregation; `HAVING` filters groups after.

---

## Common follow-up questions

- What happens to your result if user_sessions.device_id contains NULLs for some rows? _(Tests whether the candidate accounts for NULL behavior in aggregates and comparisons on device_id.)_
- What is the difference between filtering in WHERE versus HAVING for this query against user_sessions? _(Tests whether the candidate understands pre-aggregation vs post-aggregation filtering.)_
- With millions of distinct values in user_sessions.session_id, what index strategy would you use to keep this query performant? _(Tests indexing knowledge specific to high-cardinality columns like session_id.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/low_engagement_sessions)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.