# Session Pulse

> Engagement is slipping. Who is phoning it in?

Canonical URL: <https://datadriven.io/problems/session_pulse>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The product manager is investigating engagement drops and wants a per-user session summary. For each user, show their average session length, the total number of sessions, and the longest session they have had. Exclude any sessions with no duration on file. Only include users who have logged at least three sessions, ranked from longest average session to shortest.

## Worked solution and explanation

### Why this problem exists in real interviews

This problem targets HAVING for post-aggregation filtering across the `user_sessions` table. You need to work with the `session_duration_sec` and `user_id` columns to satisfy the requirements.

---

### Break down the requirements

#### Step 1: Filter out null values

Exclude rows where `session_duration_sec` is NULL. This prevents nulls from polluting aggregations or creating phantom groups.

#### Step 2: Aggregate by `user_id`

`GROUP BY user_id` collapses rows to one per group. The aggregate functions (`SUM`, `COUNT`, `AVG`, etc.) compute the metric for each group.

#### Step 3: Filter groups with HAVING

HAVING applies after GROUP BY, filtering out groups that do not meet the threshold. This cannot be done in WHERE because the aggregate has not been computed yet.

#### Step 4: Sort the final output

The `ORDER BY` clause ensures the result appears in the expected sequence. Interviewers check that the sort direction matches the prompt.

---

### The solution

**Having filter for session pulse**

```sql
SELECT user_id, AVG(session_duration_sec) AS avg_duration, COUNT(*) AS session_count, MAX(session_duration_sec) AS longest
FROM user_sessions
WHERE session_duration_sec IS NOT NULL
GROUP BY user_id
HAVING COUNT(*) >= 3
ORDER BY avg_duration DESC
```

> **Cost Analysis**
>
> With ~20M rows, the GROUP BY reduces the working set before any downstream operations. An index on the filter/join columns would reduce the scan to a seek.

> **Interviewers Watch For**
>
> Interviewers watch for whether you use HAVING (not WHERE) to filter after aggregation; how you handle NULL values and whether you account for them in filters and aggregations.

> **Common Pitfall**
>
> Putting the aggregate condition in WHERE instead of HAVING causes a syntax error. WHERE runs before GROUP BY; HAVING runs after.

---

## Common follow-up questions

- The `device_id` column in `user_sessions` has a 5% null rate. How does your query handle rows where `device_id` is NULL, and could that silently change the result count? _(Tests whether the candidate accounts for NULLs in `user_sessions.device_id` and understands how aggregates skip NULL values.)_
- `user_sessions.session_start` has roughly 18,000,000 distinct values. What index strategy would you use to avoid a full scan on `user_sessions`? _(Tests indexing knowledge specific to the high-cardinality `session_start` column in `user_sessions`.)_
- If the HAVING threshold in your query changed from a fixed number to a percentile, how would you restructure the query? _(Tests ability to replace static HAVING filters with dynamic subquery-based thresholds.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/session_pulse)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.