# Platform Speed

> Which devices keep users longest?

Canonical URL: <https://datadriven.io/problems/platform_speed>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

The product team is investigating whether device type affects how long users stay engaged. For each device type, show the typical session length, the longest session on record, and how many sessions were logged. Only include device types that have at least five sessions in the data ,  anything with fewer isn't statistically meaningful. List from the device type with the longest average session down to the shortest.

## Worked solution and explanation

### Why this problem exists in real interviews

Interviewers use this session analysis scenario to test self-join against the `user_sessions` and `devices` tables. The focus is on how you handle columns like `device_id`, `device_type`, and `session_duration_sec` when building the result. It also requires HAVING for post-aggregation filtering.

---

### Break down the requirements

#### Step 1: Join sessions to devices

`JOIN devices ON user_sessions.device_id = devices.device_id` adds `device_type` to each session.

#### Step 2: Group by device type and aggregate

`GROUP BY device_type` with `AVG(session_duration_sec)`, `MAX(session_duration_sec)`, and `COUNT(*)`.

#### Step 3: Filter with HAVING

`HAVING COUNT(*) >= 5` excludes device types with too few sessions.

#### Step 4: Order by average duration descending

`ORDER BY AVG(session_duration_sec) DESC` surfaces the most-engaged platforms first.

---

### The solution

**Having filter for platform speed**

```sql
SELECT
    d.device_type,
    AVG(s.session_duration_sec) AS avg_duration,
    MAX(s.session_duration_sec) AS max_duration,
    COUNT(*) AS session_count
FROM user_sessions s
JOIN devices d ON s.device_id = d.device_id
GROUP BY d.device_type
HAVING COUNT(*) >= 5
ORDER BY avg_duration DESC
```

> **Cost Analysis**
>
> At `user_sessions` (500,000,000 rows), `devices` (20,000 rows), a full table scan is expensive. Partition pruning (if the table is partitioned on the filter column) is the first optimization. A covering index on the `GROUP BY` + filter columns eliminates random I/O. Consider a materialized view for repeated dashboard queries.

> **Interviewers Watch For**
>
> Interviewers evaluate whether you translate the English requirements into the correct SQL clauses on the first attempt. They watch for clean syntax, correct column references, and whether you verify edge cases before declaring the query complete.

> **Common Pitfall**
>
> The most common mistake is misreading the prompt's filtering or grouping requirements. Double-check which columns to group by, which to aggregate, and whether the output should be filtered with `WHERE` (before grouping) or `HAVING` (after grouping).

---

## Common follow-up questions

- The `session_duration_sec` column in `user_sessions` has a 4% null rate. How does your query handle rows where `session_duration_sec` is NULL, and could that silently change the result count? _(Tests whether the candidate accounts for NULLs in `user_sessions.session_duration_sec` and understands how aggregates skip NULL values.)_
- `user_sessions.session_id` has roughly 500,000,000 distinct values. What index strategy would you use to avoid a full scan on `user_sessions`? _(Tests indexing knowledge specific to the high-cardinality `session_id` column in `user_sessions`.)_
- If the HAVING threshold in your query changed from a fixed number to a percentile, how would you restructure the query? _(Tests ability to replace static HAVING filters with dynamic subquery-based thresholds.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/platform_speed)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.