# iOS Sessions by Device Type

> iOS engagement, device by device.

Canonical URL: <https://datadriven.io/problems/ios_sessions_by_device_type>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

The mobile analytics team is preparing a usage report and needs to know the total number of sessions initiated from devices where device_type is 'mobile'. Return the device type and its session count.

## Worked solution and explanation

### Why this problem exists in real interviews

The core test is combining rows from user_sessions and devices using grouping, where a fan-out on the join key will silently corrupt aggregates. This appears in mid-level screens to separate careful thinkers from syntax-first candidates.

> **Trick to Solving**
>
> Read the prompt carefully for implicit constraints. The phrase structure hints at the grain of the output: what each row represents.
> 
> 1. Identify the output grain from the prompt (one row per what?)
> 2. Work backward from the desired output columns
> 3. Build the query inside-out: innermost subquery first, then layer on filters and aggregates

---

### Break down the requirements

#### Step 1: Join tables with INNER JOIN

Connect `user_sessions` and `devices` on `device_id` to keep only matching rows.

#### Step 2: Filter to the target rows

Apply the `WHERE` filter to restrict the working set before aggregation. Filtering early reduces the number of rows that downstream operations process.

#### Step 3: Aggregate with COUNT

Group by the output grain and apply `COUNT()` to compute the metric. The `GROUP BY` must match exactly what the output needs: one row per group key.

---

### The solution

**Filtered join with single-group aggregate**

```sql
SELECT d.device_type, COUNT(s.session_id) AS session_count
FROM user_sessions s
JOIN devices d ON s.device_id = d.device_id
WHERE d.device_type = 'mobile'
GROUP BY d.device_type
```

> **Cost Analysis**
>
> The join touches `user_sessions` (40M rows) and `devices` (6M rows). `user_sessions` is partitioned by `session_start`, which the optimizer can exploit with a partition filter. The aggregation reduces the row count before any downstream processing, which is the key performance lever.

> **Interviewers Watch For**
>
> Interviewers expect you to articulate why you chose a specific join type and what happens to unmatched rows. Naming the output grain ("one row per X") before writing the GROUP BY shows you think about data shape, not just syntax.

> **Common Pitfall**
>
> Forgetting that a JOIN can multiply rows when the relationship is one-to-many. Always check whether the join key is unique on at least one side.

---

## Common follow-up questions

- What happens to your result if devices.browser contains NULLs for some rows? _(Tests whether the candidate accounts for NULL behavior in aggregates and comparisons on browser.)_
- If the join between user_sessions and devices produces a fan-out, how does that affect your aggregate? _(Tests awareness of join cardinality and its impact on SUM, COUNT, and AVG results.)_
- With millions of distinct values in user_sessions.session_id, what index strategy would you use to keep this query performant? _(Tests indexing knowledge specific to high-cardinality columns like session_id.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/ios_sessions_by_device_type)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.