# Peak Activity by Device

> Activity windows, device by device.

Canonical URL: <https://datadriven.io/problems/peak_activity_by_device>

Domain: SQL · Difficulty: easy · Seniority: L5

## Problem

For each device type, find the session time window (from session start to session start plus duration) with the highest number of concurrent users. Show device type, the time period as 'start to end', and user count.

## Worked solution and explanation

### Why this problem exists in real interviews

Capacity planning, autoscaler tuning, and rate limit design all start with one question: when do windows of activity overlap most? Interviewers reach for this prompt because it forces the candidate to convert duration into an end timestamp, then perform an interval overlap self join, then deterministically tie break. Each step is a separate failure mode.

---

### Break down the requirements

#### Step 1: Build session_end from session_start and duration

SQLite needs datetime(session_start, '+' || session_duration_sec || ' seconds'). Doing this in a CTE means the join condition can reference session_end without recomputing it, and the device_type from devices is attached in the same pass.

#### Step 2: Self join on overlap, partitioned by device_type

Two intervals [a_start, a_end] and [b_start, b_end] overlap when b_start <= a_end AND b_end >= a_start. Restrict to the same device_type so phones do not count toward laptop concurrency. COUNT(DISTINCT user_id) deduplicates users who happened to start two sessions inside the window.

#### Step 3: Pick the peak window per device with deterministic ties

ROW_NUMBER() OVER (PARTITION BY device_type ORDER BY user_count DESC, session_start ASC, session_end ASC) and filter rn = 1. The double tie break matches the prompt: highest count wins, then earliest start, then earliest end. ROW_NUMBER (not DENSE_RANK) returns exactly one row per device.

---

### The solution

**Materialize windows, self join on overlap, rank**

```sql
WITH session_windows AS (
  SELECT d.device_type, us.session_start,
         datetime(us.session_start, '+' || us.session_duration_sec || ' seconds') AS session_end,
         us.user_id
  FROM user_sessions us INNER JOIN devices d ON us.device_id = d.device_id
),
concurrent AS (
  SELECT sw1.device_type, sw1.session_start, sw1.session_end,
         COUNT(DISTINCT sw2.user_id) AS user_count
  FROM session_windows sw1
  INNER JOIN session_windows sw2
    ON sw1.device_type = sw2.device_type
   AND sw2.session_start <= sw1.session_end
   AND sw2.session_end >= sw1.session_start
  GROUP BY sw1.device_type, sw1.session_start, sw1.session_end
),
ranked AS (
  SELECT device_type, session_start, session_end, user_count,
         ROW_NUMBER() OVER (PARTITION BY device_type ORDER BY user_count DESC, session_start ASC, session_end ASC) AS rn
  FROM concurrent
)
SELECT device_type, session_start || ' to ' || session_end AS time_period, user_count
FROM ranked WHERE rn = 1
```

> **Cost Analysis**
>
> The interval self join is the expensive step. With 40M sessions, a naive cross join is intractable; partitioning the join key by device_type cuts the work by the cardinality of device_type. In production a sweepline algorithm (events at start become +1, end becomes negative one, running sum) replaces the self join entirely and runs in O(N log N). Interviewers accept the self join answer for whiteboard scope, but expect you to mention sweepline.

> **Interviewers Watch For**
>
> Whether the overlap predicate is correct (off by one is common), whether you remembered DISTINCT on user_id, and whether the tie breaker is fully specified. The string concatenation 'session_start to session_end' is a small touch that proves you read the output spec.

> **Common Pitfall**
>
> Forgetting DISTINCT in COUNT(DISTINCT sw2.user_id) inflates the concurrency by counting the same user twice when their sessions overlap. The other common miss is omitting the sw1.device_type = sw2.device_type predicate, which counts cross device overlaps and produces the wrong device assignment.

---

## Common follow-up questions

- Replace the self join with a sweepline approach. What does the SQL look like? _(Tests whether the candidate can build a UNION ALL of (start, +1) and (end, negative one) events and a running SUM window. This is the standard interview answer for production scale concurrency.)_
- How does the answer change if a session can span midnight UTC? _(Tests boundary thinking. The current solution does not bucket by day, so it just works, but if the prompt asked 'peak per day' you would need to clip windows at midnight.)_
- Why ROW_NUMBER instead of DENSE_RANK for the final ranking? _(Tests whether the candidate understands that the prompt explicitly wants one window per device. DENSE_RANK with rnk = 1 would return all ties, contradicting the tie break instructions.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/peak_activity_by_device)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.