# Average Initial Call Latency

> First contact latency. The benchmark.

Canonical URL: <https://datadriven.io/problems/average_initial_call_latency>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

The performance team is measuring first-impression latency across the platform. Each user's earliest API call (by timestamp) is their initial call. Compute the average latency across all users' initial calls and return a single value.

## Worked solution and explanation

### What this is really asking

One latency value per user_id, picked by the earliest call_time across 400M api_calls rows, then averaged once. AVG(latency) on the raw table answers a different question entirely.

---

### Break down the requirements

#### Step 1: Rank each user's calls by time

ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY call_time). Ties on call_time are broken arbitrarily; if that matters, add a tiebreaker like call_id.

#### Step 2: Keep only the first row per user

WHERE rn = 1 in an outer query. The window function cannot live inside WHERE, so the subquery wrapper is mandatory, not stylistic.

#### Step 3: Average and cast to REAL

AVG over the per-user latencies, one row per user. CAST to REAL guards against integer division if latency is stored as INTEGER.

---

### The solution

**AVERAGE INITIAL CALL LATENCY**

```sql
SELECT CAST(AVG(latency) AS REAL) AS avg_initial_call_latency
FROM (
  SELECT latency,
         ROW_NUMBER() OVER (
           PARTITION BY user_id
           ORDER BY call_time
         ) AS rn
  FROM api_calls
) ranked
WHERE rn = 1
```

> **Cost Analysis**
>
> ROW_NUMBER over 400M rows partitioned by user_id sorts each partition. A correlated subquery picking MIN(call_time) per user reads the same volume twice. The window pass is one shuffle, one sort, one scan.

> **Interviewers Watch For**
>
> Whether you AVG the raw column (wrong, weights heavy users) or AVG the per-user initial value (right). Also whether you address tie-breaking on call_time and whether the cast is justified by latency being INTEGER.

> **Common Pitfall**
>
> Using RANK instead of ROW_NUMBER. If two calls share the same call_time, RANK returns both as rank 1 and the user gets double-counted in the AVG. ROW_NUMBER picks exactly one.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- How would you compute the median initial latency instead of the average? _(Wrap the rn = 1 set and apply PERCENTILE_CONT(0.5) or a NTILE(2) split, since AVG hides tail behavior on latency distributions.)_
- What if you wanted the average over the first three calls per user? _(Change the filter to rn <= 3. The window definition stays the same, which is why ROW_NUMBER scales cleanly here.)_
- How would you make this incremental for a daily job? _(Materialize each user's first call_time once, then read only that row going forward instead of re-ranking 400M rows nightly.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/average_initial_call_latency)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.