# Latency Variance and Std Dev

> How much does latency actually vary?

Canonical URL: <https://datadriven.io/problems/latency_variance_and_std_dev>

Domain: SQL · Difficulty: hard · Seniority: L3

## Problem

For successful API calls (status 200), compute the mean latency, variance, and standard deviation.

## Worked solution and explanation

### What this is really asking

`api_calls` has 400M rows partitioned by `call_time`, and only `status = 200` counts. Three stats on `latency` from one filtered slice: mean, variance via `E[X^2] - E[X]^2`, and `SQRT(variance)`.

---

### Break down the requirements

#### Step 1: Filter once, conceptually

Every aggregate runs against `WHERE status = 200`. Pruning by `status` happens before the average, so the same filtered set feeds all three numbers.

#### Step 2: Variance as E[X^2\] minus E[X\]^2

`AVG(latency * latency) - AVG(latency) * AVG(latency)`. This is population variance, one pass over the data, no need to materialize the mean first.

#### Step 3: Stddev is SQRT of variance

Engines expose `STDDEV_POP` and `VAR_POP` directly. Building it by hand shows you understand the identity, but in production prefer the built-ins for numerical stability.

---

### The solution

**MEAN, VARIANCE, STDDEV ON STATUS 200**

```sql
SELECT
  (SELECT AVG(latency)
     FROM api_calls
     WHERE status = 200) AS mean_latency,
  (SELECT AVG(latency * latency) - AVG(latency) * AVG(latency)
     FROM api_calls
     WHERE status = 200) AS variance_latency,
  SQRT(
    (SELECT AVG(latency * latency) - AVG(latency) * AVG(latency)
       FROM api_calls
       WHERE status = 200)
  ) AS stddev_latency;
```

> **Cost Analysis**
>
> Three scalar subqueries means three scans of 400M rows. Rewrite as one `SELECT AVG(latency), VAR_POP(latency), STDDEV_POP(latency) FROM api_calls WHERE status = 200` and it collapses to one pass.

> **Interviewers Watch For**
>
> Do you spot the three scans? Do you know population vs sample variance (`VAR_POP` vs `VAR_SAMP`, divide by N vs N-1)? Do you flag that `latency * latency` on a large INTEGER column can overflow before the average?

> **Common Pitfall**
>
> `E[X^2] - E[X]^2` is mathematically clean but numerically fragile when latencies are large and variance is small relative to the mean. Two huge numbers subtract to a tiny one, eating precision. `STDDEV_POP` uses Welford and stays stable.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- Should this be population or sample variance? _(Population (divide by N) treats the 200-rows as the whole world. Sample (N-1) treats them as a draw from a larger distribution. For monitoring slices of production traffic, population is usually correct.)_
- How would you report variance per endpoint without three scans each? _(`SELECT endpoint, AVG(latency), VAR_POP(latency), STDDEV_POP(latency) FROM api_calls WHERE status = 200 GROUP BY endpoint`. One pass, one shuffle, every endpoint.)_
- Latency is heavy-tailed. Is mean even the right metric? _(Probably not. Tail-sensitive SLOs use p95 or p99 via `PERCENTILE_CONT` or `APPROX_QUANTILES`. Mean and stddev mislead when a few timeouts dominate the distribution.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/latency_variance_and_std_dev)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.