# The Spread

> Data spread around a center. The range matters.

Canonical URL: <https://datadriven.io/problems/the_spread>

Domain: Python · Difficulty: easy · Seniority: L4

## Problem

Given a list of numbers, return the sample variance (the sum of squared deviations from the mean, divided by n-1), rounded to 2 decimal places. Return 0.0 when the list has fewer than 2 numbers.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests **statistical computation from scratch**, specifically the sample variance formula. It probes whether a candidate understands the difference between population variance (divide by n) and sample variance (divide by n-1), a distinction that matters in A/B testing.
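
To make the distinction concrete, here is a small sketch using Python's standard `statistics` module, which provides both estimators: `pvariance` divides by n and `variance` divides by n-1. The data values are made up for illustration.

```python
import statistics

# Hypothetical data, chosen so the arithmetic is easy to follow.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# The mean is 5.0 and the sum of squared deviations is 32.0.
population_var = statistics.pvariance(data)  # 32 / 8
sample_var = statistics.variance(data)       # 32 / 7

print(population_var)        # 4.0
print(round(sample_var, 2))  # 4.57
```

The two results differ by a factor of n / (n-1), so the gap is largest for small samples and shrinks as n grows.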

---

### Break down the requirements

#### Step 1: Handle the edge case of fewer than 2 elements

Sample variance is undefined for fewer than 2 data points, because the n-1 denominator would be zero or negative. The problem defines the result as 0.0 for lists with 0 or 1 elements.

#### Step 2: Compute the mean

Sum all values and divide by n.

#### Step 3: Compute the sum of squared deviations

For each value, compute `(value - mean) ** 2` and accumulate.

#### Step 4: Divide by n-1 and round

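Divide the sum of squared deviations by n-1 rather than n. This is Bessel's correction, which gives an unbiased estimate of the population variance. Then round the result to 2 decimal places.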
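
The four steps can be traced on a concrete input. The sketch below uses a hypothetical list whose intermediate values are all whole numbers, and it uses the built-in `sum()` for brevity rather than explicit loops.

```python
# Hypothetical input, chosen so every intermediate value is a whole number.
nums = [4.0, 7.0, 13.0, 16.0]
n = len(nums)

# Step 2: mean = 40.0 / 4
mean = sum(nums) / n

# Step 3: the deviations are -6, -3, 3, 6, so the squares are 36, 9, 9, 36.
squared_deviations = [(val - mean) ** 2 for val in nums]
sq_diff_sum = sum(squared_deviations)

# Step 4: divide by n - 1 = 3 and round to 2 decimals.
variance = round(sq_diff_sum / (n - 1), 2)

print(mean)         # 10.0
print(sq_diff_sum)  # 90.0
print(variance)     # 30.0
```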

---

### The solution

**Two-pass variance with Bessel correction**

```python
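def sample_variance(nums: list) -> float:
    n = len(nums)
    # Sample variance is undefined for fewer than 2 points; the problem says return 0.0.
    if n < 2:
        return 0.0
    # Pass 1: accumulate the total to get the mean.
    total = 0
    for val in nums:
        total += val
    mean = total / n
    # Pass 2: accumulate the squared deviations from the mean.
    sq_diff_sum = 0
    for val in nums:
        sq_diff_sum += (val - mean) ** 2
    # Bessel's correction: divide by n - 1, then round to 2 decimals.
    result = round(sq_diff_sum / (n - 1), 2)
    return result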
```
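
A quick way to sanity-check a solution like this is to run it on the edge cases and compare an ordinary input against `statistics.variance` from the standard library. The block below is a sketch: it restates the function compactly with `sum()` so that it runs on its own, and the inputs are made up.

```python
import statistics


def sample_variance(nums: list) -> float:
    # Compact restatement of the solution above, using sum() for brevity.
    n = len(nums)
    if n < 2:
        return 0.0
    mean = sum(nums) / n
    return round(sum((val - mean) ** 2 for val in nums) / (n - 1), 2)


# Edge cases from the problem statement.
print(sample_variance([]))   # 0.0
print(sample_variance([5]))  # 0.0

# Ordinary inputs.
print(sample_variance([2, 4, 6]))     # 4.0
print(sample_variance([1, 2, 3, 4]))  # 1.67

# Cross-check against the standard library on the same input.
reference = round(statistics.variance([1, 2, 3, 4]), 2)
print(reference)  # 1.67
```

One difference to keep in mind: `statistics.variance` raises `StatisticsError` when given fewer than two values, whereas this problem asks for 0.0 in that case.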

> **Time and Space Complexity**
>
> **Time:** O(n) for two passes: one for the mean, one for squared deviations.
> 
> **Space:** O(1) beyond the input list. Only scalar accumulators are used.

> **Interviewers Watch For**
>
> Do you know why we divide by n-1 instead of n? Bessel's correction compensates for the bias introduced by estimating the population mean from the sample itself.
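
The bias can be checked exactly on a tiny example. The sketch below takes a hypothetical three-value population, enumerates every ordered sample of size 2 drawn with replacement, and averages each estimator over all of them. `Fraction` keeps the arithmetic exact, and the helper `variance` is defined here for illustration only.

```python
from fractions import Fraction
from itertools import product

# Hypothetical three-value population; Fractions keep the arithmetic exact.
population = [Fraction(1), Fraction(2), Fraction(3)]


def variance(values, denominator):
    # Sum of squared deviations from the mean, divided by the given denominator.
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / denominator


# Population variance: divide by the population size.
true_variance = variance(population, len(population))

# Every ordered sample of size 2, drawn with replacement (9 samples in total).
samples = list(product(population, repeat=2))

avg_divide_by_n = sum(variance(s, 2) for s in samples) / len(samples)
avg_divide_by_n_minus_1 = sum(variance(s, 1) for s in samples) / len(samples)

print(true_variance)            # 2/3
print(avg_divide_by_n)          # 1/3, underestimates the true variance
print(avg_divide_by_n_minus_1)  # 2/3, matches the true variance
```

Averaged over all possible samples, dividing by n gives half the true variance here, while dividing by n-1 recovers it exactly.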

> **Common Pitfall**
>
> Using `sum()` in place of the explicit loops is fine for clarity, but this problem is framed as a from-scratch computation. Interviewers want to see the loop mechanics.

---

## Common follow-up questions

- When would you use population variance (n denominator) instead? _(Tests understanding of when you have the full population vs. a sample.)_
- How would you compute variance in a single pass? _(Tests Welford's online algorithm for numerical stability.)_
- What if the numbers are very large and close together? _(Tests awareness of catastrophic cancellation in floating-point arithmetic.)_
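
For the last two follow-ups, the sketch below shows one way to write Welford's single-pass update, alongside the textbook "sum of squares" shortcut that suffers from catastrophic cancellation. The function names and the test data are illustrative assumptions, and the result is left unrounded so the precision difference is visible.

```python
def welford_variance(nums):
    # Welford's online algorithm: update a running mean and a running sum of
    # squared deviations (m2) one value at a time.
    count = 0
    mean = 0.0
    m2 = 0.0
    for x in nums:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # uses the mean before and after the update
    if count < 2:
        return 0.0  # same convention as the problem statement
    return m2 / (count - 1)


def naive_single_pass_variance(nums):
    # Textbook shortcut: (sum of squares - n * mean^2) / (n - 1).
    # Subtracting two huge, nearly equal floats is where precision is lost.
    n = len(nums)
    total = sum(nums)
    total_sq = sum(x * x for x in nums)
    return (total_sq - total * total / n) / (n - 1)


# Hypothetical data: large values that sit close together. The exact answer is 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

print(welford_variance(data))            # 30.0
print(naive_single_pass_variance(data))  # not 30.0 in double precision
```

The two-pass solution in this write-up also avoids the cancellation problem, because it subtracts the mean from each value before squaring. Welford's update gives similar stability when the data can only be read once.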

---

Source: DataDriven (https://datadriven.io).