# The Narrow Lens

> A narrow timeframe. Everything inside matters.

Canonical URL: <https://datadriven.io/problems/the_narrow_lens>

Domain: Python · Difficulty: medium · Seniority: L5

## Problem

Given positive integer window_size and a list of numeric values, return a list of the same length where position i is the average of the last min(i+1, window_size) values up to and including index i.

## Worked solution and explanation

### Why this problem exists in real interviews

Trailing averages are how monitoring dashboards, anomaly detectors, and feature pipelines smooth noisy signals before alerting or scoring. Interviewers use this prompt to see whether you handle the warm-up region (positions before the window is full) correctly and whether you can describe the cost of the naive recompute approach versus an incremental running sum.

---

### Break down the requirements

#### Step 1: Read the contract: stateless function, parallel output

The signature is `sliding_avg(window_size, values)` and it returns a list the same length as `values`. There is no class, no internal state across calls. One call in, one fully materialized list out.

#### Step 2: Handle the warm-up region

At index i the divisor is `min(i + 1, window_size)`, not `window_size`. Positions 0 through window_size minus 2 are warm-up and average all values seen so far. Dividing by the full window size early returns deflated averages and fails the visible tests.

#### Step 3: Compute the inclusive window bounds

Use a half-open slice `values[start:i + 1]` where `start = max(0, i - window_size + 1)`. The window is inclusive of position i and reaches back at most window_size positions. Off-by-one here flips the result by one slot and produces wrong values everywhere past the warm-up.

---

### The solution

**Stateless sliding average over the input list**

```python
def sliding_avg(window_size, values):
    if not values:
        return []
    result = []
    for i in range(len(values)):
        start = max(0, i - window_size + 1)
        window = values[start:i + 1]
        result.append(sum(window) / len(window))
    return result
```

> **Cost Analysis**
>
> Time: O(n * w) where n is len(values) and w is window_size, because each position re-sums up to w elements. For n up to 100K and w up to n, the worst case is quadratic. An incremental running sum (add `values[i]`, subtract `values[i - window_size]` once the window is full) drops this to O(n) and matters when n is large or `sliding_avg` runs inside a tight loop. Space is O(n) for the output plus O(w) for the slice.

> **Interviewers Watch For**
>
> Whether you state the warm-up rule out loud before writing code, whether you pick the right divisor (length of the actual window, not window_size), and whether you can produce the O(n) running-sum variant on follow-up. Strong candidates also note that the result is a list of floats, not ints.

> **Common Pitfall**
>
> Dividing by `window_size` for every position. The visible test with `values=[10, 20, 30, 40, 50]` and `window_size=3` expects `[10.0, 15.0, 20.0, ...]`, but dividing by 3 at index 0 returns 3.33 and the entire result drifts. The other classic miss is `values[start:i]` (exclusive of i) instead of `values[start:i + 1]`, which shifts every average back one slot.

---

## Common follow-up questions

- Rewrite this so each output element costs O(1) instead of O(window_size). What invariant do you maintain across iterations? _(Tests the running-sum optimization. Maintain `running = running + values[i] - values[i - window_size]` once `i >= window_size`. Each step is O(1), and the full pass becomes O(n).)_
- How does the algorithm change if the harness asked for the moving median instead of the moving average? _(Tests the two-heap or sorted-container pattern. A median needs O(log w) per step using a max-heap of the lower half and a min-heap of the upper half, with rebalancing on each slide.)_
- If `values` were 10M floats from a sensor stream, what breaks first and how do you defend against it? _(Tests numerical stability. With long streams of large floats, the running sum drifts due to floating-point error. Periodic full recomputation, Kahan summation, or chunked partial sums all bound the drift.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_narrow_lens)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.