# The Anomaly Detector

> Spot the outliers before they page someone.

Canonical URL: <https://datadriven.io/problems/the_anomaly_detector>

Domain: Python · Difficulty: hard · Seniority: L6

## Problem

Given data (list of [time, value] pairs), window (positive int), and threshold (float), scan values left-to-right. For each index >= window, compute the mean and population stdev over the previous `window` values (excluding current). Flag the current value as an anomaly if |value - mean| > threshold * stdev. Return a list of dicts with keys {timestamp, value, rolling_mean, rolling_std, z_score}, each numeric field rounded to 2 decimal places, in scan order.

## Worked solution and explanation

### Why this problem exists in real interviews

Detecting anomalies via rolling average deviation tests **sliding window computation**, **threshold logic**, and whether you can process a time series in a single pass. It is a key skill for monitoring and alerting pipelines.

---

### Break down the requirements

#### Step 1: Compute a rolling average of the preceding window

For each point, average the previous k values (or fewer at the start of the series).

#### Step 2: Flag points that deviate beyond the threshold

If the absolute difference between the current value and the rolling average exceeds the threshold, mark it as an anomaly.

#### Step 3: Return flagged indices or values

Collect the anomalous data points with their positions.

---

### The solution

**Rolling average with deviation threshold flagging**

```python
def detect_anomalies(values, window, threshold):
    anomalies = []
    for i in range(len(values)):
        start = max(0, i - window)
        if start == i:
            continue
        window_sum = 0
        for j in range(start, i):
            window_sum += values[j]
        window_avg = window_sum / (i - start)
        deviation = abs(values[i] - window_avg)
        if deviation > threshold:
            anomalies.append(i)
    return anomalies
```

> **Time and Space Complexity**
>
> **Time:** O(n * w) where n is the series length and w is the window size. Each window sum recalculates from scratch.
> 
> **Space:** O(a) where a is the number of anomalies detected.

> **Optimization Opportunity**
>
> A sliding window sum (add the new value, subtract the departing value) reduces time to O(n). This version uses the simple loop for clarity, but mention the optimization in the interview.

> **Interviewers Watch For**
>
> Whether you mention the O(n) sliding window optimization even if you implement the O(n*w) version first. Showing awareness of the improvement path matters at senior levels.

> **Common Pitfall**
>
> Including the current point in its own rolling average. The prompt says 'preceding data points,' meaning the window should not include the value being evaluated.

---

## Common follow-up questions

- How would you use standard deviation instead of a fixed threshold? _(Tests computing rolling std and flagging values beyond k standard deviations (z-score approach).)_
- What if the time series has seasonal patterns? _(Tests using seasonal decomposition or comparing against the same period in prior cycles.)_
- How would you handle this in a streaming context? _(Tests maintaining a fixed-size deque and incrementally updating the rolling sum.)_
- What if the threshold should adapt to the data distribution? _(Tests adaptive thresholds using exponentially weighted moving averages (EWMA).)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_anomaly_detector)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.