# Detect Outliers

> Most values are normal. Some are suspicious.

Canonical URL: <https://datadriven.io/problems/detect_outliers>

Domain: Python · Difficulty: medium · Seniority: L4

## Problem

Given a list of numbers and a multiplier `k`, compute the mean of the list. Return the sorted list of indices whose values exceed `k * mean`.

## Worked solution and explanation

### Why this problem exists in real interviews

Flagging values that exceed a fixed multiple of the mean is a basic **statistical filtering** operation used in monitoring pipelines. It tests two-pass processing: compute the mean first, then filter against it.

---

### Break down the requirements

#### Step 1: Compute the mean of the list

Sum all values and divide by the count. This requires a first pass through the data.
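For the first pass, Python's `statistics.fmean` (Python 3.8+) returns the mean as a float. It raises `StatisticsError` on an empty sequence, so this sketch guards first. `compute_mean` is a hypothetical helper name, and returning `None` for an empty list is an assumption because the problem does not say how empty input should be handled.

```python
import statistics


def compute_mean(values):
    # Hypothetical helper. Assumption: empty input has no mean, signalled by None,
    # because statistics.fmean would raise StatisticsError on it.
    if not values:
        return None
    # fmean sums in floating point and divides by the count in one call.
    return statistics.fmean(values)


print(compute_mean([1, 2, 3, 4, 100]))  # 22.0
```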

#### Step 2: Find indices where the value exceeds k times the mean

In a second pass, compare each value against `k * mean` and collect indices that exceed it.
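The second pass fits in a list comprehension. `indices_above` is a hypothetical helper name. Because `enumerate` visits positions in ascending order, the collected indices are already sorted, which meets the "sorted list" requirement without a separate `sort()` call.

```python
def indices_above(values, threshold):
    # enumerate walks the list left to right, so matching indices
    # are appended in ascending order and need no extra sort.
    return [i for i, v in enumerate(values) if v > threshold]


values = [1, 2, 3, 4, 100]
k = 2
threshold = k * (sum(values) / len(values))  # mean 22.0, threshold 44.0
print(indices_above(values, threshold))  # [4]
```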

---

### The solution

**Two-pass mean computation and threshold filter**

```python
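def detect_outliers(values, k):
    # Assumption: an empty list has no outliers. Without this guard,
    # dividing by len(values) == 0 raises ZeroDivisionError.
    if not values:
        return []

    # Pass 1: the threshold depends on the mean of the full dataset.
    mean = sum(values) / len(values)
    threshold = k * mean

    # Pass 2: collect indices whose value strictly exceeds the threshold.
    # Scanning left to right yields the indices already in sorted order.
    result = []
    for i, v in enumerate(values):
        if v > threshold:
            result.append(i)
    return result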
```

> **Time and Space Complexity**
>
> **Time:** O(n) for two linear passes: one for the mean, one for filtering.
> 
> **Space:** O(m) where m is the number of outlier indices returned.

> **Interviewers Watch For**
>
> Computing the mean in a first pass before filtering, not trying to do both in a single pass. The threshold depends on the full dataset mean, so you cannot filter while computing it.

> **Common Pitfall**
>
> Using `>` vs. `>=` inconsistently. The prompt says the value must *exceed* `k * mean`, which means strictly greater than (`>`): a value exactly equal to the threshold is not an outlier. Read the comparison operator carefully.
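The difference only shows up when a value sits exactly on the threshold. In this illustrative input the mean is 2.0 and `k = 2`, so the threshold is 4.0 and the last value equals it:

```python
values = [1, 1, 4]
k = 2
threshold = k * (sum(values) / len(values))  # mean 2.0, threshold 4.0

strict = [i for i, v in enumerate(values) if v > threshold]      # matches "exceeds"
inclusive = [i for i, v in enumerate(values) if v >= threshold]  # off-by-one boundary

print(strict)     # []
print(inclusive)  # [2]
```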

---

## Common follow-up questions

- What if you needed outliers based on standard deviation instead of a multiple of the mean? _(Tests computing both mean and standard deviation in a two-pass approach.)_
- How would you handle streaming data where the mean evolves? _(Tests online algorithms like Welford's method for running mean and variance.)_
- What if the list contains negative values? _(Tests whether the threshold logic still holds when the mean itself could be negative.)_
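The first two follow-ups can be sketched as follows, assuming Python 3.8+ for `statistics.fmean`. `detect_outliers_std`, `RunningStats`, and `stream_outliers` are hypothetical names. The streaming rule (flag a value that exceeds `k` times the mean of the values seen before it, once `warmup` values have arrived) is one illustrative choice, not the only reasonable one.

```python
import math
import statistics


def detect_outliers_std(values, k):
    """Flag indices more than k population standard deviations from the mean."""
    if not values:
        return []
    mean = statistics.fmean(values)           # first pass: mean
    std = statistics.pstdev(values, mu=mean)  # second pass: spread around that mean
    # Absolute distance flags unusually low values as well as high ones.
    return [i for i, v in enumerate(values) if abs(v - mean) > k * std]


class RunningStats:
    """Welford's online algorithm: running mean and variance in O(1) memory."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)  # uses the already-updated mean

    @property
    def variance(self):
        # Population variance; 0.0 before any value has been seen.
        return self._m2 / self.count if self.count else 0.0

    @property
    def std(self):
        return math.sqrt(self.variance)


def stream_outliers(stream, k, warmup=2):
    """Hypothetical rule: compare each value with k times the mean of the
    values seen before it, then fold the value into the running stats."""
    stats = RunningStats()
    for i, x in enumerate(stream):
        if stats.count >= warmup and x > k * stats.mean:
            yield i
        stats.update(x)


print(detect_outliers_std([10, 10, 10, 10, 50], 1.5))  # [4]
print(list(stream_outliers([10, 10, 10, 40, 10], 2)))  # [3]
```

Measuring absolute distance from the mean also sidesteps the sign issue raised by the negative-values follow-up. Under the original `k * mean` rule, a negative mean with `k > 1` puts the threshold below the mean itself: `[-10, -10, -10, -1]` with `k = 2` has a mean of -7.75 and a threshold of -15.5, so every index is flagged.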

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/detect_outliers)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.