# Department Snapshot

> Who is underperforming and who is excelling?

Canonical URL: <https://datadriven.io/problems/department_snapshot>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

The HR analytics team is building a department performance summary for the all-hands. For each department, show the lowest recorded metric, the highest, the average, and the spread between the two extremes. Rows with a missing metric has no value on file should be left out of all calculations. Only include departments that have contributed more than five valid readings ,  a smaller sample isn't reliable enough to surface. Order results from the widest performance spread to the narrowest.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests multi-metric aggregation with NULL exclusion, HAVING, and a computed column (spread = max minus min). It is a classic department summary query.

---

### Break down the requirements

#### Step 1: Filter out NULL metrics

`WHERE metric_value IS NOT NULL` removes unscored rows.

#### Step 2: Aggregate per department

MIN, MAX, AVG, and `MAX - MIN` as spread, grouped by department.

#### Step 3: Filter and sort

`HAVING COUNT(*) > 5` ensures sufficient sample size. `ORDER BY spread DESC`.

---

### The solution

**Multi-metric summary with spread computation**

```sql
SELECT
    department,
    MIN(metric_value) AS min_metric,
    MAX(metric_value) AS max_metric,
    AVG(metric_value) AS avg_metric,
    MAX(metric_value) - MIN(metric_value) AS spread
FROM employee_metrics
WHERE metric_value IS NOT NULL
GROUP BY department
HAVING COUNT(*) > 5
ORDER BY spread DESC
```

> **Cost Analysis**
>
> Scan of 10M rows with NULL filter. GROUP BY reduces to ~20 departments. All aggregates computed in one pass.

> **Interviewers Watch For**
>
> Whether the candidate computes spread inline (MAX - MIN) vs in a subquery. The inline approach is cleaner and more efficient.

> **Common Pitfall**
>
> Including NULL metric_value rows would cause AVG to ignore them (correct for AVG), but COUNT(*) would include them, potentially passing the HAVING threshold incorrectly. The WHERE filter ensures consistency.

---

## Common follow-up questions

- What if you needed the median instead of the average? _(Tests PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY metric_value).)_
- How would you identify outlier departments? _(Compare each department's spread to the overall average spread.)_
- What if the threshold was dynamic (e.g., top 50% of departments by count)? _(Tests a subquery-based threshold using PERCENTILE.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/department_snapshot)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.