# Split Metric Sums

> One column, two totals.

Canonical URL: <https://datadriven.io/problems/split_metric_sums>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

Compute two separate sums from employee metrics: one for metric IDs below 5 and another for metric IDs above 5. Show each sum as its own row with a label distinguishing the two groups.

## Worked solution and explanation

### Why this problem exists in real interviews

This problem targets conditional aggregation via CASE across the `employee_metrics` table. You need to work with the `metric_id` and `metric_value` columns to satisfy the requirements.

---

### Break down the requirements

#### Step 1: Classify rows with a CASE expression

The `CASE` expression evaluates conditions top to bottom and returns the first match. Order matters: put the most restrictive condition first to avoid misclassification.

#### Step 2: Select the target columns

The SELECT clause picks exactly the columns the prompt asks for. Returning extra columns or missing a required alias would fail the grading check.

---

### The solution

**Case pivot for split metric sums**

```sql
SELECT
  CASE WHEN metric_id < 5 THEN 'below_5' ELSE 'above_5' END AS label,
  CAST(SUM(metric_value) AS DOUBLE) AS total
FROM employee_metrics
WHERE metric_id != 5
GROUP BY label
```

> **Cost Analysis**
>
> With ~10,000 rows, the query performs a single sequential scan. An index on the filter/join columns would reduce the scan to a seek.

> **Interviewers Watch For**
>
> Interviewers watch for whether the query returns exactly the columns and ordering the prompt specifies; how quickly you identify the core operation and write clean, minimal code.

> **Common Pitfall**
>
> Integer division truncates the result silently. Cast at least one operand to DOUBLE before dividing to get a decimal result.

---

## Common follow-up questions

- What would happen to your result if `employee_metrics.fiscal_year` contained duplicate values that you did not expect? _(Tests whether the candidate considers data quality issues in `fiscal_year` and uses DISTINCT or deduplication where needed.)_
- The `metric_value` column in `employee_metrics` is heavily skewed toward a few popular values. How would data skew affect parallel execution of your query? _(Tests understanding of skew in `employee_metrics.metric_value` and its impact on distributed query performance.)_
- Your conditional CASE logic assumes the categories are exhaustive. What happens if a row in `employee_metrics` falls into none of the branches? _(Tests awareness of the implicit ELSE NULL in CASE expressions.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/split_metric_sums)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.