# Three-Value Sum Combinations

> Pick three. See what they add up to.

Canonical URL: <https://datadriven.io/problems/three_value_sum_combinations>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

Find all combinations of 3 different employee metric records whose values sum to exactly 25. Each combination must use three separate records with no reuse. Show the three values in each qualifying combination.

## Worked solution and explanation

### The mental model: k-combinations as a self-join

This is a cost-awareness probe wrapped in combinatorics. Anyone can chain three self-joins. The interviewer wants you to compute the row explosion (N^3 / 6) out loud before you submit, and acknowledge that this pattern hits a wall around 50k rows. Strong candidates name the algorithmic alternative (sort plus two-pointer) even if SQL can't express it cleanly.

---

### The three traps

#### Step 1: No-reuse is not the same as no-duplicate-output

Beginners reach for `e1.metric_id != e2.metric_id AND e2.metric_id != e3.metric_id AND e1.metric_id != e3.metric_id`. That correctly enforces three separate records, but each underlying combination still surfaces six times, once per permutation (1,2,3), (1,3,2), (2,1,3), (2,3,1), (3,1,2), (3,2,1). The strict inequality `e1.metric_id < e2.metric_id < e3.metric_id` collapses all six permutations into the single canonical ascending ordering, so dedup is built into the join itself rather than bolted on with DISTINCT.

#### Step 2: Cartesian product cost explosion

Three CROSS JOINs over a 10,000 row table is 10^12 = 1 trillion tuples before any filter. The ascending-id predicate prunes the search space by 6x to roughly 167 billion, and the planner pushes that predicate into the join so it only enumerates ascending triples rather than materializing the full cube. Modern engines handle this in seconds via hash join with predicate pushdown at 10k rows. At 100k rows you would NOT reach for this pattern; you would precompute pair sums into an indexed table and look up the complement `25 - pair_sum` instead.

#### Step 3: Order by a unique key, never by the value you are summing

The inequality must reference a column that is unique per row. `metric_id` is the primary key, so it works. If you wrote `e1.metric_value < e2.metric_value < e3.metric_value` thinking it accomplished the same dedup, you would silently drop every triple where two records happen to share a value (strict `<` excludes equality), and you would still allow record reuse since the value is not row-identifying. Combinatorics dedup is a property of row identity, not row payload.

---

### The solution

**Three-way self-join with canonical ordering on metric_id**

```sql
SELECT e1.metric_value AS val1,
       e2.metric_value AS val2,
       e3.metric_value AS val3
FROM employee_metrics e1
CROSS JOIN employee_metrics e2
CROSS JOIN employee_metrics e3
WHERE e1.metric_id < e2.metric_id
  AND e2.metric_id < e3.metric_id
  AND e1.metric_value + e2.metric_value + e3.metric_value = 25
```

> **Why 10k is the ceiling for this pattern**
>
> Unfiltered, the join enumerates 10^12 tuples. The ascending-id predicates are sargable and get pushed into the join so the planner only walks 10^12 / 6 = ~167B ordered triples, then the sum-equals-25 predicate filters down to a few thousand rows. This finishes in seconds because the inner loops are tight integer comparisons. At ~50k rows the cubic factor pushes runtime past minutes, and at 100k it is unworkable. The escape hatch is to switch algorithms: precompute pair sums into a hash structure and probe for the complement, which is O(n^2) instead of O(n^3).

> **What a senior candidate asks before writing SQL**
>
> Three scoping questions a strong candidate raises. First, does 'three different employee metric records' mean three distinct `metric_id` values or three distinct `metric_value` values? The expected behavior is distinct records (metric_id), so two rows with the same value still count as separate. Second, is the prompt asking for combinations or permutations? The inequality answers that with one keystroke. Third, what about negative `metric_value` entries or duplicates of 25 itself (like 10 + 10 + 5 across three separate rows)? Those are valid because the constraint is on record identity, not value uniqueness.

> **The `!=` trap inflates output by 6x**
>
> Writing `WHERE e1.metric_id != e2.metric_id AND e2.metric_id != e3.metric_id AND e1.metric_id != e3.metric_id` is the single most common wrong answer. It enforces distinct records but does nothing about ordering, so every qualifying combination appears six times in the output. The grader compares row counts and rejects you. Fix: use strict `<` on a unique key to handle distinctness and canonical ordering in one predicate.

> **The 3-pointer escape hatch when N gets too large**
>
> When the cubic join is no longer viable, the imperative trick is: sort by `metric_value`, fix the smallest element of the triple, then sweep two pointers (one from the left, one from the right) through the remaining range looking for pairs that sum to `25 - smallest`. That is O(n^2 log n) instead of O(n^3). SQL cannot express the two-pointer sweep elegantly, which is exactly why this whole pattern hits a wall at moderate scale and you have to drop into Python or a procedural extension.

---

## Common follow-up questions

- If two `employee_metrics` rows have identical `metric_value`, do they both qualify as 'different records' for this prompt? _(Tests whether you understand the dedup is keyed on row identity (metric_id), not value equality. The answer is yes, they are distinct records.)_
- How many output rows do you get if you write `e1.metric_id != e2.metric_id AND e2.metric_id != e3.metric_id AND e1.metric_id != e3.metric_id` instead of the strict-inequality chain? _(Probes whether you can articulate the 6x permutation inflation and the distinction between distinctness and canonical ordering.)_
- Rewrite this for combinations of 4 records summing to 50. How many self-joins, and how long is the inequality chain? _(Generalizes the pattern: K self-joins, K-1 inequalities, and the sum predicate now has 4 terms. Also a setup for asking about the O(N^4) cost.)_
- At 100k records this query takes hours. What algorithm would you switch to? _(Looking for hash-the-complement (precompute pair sums into a lookup table, probe for `25 - pair`) or sort plus two-pointer. Either drops the complexity by an order of magnitude.)_
- What if `metric_value` can be negative? Does the query still produce correct results? _(Yes, the predicate `sum = 25` works fine with negatives. This is a probe to check you are not unconsciously assuming positivity (some candidates add `metric_value > 0` for no reason).)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/three_value_sum_combinations)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.