# Multi-Variant Experiments

> One user, multiple experiments.

Canonical URL: <https://datadriven.io/problems/multi_variant_experiments>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

Which experiments have participants in both 'variant_a' and 'variant_b'? Return the experiment name for every experiment that has at least one record in each variant.

## Worked solution and explanation

### Why this problem exists in real interviews

Working against experiments, this problem tests conditional aggregation and HAVING filter on the exp_name and variant columns. Interviewers use it as a fundamentals check because a subtle mis-grouping or filter placement changes the output without raising an error.

> **Trick to Solving**
>
> The phrase "has participants in both X and Y" signals a set-intersection problem at the group level. Look for language about a group containing multiple specific values.
> 
> 1. Group by the entity (experiment)
> 2. Use conditional aggregation: `SUM(CASE WHEN variant = 'variant_a' THEN 1 ELSE 0 END)`
> 3. Filter with `HAVING` to require both sums to be positive

---

### Break down the requirements

#### Step 1: Group by experiment

`GROUP BY exp_name` (or `exp_id`) collapses all participant records into one row per experiment.

#### Step 2: Count each variant conditionally

Use `SUM(CASE WHEN variant = 'variant_a' THEN 1 ELSE 0 END)` and the same for `variant_b` to count participants in each variant per experiment.

#### Step 3: Filter with HAVING

`HAVING` both sums `> 0` ensures both variants are present. Return `exp_name`.

---

### The solution

**Conditional aggregation with HAVING**

```sql
SELECT exp_name
FROM experiments
GROUP BY exp_name
HAVING SUM(CASE WHEN variant = 'variant_a' THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN variant = 'variant_b' THEN 1 ELSE 0 END) > 0
```

> **Cost Analysis**
>
> The query scans `experiments` (1,500,000 rows). A covering index on the filter and group columns would reduce I/O. At this scale, the full scan is acceptable but becomes costly if the table grows 10x.

> **Interviewers Watch For**
>
> Interviewers evaluate whether you translate the English requirements into the correct SQL clauses on the first attempt. They watch for clean syntax, correct column references, and whether you verify edge cases before declaring the query complete.

> **Common Pitfall**
>
> The most common mistake is misreading the prompt's filtering or grouping requirements. Double-check which columns to group by, which to aggregate, and whether the output should be filtered with `WHERE` (before grouping) or `HAVING` (after grouping).

---

## Common follow-up questions

- What happens to your result if experiments.outcome contains NULLs for some rows? _(Tests whether the candidate accounts for NULL behavior in aggregates and comparisons on outcome.)_
- What happens to your CASE expressions if a new category value appears in exp_name, variant? _(Tests whether the candidate recognizes that hard-coded CASE values miss future categories.)_
- With millions of distinct values in experiments.exp_id, what index strategy would you use to keep this query performant? _(Tests indexing knowledge specific to high-cardinality columns like exp_id.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/multi_variant_experiments)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.