# Users Outperforming Control

> Treatment beat control. For these users.

Canonical URL: <https://datadriven.io/problems/users_outperforming_control>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

The experimentation team is identifying users who outperformed the control baseline. For each user in treatment_a whose metric value exceeds the average control value for the same test, show the user ID, their treatment value, and the average control value.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests correlated subqueries or window functions for cross-variant comparison. Interviewers check whether you can compute a baseline from one group and compare individual rows from another group against it.

---

### Break down the requirements

#### Step 1: Compute control average per test

A subquery calculates `AVG(value)` from `ab_results` where `variant = 'control'`, grouped by `test_name`.

#### Step 2: Filter treatment_a users exceeding the control average

For each treatment_a row, compare its `value` against the control average for the same `test_name`.

#### Step 3: Return user, value, and control average

Show `user_id`, their treatment value, and the control average for context.

---

### The solution

**Correlated comparison against control baseline**

```sql
SELECT
    t.user_id,
    t.value AS treatment_value,
    c.control_avg
FROM ab_results t
JOIN (
    SELECT test_name, AVG(value) AS control_avg
    FROM ab_results
    WHERE variant = 'control'
    GROUP BY test_name
) c ON t.test_name = c.test_name
WHERE t.variant = 'treatment_a'
  AND t.value > c.control_avg
```

> **Cost Analysis**
>
> The subquery scans ~3M control rows (half of 6M). The main query scans ~3M treatment_a rows and joins on test_name. An index on `(variant, test_name, value)` helps both sides.

> **Interviewers Watch For**
>
> Whether you compare within the same test. A treatment user must beat the control average for their specific test, not the global control average.

> **Common Pitfall**
>
> Computing a global control average instead of per-test. Different tests have different baselines, so the comparison must be scoped to each test_name.

---

## Common follow-up questions

- How would you compute the lift percentage for each user? _(Calculate `(value - control_avg) / control_avg * 100` for percentage lift.)_
- What if there are multiple treatment variants? _(Filter to each variant separately or add variant to the output for comparison.)_
- How would you test statistical significance instead of just comparing to the mean? _(Tests knowledge of confidence intervals, t-tests, or bootstrap methods beyond simple SQL.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/users_outperforming_control)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.