# Top 10 AB Test Variants

> The ten best-performing variants.

Canonical URL: <https://datadriven.io/problems/top_10_ab_test_variants>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

For the 'onboarding_v3' A/B test, rank the top 10 results by metric value, showing the rank, variant name, and user ID. Do not include duplicate variants. Results should go from rank 1 upward.

## Worked solution and explanation

### Why this problem exists in real interviews

Experimentation platforms publish leaderboards for the top performers in each test so PMs can spot champion variants. Interviewers use this prompt to see whether you scope to one test, deduplicate (variant, user_id) pairs, and apply DENSE_RANK so the top 10 ranks include all ties without numeric gaps.

---

### Break down the requirements

#### Step 1: Scope to the right test

Filter WHERE test_name = 'onboarding_v3' before any ranking. ab_results has 6,000,000 rows across many tests, so this filter is the single largest cost reducer in the plan.

#### Step 2: Deduplicate (variant, user_id) before ranking

The prompt says no duplicate variant+user pairs. GROUP BY variant, user_id collapses repeated measurements for the same user under the same variant into a single row. The window then ranks distinct pairs.

#### Step 3: Use DENSE_RANK over value DESC, then filter rnk <= 10

DENSE_RANK keeps ties on the same rank and produces no gaps, so 'top 10' returns every row whose rank is 10 or fewer (which can exceed 10 rows on ties). Wrap the window in a subquery and filter the outer query: window functions are not allowed in WHERE.

---

### The solution

**Filter, dedupe, DENSE_RANK, then slice**

```sql
SELECT rnk, variant, user_id FROM (
  SELECT variant, user_id, value,
    DENSE_RANK() OVER (ORDER BY value DESC) AS rnk
  FROM ab_results
  WHERE test_name = 'onboarding_v3'
  GROUP BY variant, user_id
) WHERE rnk <= 10 ORDER BY rnk ASC
```

> **Cost Analysis**
>
> ab_results has 6,000,000 rows. The WHERE on test_name should reduce that drastically (a single test is a small slice of the table). The GROUP BY variant, user_id is a hash aggregate over the filtered slice. DENSE_RANK then sorts the deduped rows by value DESC. An index on (test_name, variant, user_id) or (test_name, value) makes the filter and order much cheaper.

> **Interviewers Watch For**
>
> They want DENSE_RANK rather than ROW_NUMBER (so ties are kept) and rather than RANK (which leaves gaps that can push the boundary past 10). They also want the filter on test_name applied before the GROUP BY and the rank applied after, in a single pass with a window function rather than a self-join.

> **Common Pitfall**
>
> Using ROW_NUMBER returns exactly 10 rows but breaks ties arbitrarily, which is not what the prompt asks for. Putting WHERE rnk <= 10 in the inner query fails because window functions are not legal in WHERE; the outer wrap is required. Forgetting the GROUP BY leaves duplicate (variant, user_id) pairs in the leaderboard.

---

## Common follow-up questions

- Why DENSE_RANK and not RANK? _(RANK leaves gaps after ties (1, 1, 3, ...). On a top-10 cutoff, two ties at rank 1 push the next rank to 3 and you can lose the actual third-best row. DENSE_RANK keeps ranks contiguous.)_
- How would you adapt this to top 10 per metric instead of overall top 10? _(Adds PARTITION BY metric to the window, and the candidate should note that filtering rnk <= 10 then returns up to 10 per metric rather than 10 total.)_
- What if value can be NULL? _(DENSE_RANK ORDER BY value DESC puts NULLs last in SQLite. The candidate should propose either filtering value IS NOT NULL or using ORDER BY value DESC NULLS LAST explicitly to make the intent clear.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/top_10_ab_test_variants)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.