# Cross-Variant User Pairs

> Same experiment. Different variants. Who overlaps?

Canonical URL: <https://datadriven.io/problems/cross_variant_user_pairs>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

The experimentation team is looking for cross-variant contamination. Pair users who participated in the same experiment under different variants but on the same platform. Show both user IDs for each qualifying pair.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests self-joins with inequality conditions for generating unique pairs. The requirement to find users in the same experiment on the same platform but different variants probes multi-column join logic.

---

### Break down the requirements

#### Step 1: Self-join on experiment and platform

Join `experiments` to itself on `exp_name` and `platform` to find co-participants.

#### Step 2: Ensure different variants

Add `e1.variant != e2.variant` to the join condition.

#### Step 3: Ensure unique pairs

Add `e1.user_id < e2.user_id` to avoid (A,B) and (B,A) duplicates and self-pairs.

---

### The solution

**Self-join with inequality for unique pairs**

```sql
SELECT DISTINCT e1.user_id AS user_1, e2.user_id AS user_2
FROM experiments e1
JOIN experiments e2
    ON e1.exp_name = e2.exp_name
    AND e1.platform = e2.platform
    AND e1.variant != e2.variant
    AND e1.user_id < e2.user_id
```

> **Cost Analysis**
>
> Self-join on 3M rows. The join condition on (exp_name, platform) with inequality on variant can produce large intermediate results for popular experiments. An index on (exp_name, platform, variant) would help.

> **Interviewers Watch For**
>
> Whether the candidate uses `user_id < user_id` to deduplicate pairs. Without this, every pair appears twice. This is a classic self-join pattern.

> **Common Pitfall**
>
> Omitting the `user_id < user_id` condition doubles the output. Omitting DISTINCT could produce duplicates if a user participated in the same experiment multiple times.

---

## Common follow-up questions

- How would you also show which experiment and platform each pair shares? _(Add exp_name and platform to the SELECT.)_
- What if you needed to count the number of contaminated pairs per experiment? _(Wrap in a GROUP BY exp_name with COUNT(*).)_
- How would you handle users who appear in more than two variants? _(The self-join naturally handles this, generating a pair for each variant combination.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/cross_variant_user_pairs)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.