# Price Pairs

> Same shelf, wildly different stickers. Spot the pricing gaps.

Canonical URL: <https://datadriven.io/problems/price_pairs>

Domain: SQL · Difficulty: hard · Seniority: L5

## Problem

The pricing analyst wants to flag wide price gaps inside a single category. Find every pair of products in the same category where one product costs at least twice as much as the other, and list each pair exactly once. Return both product_name values, the shared category, and both prices.

## Worked solution and explanation

### Why this problem exists in real interviews

The core skill being tested is self-join, applied to the `products` table in a product analytics context. Getting columns like `product_id`, `category`, and `price` right is where most candidates slip.

> **Trick to Solving**
>
> "List each pair exactly once" signals an ordering constraint on the self-join. Without it, you get (A,B) and (B,A).
> 
> 1. Self-join products on `category`
> 2. Enforce `p1.product_id < p2.product_id` to deduplicate
> 3. Apply the price ratio condition: `p1.price >= 2 * p2.price OR p2.price >= 2 * p1.price`

---

### Break down the requirements

#### Step 1: Self-join products on category

`JOIN products p2 ON p1.category = p2.category AND p1.product_id < p2.product_id` creates ordered pairs within each category.

#### Step 2: Apply the 2x price condition

`WHERE p1.price >= 2 * p2.price OR p2.price >= 2 * p1.price` keeps only pairs with a significant gap.

#### Step 3: Return pair details

Select both product names, the shared category, and both prices.

---

### The solution

**Self-join products on category to find price pairs**

```sql
SELECT
    p1.product_name AS product_1,
    p2.product_name AS product_2,
    p1.category,
    p1.price AS price_1,
    p2.price AS price_2
FROM products p1
JOIN products p2
  ON p1.category = p2.category
  AND p1.product_id < p2.product_id
WHERE p1.price >= 2 * p2.price
   OR p2.price >= 2 * p1.price
```

> **Cost Analysis**
>
> With `products` (20,000 rows), this query scans a small dataset. No indexing is needed for this volume. At production scale, an index on the primary filter column would improve performance.

> **Interviewers Watch For**
>
> Interviewers evaluate whether you translate the English requirements into the correct SQL clauses on the first attempt. They watch for clean syntax, correct column references, and whether you verify edge cases before declaring the query complete.

> **Common Pitfall**
>
> The most common mistake is misreading the prompt's filtering or grouping requirements. Double-check which columns to group by, which to aggregate, and whether the output should be filtered with `WHERE` (before grouping) or `HAVING` (after grouping).

---

## Common follow-up questions

- If `rating` in `products` is NULL for some rows, how would your aggregation or join logic be affected? _(Probes understanding of NULL propagation through joins and aggregate functions on `products.rating`.)_
- The `rating` column in `products` is heavily skewed toward a few popular values. How would data skew affect parallel execution of your query? _(Tests understanding of skew in `products.rating` and its impact on distributed query performance.)_
- If this query ran as a scheduled job, how would you add monitoring to detect when the result set is suspiciously empty? _(Tests operational awareness around scheduled query jobs.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/price_pairs)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.