# Bargain Bin

> Floor prices. Right before the vendor call.

Canonical URL: <https://datadriven.io/problems/bargain_bin>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The procurement team needs floor pricing data for upcoming vendor negotiations. For each product category, show the lowest price, the highest price, and the spread between them. Skip any products with no price on file. Present categories from the widest price spread to the narrowest.

## Worked solution and explanation

### Why this problem exists in real interviews

Interviewers use the `products` table here to probe grouped aggregation. The columns `product_name`, `category`, `price` force candidates to reason about the correct grain before writing any aggregation.

---

### Break down the requirements

#### Step 1: Group by `category`

`GROUP BY` at the correct grain produces one row per group.

#### Step 2: Compute `MAX(price)`

The MAX function computes the max per group.

#### Step 3: Order by the metric

Sort by `max_price` desc for readability.

---

### The solution

**Group-aggregate for bargain bin**

```sql
SELECT
    category,
    MAX(price) AS max_price
FROM products
GROUP BY category
ORDER BY max_price DESC
```

> **Cost Analysis**
>
> The main table has 6K rows. The GROUP BY reduces the row count early, keeping downstream operations cheap.

> **Interviewers Watch For**
>
> Strong candidates state the correct `GROUP BY` grain before writing any SQL, showing they think about the output shape first.

> **Common Pitfall**
>
> Selecting a non-aggregated column without including it in `GROUP BY` is the most common error. Some engines reject it; others silently return arbitrary values.

---

## Common follow-up questions

- The `price` column in `products` has roughly 8% NULLs. How does your query handle those rows, and would the result change if NULLs were replaced with zeros? _(Tests whether the candidate understands how NULLs propagate through aggregation functions and whether their WHERE/JOIN conditions implicitly filter them out.)_
- Your GROUP BY aggregates `product_id` from `products`. If two groups have the same aggregate value, how is the output ordered, and is that deterministic? _(Tests awareness that ORDER BY on a non-unique value produces non-deterministic row order without a tiebreaker.)_
- The `category` column in `products` has a zipf distribution, meaning a few values dominate. How does that skew affect your query plan and parallelism? _(Tests understanding of data skew: the optimizer may choose a bad plan when histogram statistics are stale.)_
- If `product_id` in `products` contained negative values, would your query still produce correct results? _(Tests whether the candidate validated assumptions about the domain of numeric columns.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/bargain_bin)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.