# Products With Strong Unit Price

> Budget-friendly and high-performing.

Canonical URL: <https://datadriven.io/problems/products_with_strong_unit_price>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

Find products with at least 1 purchase and a unit-weighted average price (total amount divided by quantity) of at least $100. Pull the product name from the products table. Return the product ID, product name, and average unit price.

## Worked solution and explanation

### Why this problem exists in real interviews

Drawn from a purchase behavior domain, this question centers on self-join over the `transactions` and `products` tables. The tricky part is handling columns like `product_id`, `product_name`, and `quantity` correctly under the given constraints.

---

### Break down the requirements

#### Step 1: Join transactions to products

`JOIN products ON transactions.product_id = products.product_id` brings in `product_name`.

#### Step 2: Group by product and compute unit price

`GROUP BY product_id, product_name` with `SUM(total_amount) / SUM(quantity)` computes the weighted average unit price.

#### Step 3: Filter with HAVING

`HAVING SUM(quantity) >= 1 AND SUM(total_amount) / SUM(quantity) >= 100` applies both thresholds.

---

### The solution

**Having filter for products with strong unit price**

```sql
SELECT
    t.product_id,
    p.product_name,
    SUM(t.total_amount) / SUM(t.quantity) AS avg_unit_price
FROM transactions t
JOIN products p ON t.product_id = p.product_id
GROUP BY t.product_id, p.product_name
HAVING SUM(t.quantity) >= 1
   AND SUM(t.total_amount) / SUM(t.quantity) >= 100
```

> **Cost Analysis**
>
> With `transactions` (80,000,000 rows), `products` (30,000 rows), the full scan reads significant data. A composite index on the filter columns pushes the predicate into the index layer. Pre-aggregation in a materialized view is worth considering at this scale.

> **Interviewers Watch For**
>
> Interviewers evaluate whether you translate the English requirements into the correct SQL clauses on the first attempt. They watch for clean syntax, correct column references, and whether you verify edge cases before declaring the query complete.

> **Common Pitfall**
>
> The most common mistake is misreading the prompt's filtering or grouping requirements. Double-check which columns to group by, which to aggregate, and whether the output should be filtered with `WHERE` (before grouping) or `HAVING` (after grouping).

---

## Common follow-up questions

- The `rating` column in `products` has a 5% null rate. How does your query handle rows where `rating` is NULL, and could that silently change the result count? _(Tests whether the candidate accounts for NULLs in `products.rating` and understands how aggregates skip NULL values.)_
- `transactions.user_id` has roughly 4,000,000 distinct values. What index strategy would you use to avoid a full scan on `transactions`? _(Tests indexing knowledge specific to the high-cardinality `user_id` column in `transactions`.)_
- If the HAVING threshold in your query changed from a fixed number to a percentile, how would you restructure the query? _(Tests ability to replace static HAVING filters with dynamic subquery-based thresholds.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/products_with_strong_unit_price)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.