# Products Without Sales

> Listed but never sold.

Canonical URL: <https://datadriven.io/problems/products_without_sales>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

Which products have never appeared in a transaction? Return the product ID and product name.

## Worked solution and explanation

### Why this problem exists in real interviews

Interviewers use this product analytics scenario to test LEFT JOIN with NULL filtering for anti-join against the `products` and `transactions` tables. The focus is on how you handle columns like `product_id`, `transaction_id`, and `product_name` when building the result. It also requires self-join.

---

### Break down the requirements

#### Step 1: LEFT JOIN products to transactions

`FROM products LEFT JOIN transactions ON products.product_id = transactions.product_id` keeps all products.

#### Step 2: Filter for no matching transaction

`WHERE transactions.transaction_id IS NULL` isolates unsold products.

#### Step 3: Return product_id and product_name

Select the two identifying columns from the products table.

---

### The solution

**Left-join null filter for products without sales**

```sql
SELECT p.product_id, p.product_name
FROM products p
LEFT JOIN transactions t ON p.product_id = t.product_id
WHERE t.transaction_id IS NULL
```

> **Cost Analysis**
>
> With `products` (15,000 rows), `transactions` (40,000,000 rows), the full scan reads significant data. A composite index on the filter columns pushes the predicate into the index layer. Pre-aggregation in a materialized view is worth considering at this scale.

> **Interviewers Watch For**
>
> Interviewers evaluate whether you translate the English requirements into the correct SQL clauses on the first attempt. They watch for clean syntax, correct column references, and whether you verify edge cases before declaring the query complete.

> **Common Pitfall**
>
> The most common mistake is misreading the prompt's filtering or grouping requirements. Double-check which columns to group by, which to aggregate, and whether the output should be filtered with `WHERE` (before grouping) or `HAVING` (after grouping).

---

## Common follow-up questions

- The `rating` column in `products` has a 5% null rate. How does your query handle rows where `rating` is NULL, and could that silently change the result count? _(Tests whether the candidate accounts for NULLs in `products.rating` and understands how aggregates skip NULL values.)_
- `transactions.transaction_id` has roughly 40,000,000 distinct values. What index strategy would you use to avoid a full scan on `transactions`? _(Tests indexing knowledge specific to the high-cardinality `transaction_id` column in `transactions`.)_
- Could you rewrite the LEFT JOIN anti-pattern using NOT EXISTS instead, and what performance difference might you see on `products`? _(Tests knowledge of anti-join alternatives and their optimizer behavior.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/products_without_sales)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.