# Product Transaction Counts

> Show how many transactions each product has, sorted by product ID.

Canonical URL: <https://datadriven.io/problems/product_transaction_counts>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

The inventory team needs to see sales velocity per product. Show each product's name and how many transactions reference it, ordered by product ID. Only include products that have been purchased at least once.

## Worked solution and explanation

### Why this problem exists in real interviews

Interviewers use this purchase behavior scenario to test self-join against the `products` and `transactions` tables. The focus is on how you handle the `product_id` and `product_name` columns when building the result.

---

### Break down the requirements

#### Step 1: Join products to transactions

`JOIN transactions ON products.product_id = transactions.product_id` naturally excludes products with zero transactions via inner join.

#### Step 2: Group by product and count

`GROUP BY products.product_id, products.product_name` with `COUNT(*)` tallies transactions per product.

#### Step 3: Order by product_id

`ORDER BY products.product_id` returns results in the requested sequence.

---

### The solution

**Join products to transactions to find product transaction counts**

```sql
SELECT p.product_name, COUNT(*) AS transaction_count
FROM products p
JOIN transactions t ON p.product_id = t.product_id
GROUP BY p.product_id, p.product_name
ORDER BY p.product_id
```

> **Cost Analysis**
>
> With `products` (15,000 rows), `transactions` (60,000,000 rows), the full scan reads significant data. A composite index on the filter columns pushes the predicate into the index layer. Pre-aggregation in a materialized view is worth considering at this scale.

> **Interviewers Watch For**
>
> Interviewers evaluate whether you translate the English requirements into the correct SQL clauses on the first attempt. They watch for clean syntax, correct column references, and whether you verify edge cases before declaring the query complete.

> **Common Pitfall**
>
> The most common mistake is misreading the prompt's filtering or grouping requirements. Double-check which columns to group by, which to aggregate, and whether the output should be filtered with `WHERE` (before grouping) or `HAVING` (after grouping).

---

## Common follow-up questions

- The `rating` column in `products` has a 5% null rate. How does your query handle rows where `rating` is NULL, and could that silently change the result count? _(Tests whether the candidate accounts for NULLs in `products.rating` and understands how aggregates skip NULL values.)_
- `transactions.transaction_id` has roughly 60,000,000 distinct values. What index strategy would you use to avoid a full scan on `transactions`? _(Tests indexing knowledge specific to the high-cardinality `transaction_id` column in `transactions`.)_
- If this query ran as a scheduled job, how would you add monitoring to detect when the result set is suspiciously empty? _(Tests operational awareness around scheduled query jobs.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/product_transaction_counts)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.