# The Latest Transaction Per Product

> Every product has a last sale. When was it?

Canonical URL: <https://datadriven.io/problems/the_latest_transaction_per_product>

Domain: SQL · Difficulty: medium · Seniority: L5

## Problem

The catalog dashboard needs each product paired with its most recent sale. For every product, show the product details alongside the most recent transaction amount and date. Some products have hundreds of transactions, so only the latest should appear.

## Worked solution and explanation

### Why this problem exists in real interviews

"Most recent row per group" is the most common analytics pattern in SQL interviews because it shows up everywhere in product work: latest order per customer, latest event per session, latest balance per account. Interviewers use it to see whether you reach for `ROW_NUMBER()` partitioned by the group key, or fall back to a correlated `MAX()` subquery that breaks on ties.

> **Trick to Solving**
>
> "Each product's most recent sale" is the latest-per-group signal. Number each product's transactions with `ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY transaction_date DESC)`, keep rows where the number is 1, then attach product metadata with a join. Unsold products have zero rows in `transactions`, so an inner join drops them correctly.

---

### Break down the requirements

#### Step 1: Rank transactions inside each product

Inside a CTE, compute `ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY transaction_date DESC)`. Row number 1 is that product's latest sale.

#### Step 2: Keep only the latest row

Filter the CTE to `rn = 1`. Exactly one row per product that has at least one transaction.

#### Step 3: Attach product metadata via inner join

Inner-join back to `products` on `product_id` to pick up `product_name` and `category`. Products that have never sold have no row in the ranked CTE and are excluded, which matches the "out of scope" rule in the prompt.

#### Step 4: Return the four display columns

Return `product_name`, `category`, the latest `total_amount`, and the latest `transaction_date`.

---

### The solution

**Window function for latest-per-group**

```sql
WITH ranked AS (
    SELECT product_id,
           total_amount,
           transaction_date,
           ROW_NUMBER() OVER (
               PARTITION BY product_id
               ORDER BY transaction_date DESC
           ) AS rn
    FROM transactions
)
SELECT p.product_name,
       p.category,
       r.total_amount     AS latest_sale_amount,
       r.transaction_date AS last_sale_date
FROM products p
JOIN ranked r ON r.product_id = p.product_id
WHERE r.rn = 1
```

> **Cost Analysis**
>
> The window function sorts each partition. A covering index on `transactions(product_id, transaction_date DESC)` lets the planner walk the index in order and skip a global sort. At 80M transactions across 20K products, partitions average ~4K rows; the window pass is the dominant cost.

> **Interviewers Watch For**
>
> Interviewers compare three approaches: the window function (clean, fastest on modern planners), a correlated `MAX(transaction_date)` subquery (produces duplicates on ties), and a self-join to the max date (also duplicate-prone). Strong candidates pick the window function and name the duplicate risk of the alternatives. They also confirm that unsold products should drop out before writing the join.

> **Common Pitfall**
>
> Using `WHERE transaction_date = (SELECT MAX(transaction_date) ...)` joined back returns multiple rows per product if two sales share the same timestamp. `ROW_NUMBER()` guarantees exactly one row per group, and you can add a secondary ORDER BY column to make ties deterministic.

---

## Common follow-up questions

- What if the dashboard also wanted the two most recent sales per product? _(Change the filter to `rn <= 2`. The rest of the query is identical.)_
- How would you break ties when two sales share the same transaction_date? _(Add a deterministic tiebreaker like `transaction_id DESC` to the ORDER BY inside the window.)_
- How would you include products that have never sold with NULL sale columns? _(Switch to `LEFT JOIN ranked` against `products` and filter `rn = 1 OR rn IS NULL`. Unsold products appear with NULL sale columns. The prompt here excludes them, but this is the common follow-up.)_
- How would the plan change at 1 billion transactions and 1 million products? _(Partition `transactions` by month plus a partial index on the most recent partition keeps the window sort bounded. You might also materialize a per-product latest-sale table and incrementally update it.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_latest_transaction_per_product)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.