# The Last Checkout

> Their last visit. Everything in the bag.

Canonical URL: <https://datadriven.io/problems/last_checkout_product_count>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

The retention team wants to understand every user's most recent checkout. For each user in `transactions`, find the most recent date they made a purchase and how many purchase rows they had on that date (a user can show up multiple times on the same date if they bought several items). Return the date, the user id, and the count, sorted from oldest last-purchase date to newest.

## Worked solution and explanation

### Why this problem exists in real interviews

Recency analysis drives churn models and re-engagement campaigns: "when did each user last do the thing, and how much did they do?" The interview tests whether the candidate spots the tie-handling trap: a user can have multiple purchase rows on the same most-recent date, and `ROW_NUMBER()` silently drops all but one of them.

> **Trick to Solving**
>
> "Most recent date per user" is a ranking problem, but the twist is "how many rows on that date." Use `RANK() OVER (PARTITION BY user_id ORDER BY transaction_date DESC)` instead of `ROW_NUMBER()` so all rows on the latest date keep the same rank of 1. Then `WHERE rnk = 1` + `GROUP BY transaction_date, user_id` + `COUNT(*)` finishes the job.

---

### Break down the requirements

#### Step 1: Rank each user's transactions by date

Inside a CTE, `RANK() OVER (PARTITION BY user_id ORDER BY transaction_date DESC)`. `RANK` (not `ROW_NUMBER`) assigns the same rank to rows sharing the same date, which is what we want.

#### Step 2: Keep only each user's most recent date

`WHERE rnk = 1` keeps every row that sits on that user's latest purchase date, including ties.

#### Step 3: Count rows per user on that date

`GROUP BY transaction_date, user_id` with `COUNT(*)` turns multiple rows on the same date into a single per-user count.

#### Step 4: Sort oldest-first

`ORDER BY transaction_date ASC` puts users whose last purchase is furthest in the past at the top, which is the most actionable end of a re-engagement queue.

---

### The solution

**RANK to survive same-date ties, then count**

```sql
WITH ranked AS (
    SELECT user_id,
           transaction_date,
           RANK() OVER (
               PARTITION BY user_id
               ORDER BY transaction_date DESC
           ) AS rnk
    FROM transactions
)
SELECT transaction_date,
       user_id,
       COUNT(*) AS purchase_count
FROM ranked
WHERE rnk = 1
GROUP BY transaction_date, user_id
ORDER BY transaction_date ASC
```

> **Cost Analysis**
>
> `transactions` has 10K rows in the test fixture; the window sort is trivial. At 100M rows, an index on `(user_id, transaction_date DESC)` would let the planner walk the partitions in order and skip the sort.

> **Interviewers Watch For**
>
> The big signal is choosing `RANK` over `ROW_NUMBER`. A user with three purchases on their latest date gets row numbers 1, 2, 3 from `ROW_NUMBER`, and filtering to `= 1` drops two of the three, giving a purchase count of 1 instead of 3. `RANK` keeps all three at rank 1.

> **Common Pitfall**
>
> Using `ROW_NUMBER()` silently undercounts whenever a user has multiple purchases on the same most-recent date. The symptom is subtle because the count is off by the number of ties, not by a factor. The other common mistake is grouping only by `user_id` and losing `transaction_date` from the SELECT.

---

## Common follow-up questions

- How would you return only users whose last purchase was more than 30 days ago? _(Wrap the query as a subquery and filter `transaction_date < DATE('now', '-30 days')` in the outer WHERE. Inside the window everything stays the same.)_
- How would you also return the total amount spent on that last date? _(`SUM(total_amount)` alongside `COUNT(*)` in the final aggregate, and carry `total_amount` through the CTE.)_
- What if "products bought" should count distinct products instead of rows? _(`COUNT(DISTINCT product_id)` instead of `COUNT(*)`. Carry `product_id` through the CTE.)_
- How would you solve this without a window function? _(Correlated subquery: `WHERE t.transaction_date = (SELECT MAX(transaction_date) FROM transactions t2 WHERE t2.user_id = t.user_id)`, then group. Works, but the window version is usually faster and clearer.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/last_checkout_product_count)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.