# Auction Lot Summary

> The hammer falls. Who bid the most?

Canonical URL: <https://datadriven.io/problems/auction_lot_summary>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

The product team is building a sales leaderboard for in-stock items. For each product currently in stock, show how many transactions reference it, the highest transaction amount, and which user placed the top transaction. Products with no transactions should still appear, showing a count of zero and no top user.

## Worked solution and explanation

### Why this problem exists in real interviews

Interviewers use the `products`, `transactions` tables here to probe row numbering within partitions combined with nested subqueries. The columns `product_name`, `in_stock`, `total_amount` force candidates to reason about the correct grain before writing any aggregation.

---

### Break down the requirements

#### Step 1: Join tables

Join `products` with `transactions` on `product_id` to build the working set.

#### Step 2: Aggregate by a.product_name

`GROUP BY` with `MAX(total_amount)` computes the metric at the correct grain before ranking.

#### Step 3: Rank with ROW_NUMBER

`ROW_NUMBER() OVER (ORDER BY MAX(total_amount) DESC)` assigns ranks. ROW_NUMBER assigns unique sequential ranks.

#### Step 4: Filter to the target rank

Wrap in a CTE and filter `WHERE rnk <= 10` to extract the desired position.

---

### The solution

**Row-number for auction lot summary**

```sql
WITH ranked AS (
    SELECT
        a.product_name,
        MAX(total_amount) AS max_total_amount,
        ROW_NUMBER() OVER (ORDER BY MAX(total_amount) DESC) AS rnk
    FROM products a
JOIN transactions b ON a.product_id = b.product_id
WHERE in_stock = 1
    GROUP BY a.product_name
)
SELECT *
FROM ranked
WHERE rnk <= 10
ORDER BY rnk
```

> **Cost Analysis**
>
> The main table has 20K rows. The window function runs after grouping, so it operates on the reduced result set.

> **Interviewers Watch For**
>
> Interviewers check whether the candidate chooses the right window function variant and can articulate why.

> **Common Pitfall**
>
> Returning extra columns not asked for, or missing a required column, are both common mistakes that fail automated grading.

---

## Common follow-up questions

- What happens to your results if `product_name` in `products` contains trailing whitespace or mixed casing? _(Tests awareness of text normalization issues that silently fragment GROUP BY results.)_
- Does the database engine materialize your CTE or inline it? How would you check, and when does it matter? _(Tests understanding of CTE materialization behavior, which varies by engine (PostgreSQL materializes by default before v12).)_
- `transaction_id` in `transactions` has ~10M distinct values. What index strategy keeps your query from doing a full table scan? _(Tests whether the candidate can design indexes for high-cardinality columns and understands selectivity.)_
- If the business definition of `in_stock` changed mid-quarter (e.g., a status value was renamed), how would you handle historical consistency? _(Tests awareness of slowly changing dimensions and backward-compatible query design.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/auction_lot_summary)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.