# Transaction Overview

> The executive snapshot. Users, products, revenue.

Canonical URL: <https://datadriven.io/problems/transaction_overview>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The finance team needs two numbers for the board slide: how many users have made at least one purchase, and the average transaction amount across all transactions.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests the distinction between `COUNT(DISTINCT ...)` and `AVG(...)` in a single query. Interviewers check whether you can compute two different aggregates over different granularities in one pass.

---

### Break down the requirements

#### Step 1: Count distinct purchasing users

`COUNT(DISTINCT user_id)` counts how many unique users have at least one transaction.

#### Step 2: Compute average transaction amount

`AVG(total_amount)` computes the mean across all 80M transactions.

---

### The solution

**Two aggregates in a single scan**

```sql
SELECT
    COUNT(DISTINCT user_id) AS unique_purchasers,
    AVG(total_amount) AS avg_transaction_amount
FROM transactions
```

> **Cost Analysis**
>
> One full scan of 80M rows. `COUNT(DISTINCT user_id)` requires tracking 4M unique values in a hash set. This is the memory-intensive part. `AVG` is a streaming computation with constant memory.

> **Interviewers Watch For**
>
> Candidates who write two separate queries or use a subquery when a single SELECT suffices. Combining aggregates in one query is more efficient and readable.

> **Common Pitfall**
>
> Using `COUNT(user_id)` instead of `COUNT(DISTINCT user_id)`. Without DISTINCT, you get the total number of transactions (80M), not the number of unique purchasers (4M).

---

## Common follow-up questions

- What if you needed the median instead of the average? _(Tests knowledge of PERCENTILE_CONT or a manual approach since median is not a simple aggregate.)_
- How would you add the total transaction count alongside these metrics? _(Add COUNT(*) to the same SELECT.)_
- What if some total_amount values were NULL? _(AVG ignores NULLs; COUNT(DISTINCT user_id) still counts users with NULL amounts.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/transaction_overview)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.