# Monthly Transaction Summary

> A monthly engagement summary.

Canonical URL: <https://datadriven.io/problems/monthly_transaction_summary>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

For each month in 2026, count unique users and total transactions, but only include transactions where the amount is at least $5. Return the month, unique user count, and total transactions.

## Worked solution and explanation

### Why this problem exists in real interviews

Extracting insights from transactions.quantity grouped by total_amount via grouping and date extraction is the central task. It is used in mid-level screens to test whether you pick the right aggregation function and partition boundary on the first attempt.  Getting the column references right on the first try is what interviewers watch for.

> **Trick to Solving**
>
> Read the prompt carefully for implicit constraints. The phrase structure hints at the grain of the output: what each row represents.
> 
> 1. Identify the output grain from the prompt (one row per what?)
> 2. Work backward from the desired output columns
> 3. Build the query inside-out: innermost subquery first, then layer on filters and aggregates

---

### Break down the requirements

#### Step 1: Filter to the target rows

Apply the date filter using `STRFTIME` to extract and compare the relevant time component. This restricts rows before aggregation.

#### Step 2: Aggregate with COUNT DISTINCT

Group by the output grain and apply `COUNT DISTINCT()` to compute the metric. The `GROUP BY` must match exactly what the output needs: one row per group key.

#### Step 3: Order the final output

Apply `ORDER BY` as specified to produce the expected row sequence. When tied values exist, add a secondary sort column for determinism.

---

### The solution

**Filtered multi-aggregate with date constraint**

```sql
SELECT STRFTIME('%Y-%m', transaction_date) AS month,
    COUNT(DISTINCT user_id) AS unique_users,
    COUNT(*) AS total_transactions
FROM transactions
WHERE total_amount >= 5
    AND STRFTIME('%Y', transaction_date) = '2026'
GROUP BY STRFTIME('%Y-%m', transaction_date)
ORDER BY month
```

> **Cost Analysis**
>
> The query scans 100M rows from `transactions`. The aggregation reduces the row count before any downstream processing, which is the key performance lever.

> **Interviewers Watch For**
>
> Naming the output grain ("one row per X") before writing the GROUP BY shows you think about data shape, not just syntax. Explaining why `ROW_NUMBER` is preferred over `DISTINCT` for deduplication shows you understand the difference between collapsing and selecting.

> **Common Pitfall**
>
> Comparing dates stored as TEXT without casting can produce lexicographic instead of chronological ordering. Always confirm the column type.

---

## Common follow-up questions

- If transactions.transaction_id could contain unexpected NULL values, how would your query behave? _(Tests NULL awareness even when the schema does not currently allow NULLs in transaction_id.)_
- How would you verify that your aggregation on transactions.transaction_id is not double-counting due to duplicate rows? _(Tests data quality awareness and deduplication strategies.)_
- With millions of distinct values in transactions.transaction_id, what index strategy would you use to keep this query performant? _(Tests indexing knowledge specific to high-cardinality columns like transaction_id.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/monthly_transaction_summary)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.