# The Event Breakdown

> Events are piling up by type. The report needs them side by side.

Canonical URL: <https://datadriven.io/problems/pivot_event_counts>

Domain: SQL · Difficulty: medium · Seniority: L6

## Problem

Instead of one row per event, produce one row per user with separate columns counting each event type: clicks, views, purchases.

## Worked solution and explanation

### Why this problem exists in real interviews

The core skill being tested is conditional aggregation via CASE, applied to the `event_data` table in a purchase behavior context. Getting the `user_id` and `event_type` columns right is where most candidates slip.

> **Trick to Solving**
>
> The word "separate columns" for each category is the pivot signal. In databases without `PIVOT`, use conditional aggregation.
> 
> 1. Spot the pivot language: "one row per user with columns for each type"
> 2. Use `SUM(CASE WHEN event_type = 'X' THEN 1 ELSE 0 END)` for each type
> 3. Group by `user_id`

---

### Break down the requirements

#### Step 1: Group by user_id

`GROUP BY user_id` produces one output row per user.

#### Step 2: Conditionally count each event type

`SUM(CASE WHEN event_type = 'click' THEN 1 ELSE 0 END) AS clicks` and similarly for views and purchases.

---

### The solution

**Case pivot for pivot event counts**

```sql
SELECT
    user_id,
    SUM(CASE WHEN event_type = 'click' THEN 1 ELSE 0 END) AS clicks,
    SUM(CASE WHEN event_type = 'view' THEN 1 ELSE 0 END) AS views,
    SUM(CASE WHEN event_type = 'purchase' THEN 1 ELSE 0 END) AS purchases
FROM event_data
GROUP BY user_id
```

> **Cost Analysis**
>
> At `event_data` (250,000,000 rows), a full table scan is expensive. Partition pruning (if the table is partitioned on the filter column) is the first optimization. A covering index on the `GROUP BY` + filter columns eliminates random I/O. Consider a materialized view for repeated dashboard queries.

> **Interviewers Watch For**
>
> Interviewers watch for clean `CASE WHEN` expressions and check whether you use `SUM` with 1/0 values rather than `COUNT` with `CASE WHEN ... THEN 1 END` (which can miscount NULLs).

> **Common Pitfall**
>
> Using `COUNT(CASE WHEN event_type = 'X' THEN 1 END)` instead of `SUM(CASE WHEN event_type = 'X' THEN 1 ELSE 0 END)` works, but forgetting the `ELSE 0` with `SUM` returns NULL instead of 0 for users with no events of that type.

---

## Common follow-up questions

- What would happen to your result if `event_data.event_id` contained duplicate values that you did not expect? _(Tests whether the candidate considers data quality issues in `event_id` and uses DISTINCT or deduplication where needed.)_
- With 12,000,000 distinct values in `event_data.user_id`, how would a composite index on the GROUP BY columns change the execution plan? _(Probes understanding of how cardinality in `user_id` affects grouping and sort operations.)_
- Your conditional CASE logic assumes the categories are exhaustive. What happens if a row in `event_data` falls into none of the branches? _(Tests awareness of the implicit ELSE NULL in CASE expressions.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/pivot_event_counts)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.