# App Stability by Region

> Some regions crash more than others.

Canonical URL: <https://datadriven.io/problems/app_stability_by_region>

Domain: SQL · Difficulty: medium · Seniority: L5

## Problem

The mobile team needs crash rates before the next rollout decision. For each combination of tag and date, count app opens, count crashes, and compute the crash rate as crashes divided by opens, rounded to 4 decimal places. Only consider open and crash events.

## Worked solution and explanation

### Why this problem exists in real interviews

Reliability teams probe whether you read the output shape before writing the SELECT. A wide format (`num_opens` and `num_crashes` side by side) tells you to reach for conditional aggregation, not a long-format `GROUP BY (region, event_type)` that the dashboard then has to pivot. The interviewer also wants you to ask about grouping grain on `event_timestamp` before typing.

---

### Break down the requirements

#### Step 1: Filter to the two event types

`WHERE event_type IN ('open', 'crash')` strips the rest before aggregation. Doing it in `WHERE` (not inside the CASE) shrinks what the GROUP BY hashes, especially given the 500M-row scale.

#### Step 2: Pivot via conditional SUM

Two `SUM(CASE WHEN event_type = 'open' THEN 1 ELSE 0 END)` expressions turn rows into wide counts. `COUNT(CASE WHEN ... THEN 1 END)` works too; both ignore the other event type cleanly.

#### Step 3: Group by the bucket keys

`GROUP BY event_timestamp, tags` is what the prompt literally asks for. Pause here, the literal reading is almost certainly wrong (see the comparison below). Read the prompt's word `date` carefully.

#### Step 4: Compute the ratio safely

`crashes * 1.0 / opens` forces float division, then `ROUND(..., 4)`. If a bucket can have zero opens, this divides by zero; wrap the denominator in `NULLIF(..., 0)` to return NULL instead of erroring.

---

### The solution

**CONDITIONAL-AGG PIVOT FOR CRASH RATE**

```sql
SELECT
  event_timestamp,
  tags,
  SUM(CASE WHEN event_type = 'open'  THEN 1 ELSE 0 END) AS num_opens,
  SUM(CASE WHEN event_type = 'crash' THEN 1 ELSE 0 END) AS num_crashes,
  ROUND(
    SUM(CASE WHEN event_type = 'crash' THEN 1 ELSE 0 END) * 1.0
    / SUM(CASE WHEN event_type = 'open' THEN 1 ELSE 0 END),
    4
  ) AS crash_rate
FROM event_data
WHERE event_type IN ('open', 'crash')
GROUP BY event_timestamp, tags
```

> **Cost Analysis**
>
> 500M rows, partitioned by `event_timestamp`. With no date predicate the planner does a full partition scan plus a hash aggregate keyed on (timestamp, tags). Add `WHERE event_timestamp >= DATE '2024-01-01'` (or whatever window the rollout cares about) and you get partition pruning, which is the only thing that makes this query interactive. The `event_type IN (...)` filter happens after partition selection, so it does not save scan cost on its own.

> **Interviewers Watch For**
>
> Do you ask `what grain is event_timestamp?` before writing SQL. If it's microsecond-precision and you group on it raw, an `open` and a `crash` two microseconds apart land in different buckets and every row gets either 1 open / 0 crashes or 0 opens / 1 crash. The output is useless. Senior candidates ask this in the first 20 seconds.

> **Common Pitfall**
>
> Writing `SUM(crashes) / SUM(opens)` without `* 1.0` returns an integer in most engines, so a 17%% crash rate rounds to 0. The expected query uses `* 1.0` for exactly this reason. Same trap: forgetting `NULLIF` and crashing the whole query on the first bucket where `num_opens = 0`.

**Literal: GROUP BY event_timestamp**

Groups at microsecond grain. Each bucket contains roughly one event, so `crash_rate` is 0.0000 or NULL almost everywhere. Matches the prompt's wording but answers no real question.

**Intended: GROUP BY DATE(event_timestamp)**

`GROUP BY DATE(event_timestamp), tags` gives per-day, per-region crash rates, which is what the mobile team can actually read before the rollout decision. Flag the ambiguity and write both, or write this one and explain why.

> **Key Insight**
>
> The query returns the components (opens, crashes, ratio) rather than just the ratio. That's deliberate: a downstream consumer can re-aggregate (sum opens and crashes across regions, then divide) without losing precision. Pre-dividing per bucket and then averaging gives the wrong global rate.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- How would you change the query to handle regions with zero opens in a given day? _(Probes `NULLIF` and the divide-by-zero failure mode that the canonical answer ignores.)_
- If the mobile team wants a global crash rate across all regions, can they just average the per-region `crash_rate` column? _(Tests whether the writer understands ratio-of-sums vs sum-of-ratios. The correct answer: no, they must re-sum opens and crashes.)_
- The team asks for a 7-day rolling crash rate per region. How does the query change? _(Pushes toward window functions over the daily aggregate, or a self-join on a date range. Tests window-frame fluency.)_
- `tags` is actually a delimited string like 'us-east,ios,prod'. How do you group by just the region? _(Tests string parsing and whether the writer pushes back on schema before pivoting on the raw column.)_
- How would you make this query cheap enough to run hourly? _(Targets partition pruning, pre-aggregated daily rollup tables, and whether the writer reaches for an incremental materialized view instead of re-scanning 500M rows.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/app_stability_by_region)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.