# Service User Growth Rate

> User growth, service by service.

Canonical URL: <https://datadriven.io/problems/service_user_growth_rate>

Domain: SQL · Difficulty: easy · Seniority: L4

## Problem

The SRE team is measuring service adoption momentum heading into the new year. For each service, calculate the growth rate of health check activity between December 2025 and January 2026, defined as January's check count divided by December's.

## Worked solution and explanation

### Why this problem exists in real interviews

Drawn from a reliability engineering domain, this question centers on conditional aggregation via CASE over the `svc_health` table. The tricky part is handling the `svc_name` column correctly under the given constraints.

---

### Break down the requirements

#### Step 1: Filter to the target set

The `IN` list restricts the query to only the specified values, avoiding a full-table scan of irrelevant rows.

#### Step 2: Use conditional aggregation with CASE

A `CASE` expression inside the aggregate function splits rows into buckets without multiple passes over the data. Each condition maps to one output column.

#### Step 3: Aggregate by `svc_name`

`GROUP BY svc_name` collapses rows to one per group. The aggregate functions (`SUM`, `COUNT`, `AVG`, etc.) compute the metric for each group.

---

### The solution

**Case pivot for service user growth rate**

```sql
SELECT svc_name, CAST(SUM(CASE WHEN strftime('%Y-%m', checked) = '2026-01' THEN 1 ELSE 0 END) AS DOUBLE) / CAST(NULLIF(SUM(CASE WHEN strftime('%Y-%m', checked) = '2025-12' THEN 1 ELSE 0 END), 0) AS DOUBLE) AS growth_rate
FROM svc_health
WHERE strftime('%Y-%m', checked) IN ('2025-12', '2026-01')
GROUP BY svc_name
```

> **Cost Analysis**
>
> With ~20M rows, the GROUP BY reduces the working set before any downstream operations. An index on the filter/join columns would reduce the scan to a seek.

> **Interviewers Watch For**
>
> Interviewers watch for whether you can pivot data with conditional aggregation in a single pass instead of multiple queries; how you handle date arithmetic and whether you account for edge cases like month boundaries.

> **Common Pitfall**
>
> Placing the CASE expression outside the aggregate (e.g., `CASE WHEN ... THEN SUM(x)`) changes the semantics entirely. The CASE must go inside the aggregate.

---

## Common follow-up questions

- The `latency` column in `svc_health` has a 1% null rate. How does your query handle rows where `latency` is NULL, and could that silently change the result count? _(Tests whether the candidate accounts for NULLs in `svc_health.latency` and understands how aggregates skip NULL values.)_
- If `svc_health` grew to contain billions of rows, which part of your query would become the bottleneck given the cardinality of `check_id`? _(Tests ability to identify performance hotspots related to `svc_health.check_id` at scale.)_
- If `svc_health` contained late-arriving rows that were inserted after your query ran, how would you design an incremental update instead of re-aggregating? _(Tests understanding of incremental aggregation patterns.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/service_user_growth_rate)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.