# YoY Signup Growth Rate

> This year versus last year. Growing or shrinking?

Canonical URL: <https://datadriven.io/problems/yoy_signup_growth_rate>

Domain: SQL · Difficulty: hard · Seniority: L5

## Problem

For each year, count signups and compute the year-over-year growth rate as a percentage compared to the prior year. Round to the nearest whole number and return results by year ascending.

## Worked solution and explanation

### What this is really asking

Twenty million rows in `users`, one row per signup. Bucket `signup_date` by year, count, then compare each bucket to the prior. The hard part is integer math on the delta, not the window.

---

### Break down the requirements

#### Step 1: Aggregate to year grain

Extract the year with `strftime('%Y', signup_date)` and count signups. Group by that derived column.

#### Step 2: Pull prior year inline

`LAG(signups) OVER (ORDER BY signup_year)` on the yearly aggregate. First year gets NULL and that propagates correctly.

#### Step 3: Force float division

Cast the delta to REAL, divide, multiply by 100, round. Without the cast, integer division floors and most years collapse to 0.

---

### The solution

**YEAR OVER YEAR SIGNUPS**

```sql
WITH yearly_signups AS (
  SELECT strftime('%Y', signup_date) AS signup_year,
         COUNT(DISTINCT user_id) AS signups
  FROM users
  GROUP BY signup_year
)
SELECT signup_year,
       signups,
       LAG(signups) OVER (ORDER BY signup_year) AS prev_year_signups,
       ROUND(
         CAST((signups - LAG(signups) OVER (ORDER BY signup_year)) AS REAL)
         / CAST(LAG(signups) OVER (ORDER BY signup_year) AS REAL)
         * 100
       ) AS yoy_growth_pct
FROM yearly_signups
ORDER BY signup_year
```

> **Cost Analysis**
>
> One full scan of 20M rows to aggregate, then LAG over a dozen yearly rows. An index on `signup_date` does not help a full-table aggregate; memory stays tiny since the CTE collapses to one row per year.

> **Interviewers Watch For**
>
> Whether you cast before dividing, how you handle the NULL first year, and whether you reach for `LAG` over a self-join.

> **Common Pitfall**
>
> Skipping the CAST returns 0 for every year that did not at least double. Integer division truncates toward zero, so a 47% jump rounds to 0%.

> **The False Start**
>
> First instinct is a self-join: `users y1 JOIN users y2 ON y2.year = y1.year - 1`. Logically fine, but you join 20M rows to themselves before aggregating. Pivot to aggregate first in a CTE, then `LAG` over the dozen yearly rows.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- How would you filter to only years with negative growth? _(Wrap in an outer SELECT; `LAG` cannot live in WHERE.)_
- What if growth is versus a rolling 3-year average instead? _(Swap `LAG` for `AVG(signups) OVER (ORDER BY signup_year ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING)`.)_
- How does the answer shift with soft-deletes via `account_status`? _(Decide if a deleted account counts. If not, filter inside the CTE before aggregating.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/yoy_signup_growth_rate)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.