# 2FA Confirmation Rate

> Two-factor sent. How many confirmed?

Canonical URL: <https://datadriven.io/problems/2fa_confirmation_rate>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

The security team is measuring how reliably users complete two-factor authentication across different client platforms. Only notifications with a delivered status count as valid 2FA pushes; everything else is noise. A push is considered confirmed when the user opened it. For each platform, compute the confirmation rate as a ratio of confirmed pushes to total delivered pushes.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests **conditional aggregation** alongside **null-safe division**. Candidates who understand evaluation order handle this cleanly; those who pattern-match tend to stumble.

> **Trick to Solving**
>
> Any rate or ratio problem requires **null-safe division**. If the denominator can be zero, the query crashes or returns NULL silently.
> 
> 1. Identify the numerator and denominator conditions
> 2. Use `SUM(CASE WHEN ... THEN 1 ELSE 0 END)` for the numerator
> 3. Wrap the denominator in `NULLIF(..., 0)` to prevent division by zero

---

### Break down the requirements

#### Step 1: Filter to qualifying rows

Apply the WHERE clause to isolate the correct subset before computing the ratio.

#### Step 2: Group by `platform`

`GROUP BY platform` produces one output row per distinct value.

#### Step 3: Compute the ratio with CASE and NULLIF

The numerator uses `SUM(CASE WHEN opened THEN 1 ELSE 0 END)`. Wrapping the denominator in `NULLIF(COUNT(*), 0)` prevents division by zero.

#### Step 4: Round and order

Use `ROUND(..., 2)` for clean decimal output and sort by `confirmation_rate` descending.

---

### The solution

**Conditional aggregation for rate computation**

```sql
SELECT
    platform,
    ROUND(
        1.0 * SUM(CASE WHEN opened = 1 THEN 1 ELSE 0 END)
        / NULLIF(COUNT(*), 0),
        2
    ) AS confirmation_rate
FROM push_notifs_2fa
WHERE status = 'delivered'
GROUP BY platform
ORDER BY confirmation_rate DESC
```

> **Cost Analysis**
>
> The main table has 600M rows (115 GB). Partitioned on `sent_at`, so queries filtering on that column skip most partitions. The GROUP BY reduces the row count early, keeping downstream operations cheap.

> **Interviewers Watch For**
>
> Strong candidates state the correct `GROUP BY` grain before writing any SQL, showing they think about the output shape first. Division-by-zero handling is a silent correctness bug; interviewers watch for `NULLIF` or equivalent protection.

> **Common Pitfall**
>
> Two silent traps here. First, **integer division**: `SUM(...) / COUNT(*)` divides two integers, so a 2-of-3 rate truncates to `0` instead of `0.67`. Force floating-point math with `1.0 *` (or `CAST(... AS REAL)`) on the numerator. Second, **NULL in `opened`**: a delivered push whose `opened` value is unknown is not confirmed, yet it still belongs in the denominator. `SUM(CASE WHEN opened = 1 THEN 1 ELSE 0 END)` handles this correctly because NULL fails the `= 1` test and adds 0 to the numerator, while `COUNT(*)` still counts the row.

---

## Common follow-up questions

- What if the data volume increased 10x? _(Tests scaling awareness: would the query plan change, and where is the bottleneck?)_
- How would you handle this if the data arrived as a streaming feed? _(Tests whether the candidate can think beyond batch SQL toward incremental computation.)_
- Could you rewrite this using a CTE instead of a subquery, or vice versa? _(Tests fluency with query refactoring and understanding that CTEs and subqueries are often interchangeable.)_
- How would the results change if you used COUNT(DISTINCT col) instead of COUNT(*)? _(Tests understanding of distinct vs total counting and when deduplication matters.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/2fa_confirmation_rate)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.