# API Calls With and Without Errors

> Some calls succeed. Some do not. Break it down.

Canonical URL: <https://datadriven.io/problems/api_calls_with_and_without_errors>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

The API reliability team is triaging error rates across the platform. For each endpoint, they need a breakdown of how many calls completed cleanly versus how many produced an error message, along with the total call volume per endpoint.

## Worked solution and explanation

### Why this problem exists in real interviews

Built around `api_calls`, this problem requires conditional branching with CASE combined with division-safe NULLIF guards. Interviewers watch whether candidates handle the relationship between `endpoint`, `method`, `status` correctly under grouping pressure.

> **Trick to Solving**
>
> Any rate or ratio problem requires **null-safe division**. If the denominator can be zero, the query crashes or returns NULL silently.
> 
> 1. Identify the numerator and denominator conditions
> 2. Use `SUM(CASE WHEN ... THEN 1 ELSE 0 END)` for the numerator
> 3. Wrap the denominator in `NULLIF(..., 0)` to prevent division by zero

---

### Break down the requirements

#### Step 1: Group by `endpoint`

`GROUP BY endpoint` produces one output row per distinct value of `endpoint`.

#### Step 2: Compute the ratio with CASE and NULLIF

The numerator uses `SUM(CASE WHEN condition THEN 1 ELSE 0 END)`. Wrapping the denominator in `NULLIF(COUNT(*), 0)` prevents division by zero.

#### Step 3: Round and order

Use `ROUND(..., 4)` for clean decimal output and sort by rate descending.

---

### The solution

**Case-branch for api calls without errors**

```sql
SELECT
    endpoint,
    ROUND(
        1.0 * COUNT(CASE WHEN err_msg IS NOT NULL THEN 1 END)
        / NULLIF(COUNT(*), 0),
        4
    ) AS rate
FROM api_calls
GROUP BY endpoint
ORDER BY rate DESC
```

> **Cost Analysis**
>
> The main table has 200M rows (51 GB). Partitioned on `call_time`, so queries filtering on that column skip most partitions. The GROUP BY reduces the row count early, keeping downstream operations cheap.

> **Interviewers Watch For**
>
> Strong candidates state the correct `GROUP BY` grain before writing any SQL, showing they think about the output shape first. Division-by-zero handling is a silent correctness bug; interviewers watch for `NULLIF` or equivalent protection.

> **Common Pitfall**
>
> Selecting a non-aggregated column without including it in `GROUP BY` is the most common error. Some engines reject it; others silently return arbitrary values.

---

## Common follow-up questions

- The `err_msg` column in `api_calls` has roughly 92% NULLs. How does your query handle those rows, and would the result change if NULLs were replaced with zeros? _(Tests whether the candidate understands how NULLs propagate through aggregation functions and whether their WHERE/JOIN conditions implicitly filter them out.)_
- Your CASE expression branches on `endpoint`. What happens if a new category value appears that none of your WHEN clauses match? _(Tests whether the candidate uses a meaningful ELSE branch or lets unmatched rows silently become NULL.)_
- `call_id` in `api_calls` has ~200M distinct values. What index strategy keeps your query from doing a full table scan? _(Tests whether the candidate can design indexes for high-cardinality columns and understands selectivity.)_
- Could you express this same logic as a single query without CTEs or subqueries? What readability trade-off does that introduce? _(Tests whether the candidate can flatten nested logic and understands when decomposition aids maintainability.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/api_calls_with_and_without_errors)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.