# API Calls With Matching Status

> Same status, same pattern. Coincidence?

Canonical URL: <https://datadriven.io/problems/api_calls_with_matching_status>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

During an incident postmortem, the on-call engineer suspects duplicate request patterns in the API layer. Find pairs of different API calls that share the same HTTP method and status code. For each pair, show both call IDs, the method, and the status, with the highest status codes appearing first. Return only the first 20 results.

## Worked solution and explanation

### Why this problem exists in real interviews

Working with `api_calls`, this problem isolates grouped aggregation. The interviewer expects candidates to articulate why `endpoint`, `method`, `status` matter for correctness before touching the keyboard.

---

### Break down the requirements

#### Step 1: Group by `method`

`GROUP BY` at the correct grain produces one row per group.

#### Step 2: Compute `MAX(status)`

The MAX function computes the max per group.

#### Step 3: Order by the metric

Sort by `max_status` desc for readability.

---

### The solution

**Group-aggregate for api calls matching status**

```sql
SELECT
    method,
    MAX(status) AS max_status
FROM api_calls
GROUP BY method
ORDER BY max_status DESC
```

> **Cost Analysis**
>
> The main table has 100M rows (26 GB). Partitioned on `call_time`, so queries filtering on that column skip most partitions. The GROUP BY reduces the row count early, keeping downstream operations cheap.

> **Interviewers Watch For**
>
> Strong candidates state the correct `GROUP BY` grain before writing any SQL, showing they think about the output shape first.

> **Common Pitfall**
>
> Selecting a non-aggregated column without including it in `GROUP BY` is the most common error. Some engines reject it; others silently return arbitrary values.

---

## Common follow-up questions

- The `err_msg` column in `api_calls` has roughly 93% NULLs. How does your query handle those rows, and would the result change if NULLs were replaced with zeros? _(Tests whether the candidate understands how NULLs propagate through aggregation functions and whether their WHERE/JOIN conditions implicitly filter them out.)_
- Your GROUP BY aggregates `call_id` from `api_calls`. If two groups have the same aggregate value, how is the output ordered, and is that deterministic? _(Tests awareness that ORDER BY on a non-unique value produces non-deterministic row order without a tiebreaker.)_
- `call_id` in `api_calls` has ~100M distinct values. What index strategy keeps your query from doing a full table scan? _(Tests whether the candidate can design indexes for high-cardinality columns and understands selectivity.)_
- Could you express this same logic as a single query without CTEs or subqueries? What readability trade-off does that introduce? _(Tests whether the candidate can flatten nested logic and understands when decomposition aids maintainability.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/api_calls_with_matching_status)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.