# Most Efficient API Endpoint

> Best throughput per call.

Canonical URL: <https://datadriven.io/problems/most_efficient_api_endpoint>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

We define endpoint efficiency as the ratio of successful calls (status 200) to average latency. Considering only endpoints with at least 5 total calls, which endpoint has the highest efficiency? Show the endpoint, call count, average latency, and the efficiency ratio.

## Worked solution and explanation

### Why this problem exists in real interviews

Querying api_calls for endpoint data using grouping and top-N selection tests whether you can translate a business requirement into the right column references and filter sequence. It shows up in mid-level screens to verify practical fluency.

> **Trick to Solving**
>
> Share/ratio calculations divide a group total by a grand total. Use a window function for the denominator.
> 
> 1. Aggregate to the group level
> 2. Use `SUM(metric) OVER ()` to get the grand total in each row
> 3. Divide group total by grand total and multiply by 100

---

### Break down the requirements

#### Step 1: Aggregate with COUNT/AVG

Group by the output grain and apply `COUNT()` to compute the metric. The `GROUP BY` must match exactly what the output needs: one row per group key.

#### Step 2: Order and limit the output

Sort by the target metric and apply `LIMIT` to return the requested number of rows. Ensure the sort is deterministic to produce reproducible results.

---

### The solution

**Lowest-average endpoint via ascending sort**

```sql
SELECT endpoint,
    ROUND(AVG(latency), 2) AS avg_latency,
    COUNT(*) AS call_volume
FROM api_calls
GROUP BY endpoint
ORDER BY avg_latency ASC
LIMIT 1
```

> **Cost Analysis**
>
> The query scans 250M rows from `api_calls`. The aggregation reduces the row count before any downstream processing, which is the key performance lever. CTEs in most engines are optimization fences. For production workloads, consider inlining or materializing the intermediate results.

> **Interviewers Watch For**
>
> Naming the output grain ("one row per X") before writing the GROUP BY shows you think about data shape, not just syntax. Breaking complex logic into named CTEs shows the interviewer you prioritize readability and debuggability.

> **Common Pitfall**
>
> Placing a filter in `WHERE` instead of `HAVING` (or vice versa) is a common mistake. `WHERE` filters rows before aggregation; `HAVING` filters groups after.

---

## Common follow-up questions

- What happens to your result if api_calls.err_msg contains NULLs for some rows? _(Tests whether the candidate accounts for NULL behavior in aggregates and comparisons on err_msg.)_
- How would you verify that your aggregation on api_calls.call_id is not double-counting due to duplicate rows? _(Tests data quality awareness and deduplication strategies.)_
- With millions of distinct values in api_calls.call_id, what index strategy would you use to keep this query performant? _(Tests indexing knowledge specific to high-cardinality columns like call_id.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/most_efficient_api_endpoint)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.