# Search Endpoint Status Distribution

> Status codes on the flaky search endpoint.

Canonical URL: <https://datadriven.io/problems/search_endpoint_status_distribution>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The '/api/v1/search' endpoint has been flaky. Surface the distribution of HTTP status codes for calls to that endpoint. Show each status and its count, with statuses in ascending order.

## Worked solution and explanation

### Why this problem exists in real interviews

This problem uses the `api_calls` table to test a basic filtered, grouped COUNT aggregation. The two columns play different roles: `endpoint` is only filtered (in `WHERE`), while `status` is both the grouping key and the sort key.

---

### Break down the requirements

#### Step 1: Select the target columns

The SELECT clause returns exactly what the prompt asks for: each `status` and its count (aliased here as `call_count`). Returning extra columns or missing a required alias would fail the grading check.

#### Step 2: Verify the output shape

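Confirm the result has the expected columns, one row per distinct status, and statuses in ascending order. A quick sanity check is that the per-status counts sum to the total number of calls to `/api/v1/search`; a mismatch points to a filtering or grouping error before you submit.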
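A minimal sketch of these output-shape checks (no duplicate statuses, ascending order, counts summing to the endpoint's total), run against a tiny in-memory SQLite table instead of the real Postgres instance. The schema (`call_id`, `endpoint`, `status`), the sample rows, the `/api/v1/users` endpoint, and the helper names `build_sample_db` and `check_status_distribution` are all hypothetical, made up only to exercise the query.

```python
import sqlite3

# The solution query from this page; it is standard SQL, so SQLite runs it unchanged.
QUERY = """
SELECT status, COUNT(*) AS call_count
FROM api_calls
WHERE endpoint = '/api/v1/search'
GROUP BY status
ORDER BY status ASC
"""


def build_sample_db() -> sqlite3.Connection:
    """Create a hypothetical api_calls table with a handful of made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE api_calls (call_id INTEGER PRIMARY KEY, endpoint TEXT, status INTEGER)"
    )
    rows = [
        ("/api/v1/search", 200),
        ("/api/v1/search", 500),
        ("/api/v1/search", 200),
        ("/api/v1/search", 429),
        ("/api/v1/users", 200),  # different endpoint: must be filtered out
        ("/api/v1/search", 500),
    ]
    conn.executemany("INSERT INTO api_calls (endpoint, status) VALUES (?, ?)", rows)
    return conn


def check_status_distribution(conn: sqlite3.Connection) -> list[tuple[int, int]]:
    """Run the query and assert the expected output shape."""
    result = conn.execute(QUERY).fetchall()
    statuses = [status for status, _ in result]
    # One row per status: no duplicates.
    assert len(statuses) == len(set(statuses))
    # Statuses come back in ascending order.
    assert statuses == sorted(statuses)
    # Per-status counts add up to the total number of calls to the endpoint.
    (total,) = conn.execute(
        "SELECT COUNT(*) FROM api_calls WHERE endpoint = '/api/v1/search'"
    ).fetchone()
    assert sum(count for _, count in result) == total
    return result


if __name__ == "__main__":
    print(check_status_distribution(build_sample_db()))
    # [(200, 2), (429, 1), (500, 2)]
```

The sum-of-counts check is cheap and catches the two most common mistakes at once: a missing `WHERE` (the sum exceeds the endpoint total) and a wrong grouping key (duplicate or missing statuses).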

---

### The solution

**Filter to the search endpoint, then count calls per status**

```sql
SELECT status, COUNT(*) AS call_count   -- one row per status with its call count
FROM api_calls
WHERE endpoint = '/api/v1/search'       -- filter rows before grouping
GROUP BY status
ORDER BY status ASC;
```

> **Cost Analysis**
>
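> With ~120M rows and no supporting index, the query performs a single sequential scan of the whole table. An index on the filter column `endpoint` lets the planner read only the `/api/v1/search` rows instead. A composite index on `(endpoint, status)` also carries the grouping column, so matching rows come back already ordered by status and, in Postgres, can potentially be served by an index-only scan.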
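To see the full-scan versus index difference on a small scale, the sketch below asks SQLite for its query plan before and after adding a composite index. SQLite stands in for Postgres here (in Postgres you would use `EXPLAIN` with an equivalent `CREATE INDEX`); the index name `idx_api_calls_endpoint_status` and the minimal table schema are hypothetical, and the exact plan wording varies across SQLite versions.

```python
import sqlite3

QUERY = """
SELECT status, COUNT(*) AS call_count
FROM api_calls
WHERE endpoint = '/api/v1/search'
GROUP BY status
ORDER BY status ASC
"""


def plan_details(conn: sqlite3.Connection, sql: str) -> list[str]:
    """Return the human-readable 'detail' column of SQLite's EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema; the real table presumably has more columns.
conn.execute(
    "CREATE TABLE api_calls (call_id INTEGER PRIMARY KEY, endpoint TEXT, status INTEGER)"
)

# No index yet: the plan is a full table scan plus a temporary B-tree for GROUP BY.
before = plan_details(conn, QUERY)

# Composite index: equality lookup on endpoint, and status is already sorted within
# each endpoint, so one index can serve the filter, the GROUP BY, and the ORDER BY.
conn.execute("CREATE INDEX idx_api_calls_endpoint_status ON api_calls (endpoint, status)")
after = plan_details(conn, QUERY)

print("before:", before)
print("after: ", after)
```

Putting `endpoint` first in the index matters: the equality filter narrows the range, and `status` as the second key means no separate sort step is needed for grouping or ordering.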

> **Interviewers Watch For**
>
> Interviewers watch whether the query returns exactly the columns and ordering the prompt specifies, and how quickly you identify the core operation (filter, group, count) and write clean, minimal code.

> **Common Pitfall**
>
> Returning extra columns that the prompt did not ask for, or using the wrong column alias, causes a grading mismatch even when the logic is correct.

---

## Common follow-up questions

- If `err_msg` in `api_calls` is NULL for some rows, how would your aggregation or join logic be affected? _(Probes understanding of NULL propagation through joins and aggregate functions on `api_calls.err_msg`.)_
- If `api_calls` grew to contain billions of rows, which part of your query would become the bottleneck given the cardinality of `user_id`? _(Tests ability to identify performance hotspots related to `api_calls.user_id` at scale.)_
- How would you modify this query if the business logic required grouping by both `call_id` and `endpoint` instead of just one? _(Tests ability to adapt the query structure to changing requirements.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/search_endpoint_status_distribution)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.