# Top API Caller

> One user triggered more API calls than anyone.

Canonical URL: <https://datadriven.io/problems/top_api_caller>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

Find the user who triggered the most API calls, showing their user ID and total call count.

## Worked solution and explanation

### What this is really asking

200M rows in api_calls, partitioned by call_time, and the answer is one row: whichever user_id has the largest COUNT(*). Size is the story, not the SQL shape.

---

### Break down the requirements

#### Step 1: Aggregate per user

COUNT(*) grouped by user_id. Every call counts, including failures, since triggered means any row.

#### Step 2: Pick the top one

ORDER BY count DESC LIMIT 1. No tiebreak was specified.

---

### The solution

**TOP API CALLER**

```sql
SELECT
  user_id,
  COUNT(*) AS call_count
FROM api_calls
GROUP BY user_id
ORDER BY call_count DESC
LIMIT 1;
```

> **Cost Analysis**
>
> Full scan of 200M rows is the floor; partitioning by call_time buys nothing without a time predicate. A user_id index does not help, since GROUP BY needs every row. Hash aggregate is the plan.

> **Interviewers Watch For**
>
> Aliasing the count column, knowing GROUP BY drives the aggregate not ORDER BY, and noting that partitioning is wasted because no predicate prunes partitions.

> **Common Pitfall**
>
> Writing COUNT(user_id). If that column were nullable, it silently drops nulls and shifts the answer. COUNT(*) counts rows, which matches triggered.

> **The False Start**
>
> First instinct is MAX(COUNT(*)) in one SELECT, which engines reject as nested aggregates. Pivot to two steps: aggregate per user_id in a CTE, then ORDER BY DESC LIMIT 1 outside.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- How would the query change for the top caller in the last 7 days? _(Add WHERE call_time >= NOW() - INTERVAL '7 days' to prune partitions and cut the scan.)_
- What if two callers tie for the top spot? _(LIMIT 1 picks one arbitrarily; use RANK() OVER (ORDER BY call_count DESC) and keep rnk = 1.)_
- How would you make this cheap at 200M rows? _(Maintain a per-caller counter via CDC or streaming, so the read is ORDER BY LIMIT 1 against a small summary.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/top_api_caller)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.