# Top 2 Callers per Endpoint

> Two top callers per endpoint.

Canonical URL: <https://datadriven.io/problems/top_2_callers_per_endpoint>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

For each endpoint, show the top 2 users by API call volume. If users are tied, include all of them at the same rank. Return the endpoint, user ID, and their rank.

## Worked solution and explanation

### What this is really asking

COUNT(*) per (endpoint, user_id), then a per-endpoint ranking where ties share a rank. DENSE_RANK <= 2 keeps every tied user; ROW_NUMBER drops ties and RANK skips after a tie.

---

### Break down the requirements

#### Step 1: Aggregate calls

GROUP BY endpoint, user_id with COUNT(*). The users table adds no rows; skip it unless the projection needs username.

#### Step 2: Rank inside each endpoint

DENSE_RANK() OVER (PARTITION BY endpoint ORDER BY COUNT(*) DESC). It shares ranks on ties and produces no gaps, matching the spec.

#### Step 3: Filter and sort

Wrap in a subquery, keep rnk <= 2, ORDER BY endpoint, rnk.

---

### The solution

**TOP 2 CALLERS WITH TIES**

```sql
SELECT endpoint, user_id, rnk
FROM (
  SELECT
    endpoint,
    user_id,
    COUNT(*) AS call_count,
    DENSE_RANK() OVER (
      PARTITION BY endpoint
      ORDER BY COUNT(*) DESC
    ) AS rnk
  FROM api_calls
  GROUP BY endpoint, user_id
) ranked
WHERE rnk <= 2
ORDER BY endpoint, rnk;
```

> **Cost Analysis**
>
> GROUP BY collapses 200M rows to endpoints times distinct users before any sort. Skipping the users JOIN avoids a 12M-row probe that contributes nothing to the result.

> **Interviewers Watch For**
>
> Can you defend DENSE_RANK over RANK and ROW_NUMBER? ROW_NUMBER drops ties. RANK skips after a tie, so 1, 1, 3 leaves only the tied pair at rnk <= 2.

> **Common Pitfall**
>
> Putting DENSE_RANK in WHERE alongside COUNT(*). Window functions resolve after WHERE, so the parser rejects it. The subquery exists to defer that filter.

> **The False Start**
>
> First instinct is ORDER BY COUNT(*) DESC LIMIT 2 per endpoint via a correlated subquery. That returns exactly two rows and silently drops third-place ties. Pivot to DENSE_RANK <= 2, the only ranker that honors the tie rule.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- How would you return top 2 with no ties? _(Swap DENSE_RANK for ROW_NUMBER and add user_id as a deterministic tiebreaker in ORDER BY.)_
- What changes if the prompt said top 2 distinct call counts? _(DENSE_RANK already matches that reading, which is why it is the safer default when the spec is fuzzy.)_
- How would you make this run on a daily partition only? _(Add WHERE call_time >= CURRENT_DATE before GROUP BY so partition pruning cuts the scan to one day.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/top_2_callers_per_endpoint)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.