# Top Endpoint by Power Users

> Power users have a favorite endpoint.

Canonical URL: <https://datadriven.io/problems/top_endpoint_by_power_users>

Domain: SQL · Difficulty: hard · Seniority: L5

## Problem

A power user is someone whose calls to standard methods (GET, POST, PUT, DELETE) make up at least 50% of their total calls. Find the endpoint where the most power users are active. Return the endpoint and the power user count.

## Worked solution and explanation

### Why this problem exists in real interviews

This probes whether you can hold two cohort grains in your head. 'Power user' is defined per (user, endpoint), not per user. Aggregating to user level first would lose endpoint specificity. The interviewer also wants to hear you ask whether the ratio belongs in `WHERE` (it doesn't, aggregates are invisible there) before you reach for `HAVING`.

---

### Break down the requirements

#### Step 1: Define power user per endpoint

Per (user_id, endpoint), the ratio of method IN ('GET','POST','PUT','DELETE') to total calls must be at least 0.5. Cohort is a row, not a user.

#### Step 2: Use a conditional SUM, not COUNT(WHERE...)

SUM(CASE WHEN method IN (...) THEN 1 ELSE 0 END) over COUNT(*). Both are aggregates, so the ratio belongs in HAVING, never WHERE.

#### Step 3: CAST to REAL

Integer SUM over integer COUNT silently floors to 0 in SQLite, MySQL, and older Postgres. CAST one side to REAL before dividing.

#### Step 4: Outer aggregation

From the cohort CTE, GROUP BY endpoint and COUNT(DISTINCT user_id). ORDER BY that count DESC, LIMIT 1.

---

### The solution

**POWER USER COHORT THEN ENDPOINT RANK**

```sql
WITH user_method_ratio AS (
  SELECT
    user_id,
    endpoint,
    CAST(SUM(CASE WHEN method IN ('GET', 'POST', 'PUT', 'DELETE') THEN 1 ELSE 0 END) AS REAL) / COUNT(*) AS method_ratio
  FROM api_calls
  GROUP BY user_id, endpoint
  HAVING CAST(SUM(CASE WHEN method IN ('GET', 'POST', 'PUT', 'DELETE') THEN 1 ELSE 0 END) AS REAL) / COUNT(*) >= 0.5
)
SELECT
  endpoint,
  COUNT(DISTINCT user_id) AS power_user_count
FROM user_method_ratio
GROUP BY endpoint
ORDER BY power_user_count DESC
LIMIT 1
```

> **Cost Analysis**
>
> api_calls is 300M rows partitioned by call_time. The inner GROUP BY (user_id, endpoint) is the expensive step: very high cardinality, full shuffle. The outer GROUP BY endpoint reduces to a tiny set. If asked to limit scope, suggest a call_time predicate to prune partitions before the cohort scan.

> **Interviewers Watch For**
>
> Before writing, ask: power user per endpoint, or globally across all endpoints? The prompt implies per endpoint (one user can be a power user on /search and not on /upload). Confirming that out loud earns the point.

> **Common Pitfall**
>
> Two bugs cluster here. First, putting the ratio in WHERE: aggregates aren't visible there, so the query fails to parse. Second, forgetting CAST: `SUM(...) / COUNT(*)` integer-divides to 0 for every row except those at 100% standard methods, silently emptying the cohort.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- How would you return the top 3 endpoints instead of just the top one? _(Swap LIMIT 1 for LIMIT 3, or wrap in DENSE_RANK() OVER (ORDER BY power_user_count DESC) to handle ties cleanly.)_
- What changes if power user is defined globally per user_id instead of per (user_id, endpoint)? _(Probes whether you can refactor the cohort grain. The inner CTE becomes GROUP BY user_id only, then you join back to api_calls to count distinct users per endpoint.)_
- How would you make this run on a 30 day rolling window? _(Tests partition pruning awareness. Add WHERE call_time >= CURRENT_DATE - INTERVAL '30 day' before the GROUP BY, and confirm the optimizer prunes by call_time.)_
- If method had nulls, how would the ratio behave? _(CASE WHEN method IN (...) returns 0 for null, so nulls count against the user's ratio. Decide whether to exclude them via WHERE method IS NOT NULL up front.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/top_endpoint_by_power_users)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.