# Auth Endpoint Callers

> Identify users who have called authentication API endpoints

Canonical URL: <https://datadriven.io/problems/auth_endpoint_callers>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

A security review requires tracing which users have hit authentication endpoints. Find every API call whose endpoint path contains 'auth' and show the user's ID, username, email, call ID, and endpoint, ordered by user ID.

## Worked solution and explanation

### What this is really asking

`api_calls` holds 2B rows, `users` holds 6M. The auth filter sits on the giant side, and one user maps to many calls. Output is a row per matching call, not per user.

---

### Break down the requirements

#### Step 1: Filter calls to auth endpoints

`endpoint LIKE '%auth%'` on `api_calls`. The leading wildcard kills any btree index; this is a scan or trigram lookup.

#### Step 2: Join to users

INNER JOIN on `user_id`. Anonymous calls with null `user_id` drop out by design.

#### Step 3: Project and sort

Select the five requested columns, ORDER BY `u.user_id` to cluster each user's calls together.

---

### The solution

**AUTH ENDPOINT CALLERS**

```sql
SELECT
  u.user_id,
  u.username,
  u.email,
  ac.call_id,
  ac.endpoint
FROM users u
INNER JOIN api_calls ac
  ON u.user_id = ac.user_id
WHERE ac.endpoint LIKE '%auth%'
ORDER BY u.user_id
```

> **Cost Analysis**
>
> Push the LIKE filter on `api_calls` first, hash-join with 6M `users` on the build side. `LIKE '%auth%'` is non-sargable; a trigram index turns the 2B scan into a seek.

> **Interviewers Watch For**
>
> Recognizing the small table is the hash build side. Knowing leading `%` defeats btree. Defending INNER over LEFT here.

> **Common Pitfall**
>
> `endpoint LIKE 'auth%'` or `= 'auth'`. The prompt says contains, and real paths look like `/api/v2/auth/login` or `/oauth/token`. Anchoring misses every nested route.

> **The False Start**
>
> First instinct is FROM users LEFT JOIN api_calls with the endpoint filter in WHERE. A non-null predicate on the right side collapses LEFT to INNER, so you pay LEFT's cost for INNER's result. Pivot to INNER JOIN up front.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- How would you accelerate `LIKE '%auth%'` at 2B rows? _(Pre-tag `endpoint` with a category column on write, or build a trigram index.)_
- How do you count anonymous auth hits where `user_id IS NULL`? _(UNION ALL a second query selecting from `api_calls` where `user_id IS NULL`, with NULLs for user columns.)_
- What if you want one row per user instead of per call? _(GROUP BY the user fields, COUNT(call_id), MIN/MAX(call_time).)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/auth_endpoint_callers)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.