# Shared Endpoints

> Shared credentials across endpoints.

Canonical URL: <https://datadriven.io/problems/shared_endpoints>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

The security team is auditing shared access patterns. Surface all API endpoints that have been accessed by more than one user.

## Worked solution and explanation

### Why this problem exists in real interviews

This challenge asks you to apply HAVING for post-aggregation filtering to the `api_calls` table, simulating a real user behavior workflow. Pay attention to the `endpoint` column as they drive the aggregation and output.

---

### Break down the requirements

#### Step 1: Aggregate by `endpoint`

`GROUP BY endpoint` collapses rows to one per group. The aggregate functions (`SUM`, `COUNT`, `AVG`, etc.) compute the metric for each group.

#### Step 2: Filter groups with HAVING

HAVING applies after GROUP BY, filtering out groups that do not meet the threshold. This cannot be done in WHERE because the aggregate has not been computed yet.

#### Step 3: Deduplicate the result with DISTINCT

`SELECT DISTINCT` removes duplicate rows from the output. This is necessary when joins or subqueries can produce repeated combinations.

---

### The solution

**Having filter for shared endpoints**

```sql
SELECT endpoint, COUNT(DISTINCT user_id) AS user_count
FROM api_calls
GROUP BY endpoint
HAVING COUNT(DISTINCT user_id) > 1
```

> **Cost Analysis**
>
> With ~150M rows, the GROUP BY reduces the working set before any downstream operations. An index on the filter/join columns would reduce the scan to a seek.

> **Interviewers Watch For**
>
> Interviewers watch for whether you use HAVING (not WHERE) to filter after aggregation; whether you know when DISTINCT is needed and when it masks a logic error.

> **Common Pitfall**
>
> Putting the aggregate condition in WHERE instead of HAVING causes a syntax error. WHERE runs before GROUP BY; HAVING runs after.

---

## Common follow-up questions

- If `err_msg` in `api_calls` is NULL for some rows, how would your aggregation or join logic be affected? _(Probes understanding of NULL propagation through joins and aggregate functions on `api_calls.err_msg`.)_
- If `api_calls` grew to contain billions of rows, which part of your query would become the bottleneck given the cardinality of `latency`? _(Tests ability to identify performance hotspots related to `api_calls.latency` at scale.)_
- If the HAVING threshold in your query changed from a fixed number to a percentile, how would you restructure the query? _(Tests ability to replace static HAVING filters with dynamic subquery-based thresholds.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/shared_endpoints)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.