# Mode of Small Team Costs

> One charge keeps showing up everywhere.

Canonical URL: <https://datadriven.io/problems/mode_of_small_team_costs>

Domain: SQL · Difficulty: hard · Seniority: L4

## Problem

Among small teams (those with no more than 4 services), find the most common per-service cost amount. Show the single most frequent amount and how many times it appears.

## Worked solution and explanation

### Why this problem exists in real interviews

This focuses on HAVING filter and grouping within cost_allocs, specifically around the team_name column. Interviewers present it in senior-level rounds because the edge cases around NULL values and boundary conditions reveal depth of understanding.

> **Trick to Solving**
>
> Mode requires finding the most frequent value. The trick is combining `COUNT` with `ORDER BY DESC LIMIT 1`.
> 
> 1. Group by the target column and count occurrences
> 2. Order by count descending
> 3. Handle ties if the prompt requires it

---

### Break down the requirements

#### Step 1: Structure the query as multi-step CTEs

This solution uses 2 CTEs to break the logic into readable stages. Each CTE produces an intermediate result that feeds the next.

#### Step 2: Self-join the table

Join `cost_allocs` to itself to compare or pair rows within the same table. Use an inequality condition to avoid duplicate pairs.

#### Step 3: Aggregate with COUNT DISTINCT

Group by the output grain and apply `COUNT DISTINCT()` to compute the metric. The `GROUP BY` must match exactly what the output needs: one row per group key.

#### Step 4: Filter groups with HAVING

The `HAVING` clause filters after aggregation, unlike `WHERE` which filters before. This is necessary when the condition depends on an aggregate result.

#### Step 5: Order and limit the output

Sort by the target metric and apply `LIMIT` to return the requested number of rows. Ensure the sort is deterministic to produce reproducible results.

---

### The solution

**Two-stage CTE: filter teams then find mode**

```sql
WITH small_teams AS (
    SELECT team_name
    FROM cost_allocs
    GROUP BY team_name
    HAVING COUNT(DISTINCT svc_name) <= 4
),
freq AS (
    SELECT ca.amount, COUNT(*) AS freq
    FROM cost_allocs ca
    JOIN small_teams st ON ca.team_name = st.team_name
    GROUP BY ca.amount
)
SELECT amount AS mode_amount, freq AS occurrences
FROM freq
ORDER BY freq DESC
LIMIT 1
```

> **Cost Analysis**
>
> The query scans 20M rows from `cost_allocs`. CTEs in most engines are optimization fences. For production workloads, consider inlining or materializing the intermediate results.

> **Interviewers Watch For**
>
> Breaking complex logic into named CTEs shows the interviewer you prioritize readability and debuggability. Familiarity with statistical functions beyond COUNT/SUM/AVG distinguishes senior candidates.

> **Common Pitfall**
>
> Comparing dates stored as TEXT without casting can produce lexicographic instead of chronological ordering. Always confirm the column type.

---

## Common follow-up questions

- If cost_allocs.alloc_id could contain unexpected NULL values, how would your query behave? _(Tests NULL awareness even when the schema does not currently allow NULLs in alloc_id.)_
- What is the difference between filtering in WHERE versus HAVING for this query against cost_allocs? _(Tests whether the candidate understands pre-aggregation vs post-aggregation filtering.)_
- With millions of distinct values in cost_allocs.alloc_id, what index strategy would you use to keep this query performant? _(Tests indexing knowledge specific to high-cardinality columns like alloc_id.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/mode_of_small_team_costs)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.