# Common Age Buckets

> Duplicate records hiding in the users table.

Canonical URL: <https://datadriven.io/problems/common_age_buckets>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The growth team is profiling the user base by age demographics. Which age buckets have more than one user? Show each qualifying bucket and its user count, sorted from largest to smallest.

## Worked solution and explanation

### Why this problem exists in real interviews

This is a basic GROUP BY with HAVING question. It tests the fundamental ability to aggregate, filter on counts, and sort.

---

### Break down the requirements

#### Step 1: Group by age bucket

`GROUP BY age_bucket` produces one row per demographic segment.

#### Step 2: Filter and sort

`HAVING COUNT(*) > 1` keeps buckets with multiple users. `ORDER BY COUNT(*) DESC` ranks largest first.

---

### The solution

**Group, filter, and sort by count**

```sql
SELECT age_bucket, COUNT(*) AS user_count
FROM users
GROUP BY age_bucket
HAVING COUNT(*) > 1
ORDER BY user_count DESC
```

> **Cost Analysis**
>
> Single scan of 10M rows. Age buckets are low cardinality (under 20 groups), so aggregation is trivially cheap.

> **Common Pitfall**
>
> If `age_bucket` contains NULLs, they form their own group. Depending on requirements, you may need `WHERE age_bucket IS NOT NULL`.

---

## Common follow-up questions

- What if age_bucket is NULL for some users? _(NULLs are grouped together. Tests whether you need to exclude or handle them.)_
- How would you show the percentage of total users in each bucket? _(Tests window function: 100.0 * COUNT(*) / SUM(COUNT(*)) OVER ().)_
- What if the threshold was 'more than the average bucket size'? _(Tests subquery-based threshold: HAVING COUNT(*) > (SELECT AVG(cnt) FROM ...).)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/common_age_buckets)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.