# Filtered User Roster

> A clean roster for the all-hands.

Canonical URL: <https://datadriven.io/problems/filtered_user_roster>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The growth team is building a clean user roster for outreach, but the 'admin' and 'system' accounts need to be excluded, as do users with a 'z' anywhere in their email address (a known test-account pattern). Return every remaining user's full profile, alphabetical by username.

## Worked solution and explanation

### Why this problem exists in real interviews

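This challenge targets row filtering against `users`: excluding specific values of `username`, excluding a pattern in `email`, and sorting the result. It looks trivial, which is why interviewers use it. A missed exclusion, a misplaced wildcard, or a forgotten `ORDER BY` produces a plausible-looking but incorrect roster.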

---

### Break down the requirements

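#### Step 1: Exclude the reserved accounts

`WHERE username NOT IN ('admin', 'system')` drops the two accounts named in the prompt.

#### Step 2: Exclude test-account emails

`AND email NOT LIKE '%z%'` drops any user whose email contains a 'z'. The `%` on both sides matches the letter anywhere in the string.

#### Step 3: Return full profiles in order

`SELECT *` returns every column of the remaining rows, and `ORDER BY username` sorts them alphabetically.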

---

### The solution

**Filter and sort for filtered user roster**

```sql
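-- Return the full profile of every user who survives both exclusions.
SELECT *
FROM users
WHERE username NOT IN ('admin', 'system')  -- reserved accounts
  AND email NOT LIKE '%z%'                 -- 'z' anywhere in the email
ORDER BY username;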
```
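
To sanity-check a filter of the kind the problem statement asks for, the sketch below runs it against a few made-up rows using Python's built-in `sqlite3` module as a stand-in for Postgres. The column layout (`user_id`, `username`, `email`, `signup_date`) and every sample row are assumptions for illustration, not the real schema or data.

```python
import sqlite3

# Hypothetical schema and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (user_id INTEGER, username TEXT, email TEXT, signup_date TEXT)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [
        (1, "admin", "admin@example.com", "2024-01-01"),    # reserved account
        (2, "system", "system@example.com", "2024-01-01"),  # reserved account
        (3, "carol", "carol@example.com", "2024-02-10"),
        (4, "bob", "bob@zmail.example", "2024-03-05"),      # 'z' in the email
        (5, "alice", "alice@example.com", "2024-04-20"),
    ],
)

# Drop the reserved accounts and the 'z' emails, then sort by username.
roster = conn.execute(
    """
    SELECT *
    FROM users
    WHERE username NOT IN ('admin', 'system')
      AND email NOT LIKE '%z%'
    ORDER BY username
    """
).fetchall()

roster_usernames = [row[1] for row in roster]
print(roster_usernames)  # ['alice', 'carol']
```

One caveat about the stand-in: SQLite's `LIKE` is case-insensitive for ASCII letters by default, while Postgres's `LIKE` is case-sensitive, so the sample data deliberately avoids an uppercase `Z`.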

> **Cost Analysis**
>
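> Single scan of `users` followed by a sort. The leading-wildcard pattern `'%z%'` cannot use an ordinary B-tree index, so the email filter reads every row. An index on `username` may let the planner avoid the sort if the table is large.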

> **Interviewers Watch For**
>
> The interviewer checks that you apply every exclusion from the prompt (both usernames and the email pattern), return all columns, and sort by `username`.

> **Common Pitfall**
>
> Rows with a NULL `username` or `email` are silently dropped, because `NOT IN` and `NOT LIKE` evaluate to NULL rather than true for them. Confirm whether such users belong on the roster.
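
One edge case worth checking for the roster filter is NULL handling in the email condition. The sketch below, again using Python's `sqlite3` with a hypothetical two-column table and invented rows, compares a plain `NOT LIKE` with a `COALESCE` variant that keeps users who have no email. Whether those users should be kept is a requirements question the prompt does not settle.

```python
import sqlite3

# Hypothetical table and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "alice@example.com"), ("dave", None)],  # dave has no email
)

# NULL NOT LIKE '%z%' evaluates to NULL, so dave is filtered out.
strict = conn.execute(
    "SELECT username FROM users WHERE email NOT LIKE '%z%' ORDER BY username"
).fetchall()

# Treating a missing email as an empty string keeps dave.
null_safe = conn.execute(
    "SELECT username FROM users "
    "WHERE COALESCE(email, '') NOT LIKE '%z%' ORDER BY username"
).fetchall()

strict_names = [r[0] for r in strict]
null_safe_names = [r[0] for r in null_safe]
print(strict_names)     # ['alice']
print(null_safe_names)  # ['alice', 'dave']
```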

---

## Common follow-up questions

- What happens to your results if `username` in `users` contains trailing whitespace or mixed casing? _(Tests awareness of text normalization issues that silently let values like `'Admin'` or `'admin '` slip past an exact-match exclusion.)_
- Your query sorts by `username`. If two rows in `users` share the same `username`, how is the output ordered, and is that deterministic? _(Tests awareness that ORDER BY on a non-unique value produces non-deterministic row order without a tiebreaker such as `user_id`.)_
- `user_id` in `users` has ~12M distinct values. What index strategy, if any, keeps your query from doing a full table scan given the leading-wildcard `LIKE`? _(Tests whether the candidate understands that a B-tree index does not serve a `'%z%'` pattern and can reason about selectivity and alternatives.)_
- Your query is already a single statement without CTEs or subqueries. If the exclusion list grew to hundreds of usernames, would you keep it inline or move it to a CTE or lookup table? What readability trade-off does that introduce? _(Tests whether the candidate understands when decomposition aids maintainability.)_
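
The whitespace and casing question above can be explored with a small experiment. This is a minimal sketch using Python's `sqlite3` with a hypothetical table and invented rows. It compares an exact-match exclusion with one that normalizes `username` through `LOWER(TRIM(...))` first. Normalizing inside the `WHERE` clause is one possible approach, not necessarily the expected answer.

```python
import sqlite3

# Hypothetical table and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [
        ("Admin ", "root@example.com"),    # mixed case plus trailing space
        ("system", "system@example.com"),
        ("alice", "alice@example.com"),
    ],
)

# Exact match: 'Admin ' is not equal to 'admin', so it slips through.
exact = conn.execute(
    "SELECT username FROM users "
    "WHERE username NOT IN ('admin', 'system') ORDER BY username"
).fetchall()

# Normalized match: trim and lowercase before comparing.
normalized = conn.execute(
    "SELECT username FROM users "
    "WHERE LOWER(TRIM(username)) NOT IN ('admin', 'system') ORDER BY username"
).fetchall()

exact_names = [r[0] for r in exact]
normalized_names = [r[0] for r in normalized]
print(exact_names)       # ['Admin ', 'alice']
print(normalized_names)  # ['alice']
```

Wrapping the column in functions generally prevents a plain index on `username` from being used for that predicate, which is a trade-off to mention if the follow-up turns to performance.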

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/filtered_user_roster)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.