# Largest Group

> One group towers above the rest.

Canonical URL: <https://datadriven.io/problems/largest_group>

Domain: SQL · Difficulty: easy · Seniority: L5

## Problem

Which department has the most employees? If two departments are tied, include both. Return the department name and employee count.

## Worked solution and explanation

### Why this problem exists in real interviews

Ties are the whole point, and the easiest wrong answer is `ORDER BY COUNT(*) DESC LIMIT 1`, which silently picks one winner when two departments share the maximum. You need a structure that expresses "all rows equal to the max," not "first row after sorting."

---

### Break down the requirements

#### Step 1: Per-group counts

Group `employees` by `department` and take `COUNT(*)`. One row per department with its headcount.

#### Step 2: Find the maximum once

A scalar subquery wraps the same `GROUP BY` and takes `MAX(cnt)`. Computed once, reusable in the outer filter.

#### Step 3: Filter on equality, not on rank

Use `HAVING COUNT(*) = (scalar max)`. Equality keeps every department whose headcount matches, so any tie surfaces.

---

### The solution

**DEPARTMENTS AT THE MAXIMUM HEADCOUNT**

```sql
SELECT department,
       COUNT(*) AS emp_count
FROM employees
GROUP BY department
HAVING COUNT(*) = (
    SELECT MAX(cnt)
    FROM (
        SELECT COUNT(*) AS cnt
        FROM employees
        GROUP BY department
    )
)
```

> **Cost Analysis**
>
> Two aggregate passes, but `employees` is 80k rows and the grouped result is tiny. The dominant cost is a single hash aggregate; the scalar subquery runs on a handful of rows.

> **Interviewers Watch For**
>
> Whether you noticed the tie requirement. Reaching straight for `LIMIT 1` gives away that you skimmed the schema and missed the sentence about ties. Say "include both" out loud before writing.

> **Common Pitfall**
>
> `HAVING COUNT(*) >= ALL (SELECT COUNT(*) FROM employees GROUP BY department)` works but tangles `NULL` semantics of `ALL`. The scalar `= MAX(...)` form is clearer.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- How would you solve this with a window function instead? _(Tests fluency with `RANK() OVER (ORDER BY COUNT(*) DESC)` as a tie-preserving alternative.)_
- How are `NULL` values in `department` handled here? _(Probes whether you know `GROUP BY` treats `NULL` as its own bucket.)_
- What changes if you want the top three with ties at position three included? _(Tests the jump from `MAX` to `DENSE_RANK` for top-K-with-ties.)_

> **Key Insight**
>
> When a prompt says "include ties," compare each row to an aggregate over the same data. `LIMIT N` is a position filter; ties demand a value filter.

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/largest_group)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.