# Senior to Junior Ratio

> The ratio tells you a lot about the department.

Canonical URL: <https://datadriven.io/problems/senior_to_junior_ratio>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

For each department, compute the ratio of senior employees (fiscal year before 2021) to junior employees (fiscal year 2026 or later). Show the department, senior count, junior count, and the ratio.

## Worked solution and explanation

### Why this problem exists in real interviews

This analytics problem uses the `employee_metrics` table to evaluate conditional aggregation via CASE. Watch how the `department` column interact in the grouping and filtering logic.

> **Trick to Solving**
>
> When the prompt asks for multiple metrics split by a condition (e.g., resolved vs. unresolved), conditional aggregation avoids multiple passes.
> 
> 1. Spot the split: two or more categories in one output row
> 2. Use `SUM(CASE WHEN condition THEN 1 ELSE 0 END)` for each bucket
> 3. Group by the common dimension

---

### Break down the requirements

#### Step 1: Use conditional aggregation with CASE

A `CASE` expression inside the aggregate function splits rows into buckets without multiple passes over the data. Each condition maps to one output column.

#### Step 2: Aggregate by `department`

`GROUP BY department` collapses rows to one per group. The aggregate functions (`SUM`, `COUNT`, `AVG`, etc.) compute the metric for each group.

---

### The solution

**Case pivot for senior to junior ratio**

```sql
SELECT department, SUM(CASE WHEN fiscal_year < 2021 THEN 1 ELSE 0 END) AS senior_count, SUM(CASE WHEN fiscal_year >= 2026 THEN 1 ELSE 0 END) AS junior_count, CAST(SUM(CASE WHEN fiscal_year < 2021 THEN 1 ELSE 0 END) AS DOUBLE) / MAX(SUM(CASE WHEN fiscal_year >= 2026 THEN 1 ELSE 0 END), 1) AS senior_junior_ratio
FROM employee_metrics
GROUP BY department
```

> **Cost Analysis**
>
> With ~12,000 rows, the GROUP BY reduces the working set before any downstream operations. An index on the filter/join columns would reduce the scan to a seek.

> **Interviewers Watch For**
>
> Interviewers watch for whether you can pivot data with conditional aggregation in a single pass instead of multiple queries.

> **Common Pitfall**
>
> Placing the CASE expression outside the aggregate (e.g., `CASE WHEN ... THEN SUM(x)`) changes the semantics entirely. The CASE must go inside the aggregate.

---

## Common follow-up questions

- What would happen to your result if `employee_metrics.metric_value` contained duplicate values that you did not expect? _(Tests whether the candidate considers data quality issues in `metric_value` and uses DISTINCT or deduplication where needed.)_
- The `metric_value` column in `employee_metrics` is heavily skewed toward a few popular values. How would data skew affect parallel execution of your query? _(Tests understanding of skew in `employee_metrics.metric_value` and its impact on distributed query performance.)_
- If `employee_metrics` contained late-arriving rows that were inserted after your query ran, how would you design an incremental update instead of re-aggregating? _(Tests understanding of incremental aggregation patterns.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/senior_to_junior_ratio)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.