# New Services With Poor Health

> New services, already struggling.

Canonical URL: <https://datadriven.io/problems/new_services_with_poor_health>

Domain: SQL · Difficulty: hard · Seniority: L4

## Problem

Find services whose earliest health check falls in Q1 2026 and where more than 20% of their checks have a non-healthy status (case-insensitive). Return the service name and the failure ratio.

## Worked solution and explanation

### Why this problem exists in real interviews

Reliability teams use this to probe whether you distinguish per-row conditions (`WHERE`) from per-group conditions (`HAVING`). 'New' means 'service whose earliest check is in Q1', a per-group statement that lives in `HAVING MIN(checked)`. Putting the date filter in `WHERE` would also accept services that have checks scattered across the year, silently inflating the cohort.

---

### Break down the requirements

#### Step 1: Define the cohort

`new services` means the earliest `checked` per `svc_name` lands in Q1. That is `MIN(checked) >= '2026-01-01' AND MIN(checked) < '2026-04-01'`. It goes in `HAVING`, not `WHERE`, because it is a per-group condition.

#### Step 2: Build the ratio

Conditional aggregation: `SUM(CASE WHEN LOWER(status) != 'healthy' THEN 1 ELSE 0 END)` over `COUNT(*)`. Lowercase `status` so `'Healthy'` and `'HEALTHY'` do not slip into the numerator.

#### Step 3: Force float division

Cast the SUM to `REAL` before dividing. `SUM(...) / COUNT(*)` in SQLite or Postgres-integer mode truncates to 0 for every group under 100% failure.

#### Step 4: Filter by threshold

Repeat the ratio expression in `HAVING ... > 0.2`. You cannot reference the SELECT alias `negative_ratio` in `HAVING` in standard SQL.

---

### The solution

**Q1 COHORT, RATIO OVER 20%**

```sql
SELECT
  svc_name,
  CAST(SUM(CASE WHEN LOWER(status) != 'healthy' THEN 1 ELSE 0 END) AS REAL)
    / COUNT(*) AS negative_ratio
FROM svc_health
GROUP BY svc_name
HAVING MIN(checked) >= '2026-01-01'
   AND MIN(checked) <  '2026-04-01'
   AND CAST(SUM(CASE WHEN LOWER(status) != 'healthy' THEN 1 ELSE 0 END) AS REAL)
         / COUNT(*) > 0.2;
```

> **Cost Analysis**
>
> 50M rows, partitioned by `checked`. The cohort filter is on `MIN(checked)` post-aggregation, so the planner cannot prune partitions unless you pre-filter `checked < '2026-04-01'` in `WHERE`. `GROUP BY svc_name` is low-cardinality, so hash aggregation fits in memory. The CASE is per-row but cheap.

> **Interviewers Watch For**
>
> Two questions: does 'new' mean first-ever check in Q1, or any check in Q1? And how should NULL status count? Pick one and say it. The prompt is silent, so flagging the assumption is the answer.

> **Common Pitfall**
>
> Putting `checked BETWEEN '2026-01-01' AND '2026-03-31'` in `WHERE`. That filters which rows enter the aggregation, so a service with checks in both Q1 and Q4 looks Q1-native. The cohort is about the service, not the rows, so the date predicate belongs in `HAVING MIN(checked)`.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- How would you change the query if NULL status should count as unhealthy? _(Probes three-valued logic and whether you reach for `OR status IS NULL` inside the CASE.)_
- What if `new` means first seen in any service registry, not earliest `svc_health` row? _(Tests whether you swap the `HAVING MIN(checked)` for a join to a dim table.)_
- How would you rank the top 10 worst new services without re-scanning svc_health? _(Pushes toward a CTE plus `ORDER BY negative_ratio DESC LIMIT 10`, or a window function.)_
- How does partition pruning change if you add `WHERE checked < '2026-04-01'`? _(Checks whether you understand that pre-filter shrinks scan but does not change correctness here.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/new_services_with_poor_health)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.