# The Rules That Broke

> Constraints exist for a reason. Something violated them anyway.

Canonical URL: <https://datadriven.io/problems/failed_constraint_checks_count>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

A downstream consumer flagged bad data from the quality-checks pipeline. Surface how many data quality checks with a constraint-related rule resulted in a failure.

## Worked solution and explanation

### Why this problem exists in real interviews

The query is one line. The probe is whether you treat `rule LIKE '%constraint%'` and `passed = 0` as obvious or as ambiguous. They want to see if you pause to ask what counts as a constraint rule, whether `passed` is a boolean or tri-state, and whether the downstream consumer cares about a window. A staff DE forces those questions before writing.

---

### Break down the requirements

#### Step 1: Pin the rule taxonomy

Ask out loud: is `rule` a free-text label or a controlled vocabulary? You're matching `%constraint%` because the prompt says constraint-related. Confirm there's no `rule_type` enum you should be using instead.

#### Step 2: Pin the failure flag

`passed` is on `dq_checks`. Treat 0 as fail and 1 as pass, but ask if NULL means skipped or errored. `passed = 0` filters NULLs out by three-valued logic; that's correct here, but say so.

#### Step 3: Pin the window

The consumer flagged bad data now, not all-time. Ask whether they want a `run_at` window (last 24h, current run batch). Write the unbounded version first, then add `AND run_at >= ?` once they answer.

#### Step 4: Count, don't list

Single scalar with `COUNT(*)` and an alias. No GROUP BY, no DISTINCT on `check_id` unless they tell you the same check can fail twice in scope.

---

### The solution

**CONSTRAINT FAILURE COUNT**

```sql
SELECT COUNT(*) AS fail_count
FROM dq_checks
WHERE rule LIKE '%constraint%'
  AND passed = 0
```

> **Cost Analysis**
>
> 500k rows with a leading-wildcard `LIKE` defeats any btree on `rule`. Expect a full scan. If this runs hourly, push for a normalized `rule_type` column or a generated column with a trigram or expression index. For a one-off audit, scan cost is fine.

> **Interviewers Watch For**
>
> Before typing, ask: 'Is the consumer asking about a specific run window, or lifetime?' and 'Does `passed` have a NULL state for skipped checks?' Skipping these and writing the one-liner immediately reads as junior, even when the SQL is right.

> **Common Pitfall**
>
> Writing `passed != 1` instead of `passed = 0`. Under three-valued logic, NULL `passed` rows get dropped either way, but `!=` invites a debate about whether NULL means failure. Use the positive predicate so the semantics are explicit.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- How would you break this out by `tbl_name` to show which tables fail constraint checks most often? _(Probes whether you reach for `GROUP BY tbl_name` and add `ORDER BY fail_count DESC LIMIT N`.)_
- What if the same `check_id` can produce multiple rows per day and you only want one failure per check per day? _(Probes deduplication via `COUNT(DISTINCT check_id)` or a window-based pick on `run_at`.)_
- How would you compute a failure rate instead of a raw count? _(Probes conditional aggregation: `SUM(CASE WHEN passed = 0 THEN 1 ELSE 0 END) * 1.0 / COUNT(*)` scoped to constraint rules.)_
- If `severity` exists, would you weight failures by it? _(Probes whether you push back and ask the consumer what 'how many' means when not all failures are equal.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/failed_constraint_checks_count)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.