# DQ Fail Rate by Table

> Pass rates, table by table.

Canonical URL: <https://datadriven.io/problems/dq_fail_rate_by_table>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

The data engineering team needs to see which tables have the worst data quality among those with a recorded severity. Show the average fail percentage per table, from the cleanest to the dirtiest.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests filtered GROUP BY with AVG and ordering. The requirement to exclude rows without severity and sort by fail percentage probes NULL handling in filters.

---

### Break down the requirements

#### Step 1: Filter to rows with severity

`WHERE severity IS NOT NULL` removes rows without a recorded severity.

#### Step 2: Average fail percentage per table

`GROUP BY tbl_name` with `AVG(fail_pct)`.

#### Step 3: Sort from cleanest to dirtiest

`ORDER BY AVG(fail_pct) ASC` lists best data quality first.

---

### The solution

**Filtered average with quality ranking**

```sql
SELECT tbl_name, AVG(fail_pct) AS avg_fail_pct
FROM dq_checks
WHERE severity IS NOT NULL
GROUP BY tbl_name
ORDER BY avg_fail_pct ASC
```

> **Cost Analysis**
>
> Scan of 600K rows. GROUP BY reduces to the number of monitored tables. Trivially fast.

> **Interviewers Watch For**
>
> Whether the candidate remembers the NULL filter. The prompt says "among those with a recorded severity," which is a data quality filter before aggregation.

> **Common Pitfall**
>
> Sorting DESC instead of ASC would show the dirtiest first, contradicting "cleanest to dirtiest." Read the prompt carefully for sort direction.

---

## Common follow-up questions

- How would you also show the worst column per table? _(Tests subquery or window function to find the column with the highest fail_pct per table.)_
- What if you wanted to weight by the number of checks? _(Tests weighted average: SUM(fail_pct * count) / SUM(count) or similar.)_
- How would you flag tables that exceed a 5% fail threshold? _(Add HAVING AVG(fail_pct) > 5 or a CASE column.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/dq_fail_rate_by_table)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.