# Tables With Most DQ Failures

> The tables with the most failures.

Canonical URL: <https://datadriven.io/problems/tables_with_most_dq_failures>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

A downstream compliance team needs to know which tables have the most data quality failures. Count failed checks per table and rank them sorted from most failures to least.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests whether a candidate can demonstrate writing clean, correct queries under time pressure. This pattern appears frequently in mid-level SQL rounds where interviewers want to see structured thinking.

---

### Break down the requirements

#### Step 1: Apply the WHERE filter

Filter rows before any aggregation. This ensures only qualifying data enters the computation, keeping the result correct and the scan minimal.

#### Step 2: Aggregate by `tbl_name`

`GROUP BY tbl_name` collapses rows to one per group. The aggregate functions (`SUM`, `COUNT`, `AVG`, etc.) compute the metric for each group.

#### Step 3: Sort the final output

The `ORDER BY` clause ensures the result appears in the expected sequence. Interviewers check that the sort direction matches the prompt.

---

### The solution

**Rank tables by total failed dq_checks in descending order**

```sql
SELECT tbl_name, COUNT(*) AS fail_count
FROM dq_checks
WHERE passed = 0
GROUP BY tbl_name
ORDER BY fail_count DESC
```

> **Cost Analysis**
>
> With ~500K rows, the GROUP BY reduces the working set before any downstream operations. An index on the filter/join columns would reduce the scan to a seek.

> **Interviewers Watch For**
>
> Interviewers watch for whether the query returns exactly the columns and ordering the prompt specifies; how quickly you identify the core operation and write clean, minimal code.

> **Common Pitfall**
>
> Returning extra columns that the prompt did not ask for, or using the wrong column alias, causes a grading mismatch even when the logic is correct.

---

## Common follow-up questions

- If two tables have the same failure count, how does your ordering behave, and should you add a tiebreaker? _(Tests deterministic ordering; adding tbl_name as a secondary sort prevents non-determinism.)_
- How would the query change if 'failed' is defined as fail_pct > 0 rather than passed = false? _(Tests flexibility in defining the failure predicate and awareness that schema semantics matter.)_
- If the severity column indicates check importance, how would you weight failures by severity in the count? _(Tests ability to move from COUNT to a weighted SUM using CASE on severity values.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/tables_with_most_dq_failures)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.