# Top Frameworks by Accuracy

> Top three frameworks by accuracy.

Canonical URL: <https://datadriven.io/problems/top_frameworks_by_accuracy>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

Surface the top 3 ML frameworks by average accuracy among production models. Show each framework and its average accuracy, sorted from best to worst.

## Worked solution and explanation

### Why this problem exists in real interviews

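Using `ml_models`, this tests filtering to the top rows after aggregation with proper grain management. Production models are filtered first, accuracy is averaged per framework, and only then are the results ranked and cut to three. Strong candidates identify the filter, the grouping key (`framework`), and the metric (`AVG(accuracy)`) before writing any SQL. A plain `ORDER BY ... LIMIT` is enough here, so no window function is required.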

---

### Break down the requirements

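#### Step 1: Filter to production models and aggregate per framework

Keep only production rows with a `WHERE` filter (assuming the status column is named `status` and holds `'production'`), then `GROUP BY framework` with `AVG(accuracy)` to produce one summary row per framework from the `ml_models` table. Filtering in `WHERE` rather than `HAVING` removes non-production models before they can affect any average.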
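
To make Step 1 concrete, here is a minimal sketch on a few hypothetical rows. An inline `VALUES` list named `ml_models` stands in for the real table, and the `status` column with its `'production'` value is an assumption, not confirmed schema:

```sql
-- Hypothetical sample rows standing in for ml_models. The `status` column
-- name and its 'production' value are assumptions.
WITH ml_models (model_id, framework, status, accuracy) AS (
    VALUES
        (1, 'pytorch',    'production', 0.90),
        (2, 'pytorch',    'production', 0.80),
        (3, 'tensorflow', 'production', 0.88),
        (4, 'tensorflow', 'staging',    0.99),
        (5, 'xgboost',    'production', 0.86)
)
SELECT
    framework,
    AVG(accuracy) AS avg_accuracy,   -- mean over production rows only
    COUNT(*)      AS model_count     -- how many rows back each average
FROM ml_models
WHERE status = 'production'          -- row filter runs before GROUP BY
GROUP BY framework;                  -- one output row per framework
```

Because `WHERE` runs before `GROUP BY`, the staging row at 0.99 never enters an average: `tensorflow` comes out at 0.88 rather than (0.88 + 0.99) / 2 = 0.935, and `pytorch` averages (0.90 + 0.80) / 2 = 0.85.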

#### Step 2: Rank the results

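`ORDER BY` the average descending and `LIMIT 3` to keep the top three frameworks. In Postgres, `DESC` sorts `NULL` first by default, so `NULLS LAST` stops a framework with no recorded accuracy from taking a slot, and a secondary sort on `framework` makes ties resolve deterministically.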
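
`LIMIT 3` silently drops one of two frameworks tied for third place. If ties at the cutoff should be kept, a window function is one option. This sketch uses hypothetical rows, assumes a `status` column holding `'production'`, and keeps every framework whose `RANK()` is 3 or better:

```sql
-- Hypothetical rows; `status` and 'production' are assumed names.
WITH ml_models (framework, status, accuracy) AS (
    VALUES
        ('pytorch',    'production', 0.90),
        ('tensorflow', 'production', 0.88),
        ('xgboost',    'production', 0.86),
        ('lightgbm',   'production', 0.86),   -- ties xgboost for third
        ('sklearn',    'production', 0.70)
),
framework_avg AS (
    SELECT framework, AVG(accuracy) AS avg_accuracy
    FROM ml_models
    WHERE status = 'production'
    GROUP BY framework
),
ranked AS (
    SELECT
        framework,
        avg_accuracy,
        RANK() OVER (ORDER BY avg_accuracy DESC) AS accuracy_rank
    FROM framework_avg
)
SELECT framework, avg_accuracy
FROM ranked
WHERE accuracy_rank <= 3          -- keeps both frameworks tied at rank 3
ORDER BY avg_accuracy DESC, framework;
```

Here `xgboost` and `lightgbm` tie at 0.86, so four frameworks come back. `RANK()` leaves a gap after ties; `DENSE_RANK()` would instead treat "top 3" as the three highest distinct averages, which returns more rows when ties occur above the cutoff.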

---

### The solution

**Average accuracy per framework for production ml_models, top 3**

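```sql
-- Assumes the lifecycle column is named `status` and production rows hold 'production'.
SELECT
    framework,
    AVG(accuracy) AS avg_accuracy
FROM ml_models
WHERE status = 'production'               -- filter before aggregating
GROUP BY framework                        -- one row per framework
ORDER BY avg_accuracy DESC NULLS LAST,    -- best average first; all-NULL groups last
         framework                        -- deterministic tie-break
LIMIT 3
```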

> **Cost Analysis**
>
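> The `GROUP BY` collapses the 3K-row `ml_models` table to one row per distinct `framework`, so the sort and `LIMIT` operate on only a handful of rows. Because the query also filters on production status, an index on `(status, framework)` that includes `accuracy` could let Postgres answer it with an index-only scan; at roughly 3K rows, though, a sequential scan is likely just as fast.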
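
Since the query filters on production status, an index that leads with the filter column may serve it better than one on `(framework, accuracy)` alone. A minimal sketch, assuming a `status` column: a scratch temp table with assumed column types stands in for `ml_models`, and the index name is hypothetical.

```sql
-- Scratch stand-in for ml_models; the `status` column and all types are assumptions.
CREATE TEMP TABLE ml_models (
    model_id  integer PRIMARY KEY,
    framework text    NOT NULL,
    status    text    NOT NULL,
    accuracy  numeric
);

-- Hypothetical index name. status leads so the planner can seek to production
-- rows; INCLUDE (Postgres 11+) stores accuracy in the index without making it a key.
CREATE INDEX ml_models_status_framework_idx
    ON ml_models (status, framework)
    INCLUDE (accuracy);
```

Whether the planner actually picks an index-only scan depends on table size and the visibility map; running `EXPLAIN` on the production query is the way to check.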

> **Interviewers Watch For**
>
> Interviewers check that you aggregate before sorting. Sorting raw rows ranks individual models, so one framework can occupy several of the top slots. The correct grain is one row per `framework`, ranked by its average accuracy.
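
A hypothetical example shows what goes wrong at the raw-row grain. The inline `VALUES` list stands in for `ml_models`, and `status` is an assumed column name:

```sql
-- Hypothetical rows; `status` is an assumed column name.
WITH ml_models (model_id, framework, status, accuracy) AS (
    VALUES
        (1, 'pytorch', 'production', 0.95),
        (2, 'pytorch', 'production', 0.94),
        (3, 'pytorch', 'production', 0.60),
        (4, 'xgboost', 'production', 0.90),
        (5, 'sklearn', 'production', 0.85)
)
-- Wrong grain: ranks individual models, so one framework can fill several slots.
SELECT framework, accuracy
FROM ml_models
WHERE status = 'production'
ORDER BY accuracy DESC
LIMIT 3;
```

This returns `pytorch` twice (0.95 and 0.94) plus `xgboost` (0.90). Aggregating first changes the answer: `pytorch` averages (0.95 + 0.94 + 0.60) / 3 = 0.83, so the correct order is `xgboost` 0.90, `sklearn` 0.85, `pytorch` 0.83.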

> **Common Pitfall**
>
> Using the wrong aggregate function. The prompt asks for *average* accuracy, so `AVG` is required. `SUM` adds scores together and rewards frameworks that simply have more production models, while `COUNT` measures volume, not quality.
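
The same hypothetical-rows approach shows how `SUM` and `AVG` disagree. Three mediocre `pytorch` models outscore one strong `xgboost` model on a total, but not on an average (the `status` column is assumed):

```sql
-- Hypothetical rows; `status` is an assumed column name.
WITH ml_models (model_id, framework, status, accuracy) AS (
    VALUES
        (1, 'pytorch', 'production', 0.80),
        (2, 'pytorch', 'production', 0.80),
        (3, 'pytorch', 'production', 0.80),
        (4, 'xgboost', 'production', 0.95)
)
SELECT
    framework,
    SUM(accuracy) AS total_accuracy,  -- 2.40 vs 0.95: rewards model count
    AVG(accuracy) AS avg_accuracy     -- 0.80 vs 0.95: the metric the prompt asks for
FROM ml_models
WHERE status = 'production'
GROUP BY framework
ORDER BY avg_accuracy DESC;
```

Ranking by `total_accuracy` would put `pytorch` (2.40) first; ranking by `avg_accuracy`, as the prompt requires, puts `xgboost` (0.95) ahead of `pytorch` (0.80).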

---

## Common follow-up questions

- If a framework has only one production model, is a single-value average meaningful for ranking? _(Tests whether a minimum model count threshold makes the ranking more robust.)_
- Should 'production' be an exact match on the status column, or could it include variants like 'production_v2'? _(Tests defensive filtering; exact match vs LIKE 'production%' depends on schema documentation.)_
- How would you add a column showing the number of models backing each framework's average? _(Tests adding COUNT(*) to the same grouped query for transparency.)_
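
The first and third follow-ups can be answered in the same grouped query. This sketch adds a `model_count` column and a `HAVING` threshold; the threshold of 2 is an arbitrary illustrative value, the rows are hypothetical, and `status` is an assumed column name:

```sql
-- Hypothetical rows; `status` is an assumed column name.
WITH ml_models (model_id, framework, status, accuracy) AS (
    VALUES
        (1, 'pytorch',    'production', 0.84),
        (2, 'pytorch',    'production', 0.86),
        (3, 'tensorflow', 'production', 0.82),
        (4, 'tensorflow', 'production', 0.80),
        (5, 'jax',        'production', 0.99)   -- single model, removed by HAVING
)
SELECT
    framework,
    AVG(accuracy) AS avg_accuracy,
    COUNT(*)      AS model_count    -- how many production models back the average
FROM ml_models
WHERE status = 'production'         -- exact match; LIKE 'production%' would also admit variants
GROUP BY framework
HAVING COUNT(*) >= 2                -- illustrative minimum sample size per framework
ORDER BY avg_accuracy DESC, framework
LIMIT 3;
```

`jax` has the highest single score but only one production model, so the `HAVING` clause removes it and only `pytorch` (0.85, 2 models) and `tensorflow` (0.81, 2 models) remain. `WHERE` filters individual rows before grouping, while `HAVING` filters the grouped results, which is why the count threshold belongs in `HAVING`.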

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/top_frameworks_by_accuracy)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.