# Top Performing Models

> The models that actually perform.

Canonical URL: <https://datadriven.io/problems/top_performing_models>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The ML registry tracks model accuracy. Surface all models with accuracy at 0.90 or above. Return all available fields for each qualifying model, sorted from highest accuracy to lowest.

## Worked solution and explanation

### Why this problem exists in real interviews

This is a warm-up filter-and-sort problem on the `ml_models` table. It tests three things: whether you translate the threshold wording into the right comparison operator, whether you return every column the prompt asks for, and whether you sort in the requested direction. It also checks that you don't add aggregation or a row cap that the prompt never mentions.

---

### Break down the requirements

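#### Step 1: Filter to qualifying models

`WHERE accuracy >= 0.90` keeps every row whose accuracy is 0.90 or higher. "At 0.90 or above" is inclusive, so a model scoring exactly 0.90 belongs in the result. No `GROUP BY` is needed. Each row already describes one model, and the prompt asks for that model's own fields, not a per-group summary.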
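
A minimal sketch of the filter step, using Python's built-in `sqlite3` module as a local stand-in for Postgres. The `model_name` column and every row value are hypothetical. The problem only names `model_id` and `accuracy`.

```python
import sqlite3

# In-memory stand-in for ml_models. model_name and all values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ml_models (model_id INTEGER PRIMARY KEY, model_name TEXT, accuracy REAL)"
)
conn.executemany(
    "INSERT INTO ml_models VALUES (?, ?, ?)",
    [
        (1, "churn_v1", 0.87),
        (2, "churn_v2", 0.90),
        (3, "fraud_v1", 0.95),
        (4, "ranker_v3", 0.92),
    ],
)

# Step 1: a plain row filter. Each row is one model, so no GROUP BY is involved.
rows = conn.execute("SELECT model_id FROM ml_models WHERE accuracy >= 0.90").fetchall()
qualifying_ids = sorted(model_id for (model_id,) in rows)
print(qualifying_ids)  # [2, 3, 4]
```

Model 2 sits exactly on the threshold and survives the inclusive comparison. Model 1 (0.87) is dropped.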

#### Step 2: Sort by accuracy

`ORDER BY accuracy DESC` puts the highest accuracy first. The prompt asks for every qualifying model, so the query has no `LIMIT`.
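
A sketch of the sort step under the same `sqlite3` stand-in, with hypothetical rows where two models tie at 0.92. The secondary key `model_id` is an optional tie-breaker that the prompt does not require. Without it, the relative order of tied rows is unspecified.

```python
import sqlite3

# Hypothetical rows; models 4 and 5 deliberately tie at 0.92.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ml_models (model_id INTEGER PRIMARY KEY, model_name TEXT, accuracy REAL)"
)
conn.executemany(
    "INSERT INTO ml_models VALUES (?, ?, ?)",
    [
        (1, "churn_v1", 0.87),
        (2, "churn_v2", 0.90),
        (3, "fraud_v1", 0.95),
        (4, "ranker_v3", 0.92),
        (5, "ranker_v4", 0.92),
    ],
)

# Step 2: highest accuracy first; model_id only breaks ties deterministically.
rows = conn.execute(
    """
    SELECT model_id, accuracy
    FROM ml_models
    WHERE accuracy >= 0.90
    ORDER BY accuracy DESC, model_id
    """
).fetchall()
ordered_ids = [model_id for model_id, _ in rows]
print(ordered_ids)  # [3, 4, 5, 2]
```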

---

### The solution

**Filter ml_models where accuracy >= 0.90 and sort descending**

```sql
-- Every column for models at or above the 0.90 threshold, best first
SELECT *
FROM ml_models
WHERE accuracy >= 0.90
ORDER BY accuracy DESC;
```
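
To sanity-check the filter-and-sort the problem asks for, the sketch below runs it end to end against a tiny hypothetical `ml_models` table in `sqlite3`. It also reads `cursor.description` to confirm that `SELECT *` returns every column. The `model_name` column and all values are made up.

```python
import sqlite3

# Hypothetical table; only model_id and accuracy are named in the problem.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ml_models (model_id INTEGER PRIMARY KEY, model_name TEXT, accuracy REAL)"
)
conn.executemany(
    "INSERT INTO ml_models VALUES (?, ?, ?)",
    [
        (1, "churn_v1", 0.87),
        (2, "churn_v2", 0.90),
        (3, "fraud_v1", 0.95),
        (4, "ranker_v3", 0.92),
    ],
)

# The full answer: all fields, inclusive threshold, highest accuracy first.
cur = conn.execute(
    """
    SELECT *
    FROM ml_models
    WHERE accuracy >= 0.90
    ORDER BY accuracy DESC
    """
)
column_names = [desc[0] for desc in cur.description]
result = cur.fetchall()
print(column_names)  # ['model_id', 'model_name', 'accuracy']
print(result)  # [(3, 'fraud_v1', 0.95), (4, 'ranker_v3', 0.92), (2, 'churn_v2', 0.9)]
```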

> **Cost Analysis**
>
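> The query is a single range predicate on `accuracy` followed by a sort, so the work is one pass over the 2K-row `ml_models` table plus sorting the qualifying rows. An index on `accuracy` can serve both the range filter and the descending order. At this table size, though, a sequential scan with an in-memory sort is typically just as fast, and the planner may prefer it.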
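
A sketch of how an index on `accuracy` can serve both the filter and the sort, using SQLite's `EXPLAIN QUERY PLAN` as a stand-in. Postgres uses `EXPLAIN` and a different cost model, and on a small table it may well choose a sequential scan instead. The index name `idx_ml_models_accuracy` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ml_models (model_id INTEGER PRIMARY KEY, model_name TEXT, accuracy REAL)"
)
# Hypothetical index name; a single-column index on accuracy.
conn.execute("CREATE INDEX idx_ml_models_accuracy ON ml_models (accuracy)")

plan = conn.execute(
    """
    EXPLAIN QUERY PLAN
    SELECT * FROM ml_models WHERE accuracy >= 0.90 ORDER BY accuracy DESC
    """
).fetchall()

# The last column of each plan row is the human-readable step description.
details = [row[-1] for row in plan]
for step in details:
    print(step)
```

In SQLite the plan should show a search using the index with no separate temp B-tree for `ORDER BY`, because the index can be walked in reverse to produce descending order.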

> **Interviewers Watch For**
>
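> Interviewers check that you translate "at 0.90 or above" into `>=`, return all columns, and sort descending. Adding `GROUP BY`, an aggregate, or `LIMIT` changes the output's grain or drops qualifying rows. The correct output is one row per qualifying model, with nothing truncated.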
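
For this prompt, each row is already one model, so a top-N aggregate template answers a different question. The sketch below compares a `GROUP BY ... LIMIT 10` query with the plain filter on hypothetical data. The aggregate version returns models below 0.90 and drops every column except the aggregate.

```python
import sqlite3

# Hypothetical rows; models 1 and 5 are below the 0.90 threshold.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ml_models (model_id INTEGER PRIMARY KEY, model_name TEXT, accuracy REAL)"
)
conn.executemany(
    "INSERT INTO ml_models VALUES (?, ?, ?)",
    [
        (1, "churn_v1", 0.87),
        (2, "churn_v2", 0.90),
        (3, "fraud_v1", 0.95),
        (4, "ranker_v3", 0.92),
        (5, "legacy_v0", 0.70),
    ],
)

# Top-N aggregate template: no threshold, wrong columns for this prompt.
aggregate_rows = conn.execute(
    """
    SELECT model_id, SUM(accuracy) AS total_accuracy
    FROM ml_models
    GROUP BY model_id
    ORDER BY total_accuracy DESC
    LIMIT 10
    """
).fetchall()

# What the prompt asks for: filter, keep all fields, sort.
filtered_rows = conn.execute(
    "SELECT * FROM ml_models WHERE accuracy >= 0.90 ORDER BY accuracy DESC"
).fetchall()

aggregate_ids = [model_id for model_id, _ in aggregate_rows]
filtered_ids = [row[0] for row in filtered_rows]
print(aggregate_ids)  # [3, 4, 2, 1, 5]
print(filtered_ids)  # [3, 4, 2]
```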

> **Common Pitfall**
>
> Using `>` instead of `>=`. That silently drops any model scoring exactly 0.90. A second trap is reusing a top-N template with `LIMIT 10`, which truncates the result whenever more than ten models qualify.
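
A quick way to see the boundary effect of the comparison operator: with one hypothetical model sitting exactly at 0.90, `>` and `>=` return different row sets. As before, `sqlite3` stands in for Postgres and the data is made up.

```python
import sqlite3

# Hypothetical rows; model 2 sits exactly on the 0.90 boundary.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ml_models (model_id INTEGER PRIMARY KEY, accuracy REAL)")
conn.executemany(
    "INSERT INTO ml_models VALUES (?, ?)",
    [(1, 0.87), (2, 0.90), (3, 0.95), (4, 0.92)],
)

def ids_where(condition: str) -> list[int]:
    """Return sorted model_ids matching a fixed, trusted WHERE condition."""
    rows = conn.execute(f"SELECT model_id FROM ml_models WHERE {condition}").fetchall()
    return sorted(model_id for (model_id,) in rows)

strict_ids = ids_where("accuracy > 0.90")  # 'above 0.90'
inclusive_ids = ids_where("accuracy >= 0.90")  # 'at 0.90 or above'
print(strict_ids, inclusive_ids)  # [3, 4] [2, 3, 4]
```

The condition string is interpolated only because it is a fixed literal in this sketch. User-supplied thresholds should go through a `?` parameter instead.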

---

## Common follow-up questions

- If the prompt said 'above 0.90', would you use > or >=? How does that one-row difference matter? _(Tests precision in reading requirements; 'above' typically means >, excluding exactly 0.90.)_
- Is SELECT * acceptable in an interview, or should you enumerate columns explicitly? _(Tests query discipline; SELECT * is fragile if the schema changes, but the prompt says 'all available fields'.)_
- How would you add a row number column showing each model's rank by accuracy? _(Tests adding ROW_NUMBER() OVER (ORDER BY accuracy DESC) to the SELECT list.)_
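
For the last follow-up, here is a sketch of adding a rank column with `ROW_NUMBER()`, again run in `sqlite3` (window functions need SQLite 3.25 or newer) on hypothetical data with a tie at 0.92. `RANK()` is shown alongside for comparison: tied models share a rank, while `ROW_NUMBER()` always assigns distinct numbers. The `model_id` tie-breaker is an assumption, added only to make the numbering deterministic.

```python
import sqlite3

# Hypothetical rows; models 4 and 5 tie at 0.92.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ml_models (model_id INTEGER PRIMARY KEY, accuracy REAL)")
conn.executemany(
    "INSERT INTO ml_models VALUES (?, ?)",
    [(1, 0.87), (2, 0.90), (3, 0.95), (4, 0.92), (5, 0.92)],
)

# Window functions run after WHERE, so ranks count only qualifying models.
rows = conn.execute(
    """
    SELECT
        model_id,
        accuracy,
        ROW_NUMBER() OVER (ORDER BY accuracy DESC, model_id) AS row_num,
        RANK() OVER (ORDER BY accuracy DESC) AS accuracy_rank
    FROM ml_models
    WHERE accuracy >= 0.90
    ORDER BY row_num
    """
).fetchall()
for row in rows:
    print(row)
```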

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/top_performing_models)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.