# Best Accuracy to Training Time Ratio

> Fast to train. Accurate too. Which model?

Canonical URL: <https://datadriven.io/problems/best_accuracy_to_training_time_ratio>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

The ML team is looking for models that punch above their weight: high accuracy relative to how recently they were trained, using the training timestamp's epoch value as the denominator. Which single model has the highest accuracy-to-training-time ratio? Exclude models with no accuracy score and show the model name, accuracy, training timestamp, and the ratio.

## Worked solution and explanation

### Why this problem exists in real interviews

Tests whether you can compute a derived expression in `SELECT`, then rank on it. The wrinkle is the denominator: `strftime('%s', train_at)` converts a timestamp to epoch seconds. The interviewer wants to see you flag the unit-of-measure ambiguity before you write a line of SQL.

---

### Break down the requirements

#### Step 1: Filter scoreable models

`WHERE accuracy IS NOT NULL`. NULL in the numerator yields NULL, sorts last under `ORDER BY DESC`, but the prompt says exclude.

#### Step 2: Derive the ratio

`CAST(accuracy AS REAL) / CAST(strftime('%s', train_at) AS REAL)`. Both casts are non-negotiable. Without them you get integer division to 0.

#### Step 3: Project the four columns

`mdl_name, accuracy, train_at, ratio`. Alias the derived column so `ORDER BY ratio DESC` is unambiguous.

#### Step 4: Top one

`ORDER BY ratio DESC LIMIT 1`. No window function needed, no tie-break in the prompt.

---

### The solution

**RATIO RANK**

```sql
SELECT mdl_name, accuracy, train_at,
       CAST(accuracy AS REAL) / CAST(strftime('%s', train_at) AS REAL) AS ratio
FROM ml_models
WHERE accuracy IS NOT NULL
ORDER BY ratio DESC
LIMIT 1
```

> **Cost Analysis**
>
> 3,000 rows in `ml_models`. Full scan, in-memory sort on the derived column. Sub-millisecond. No index helps here because `ratio` is computed per row. Don't propose one.

> **Interviewers Watch For**
>
> Ask out loud: is `train_at` the start time, end time, or a duration? Using epoch seconds as a denominator only makes sense if you accept it as a proxy. Name the assumption, then write the query the prompt asks for.

> **Common Pitfall**
>
> Skipping the `CAST`. `strftime('%s', train_at)` returns TEXT, so dividing raw triggers implicit coercion that can produce `0`, `NULL`, or surprising values depending on the engine. `CAST` both sides to REAL and the ratio is well-defined.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- What changes if there's a real `training_duration_seconds` column? _(Probes whether you recognize epoch-as-duration is a hack and would swap in the proper denominator.)_
- Return the top model per `framework` instead of overall top. _(Forces you to reach for `ROW_NUMBER() OVER (PARTITION BY framework ORDER BY ratio DESC)`.)_
- How do you break ties when two models share the top ratio? _(Tests secondary `ORDER BY` keys and the difference between `LIMIT 1` and `RANK() = 1`.)_
- Why might `strftime('%s', train_at)` return NULL? _(Checks that you know SQLite returns NULL for malformed timestamp strings, which silently drops rows from the ranking.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/best_accuracy_to_training_time_ratio)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.