# Top Framework by Deployments

> The framework most often deployed.

Canonical URL: <https://datadriven.io/problems/top_framework_by_deployments>

Domain: SQL · Difficulty: hard · Seniority: L4

## Problem

Find the framework of the model author who has the most deployed entries. If there is a tie, pick the first alphabetically by model name.

## Worked solution and explanation

### Why this problem exists in real interviews

Read the prompt and the schema side by side. The prompt mentions a "model author", the schema has no author column. The grouping is `mdl_name` plus `framework`. A peer would push back on that wording before writing a line of SQL, because answering the literal prompt is impossible with these columns.

---

### Break down the requirements

#### Step 1: Filter to deployed rows only

Status values like `staging`, `archived`, or `failed` should not inflate counts. The `WHERE status = 'deployed'` predicate runs before any aggregation.

#### Step 2: Aggregate at the (mdl_name, framework) grain

A single `mdl_name` can appear under more than one framework across versions, so both columns belong in the GROUP BY. The aggregate is `COUNT(*)` of deployed rows.

#### Step 3: Order, break the tie, take one

Sort by `deploy_count DESC`, then `mdl_name ASC` to honor the alphabetical tiebreaker. `LIMIT 1` returns a single row, from which the `framework` is projected.

---

### The solution

**TOP FRAMEWORK BY DEPLOYMENTS**

```sql
WITH deployed_counts AS (
  SELECT mdl_name, framework, COUNT(*) AS deploy_count
  FROM ml_models
  WHERE status = 'deployed'
  GROUP BY mdl_name, framework
),
top_model AS (
  SELECT mdl_name, framework
  FROM deployed_counts
  ORDER BY deploy_count DESC, mdl_name ASC
  LIMIT 1
)
SELECT framework
FROM top_model
```

> **Cost Analysis**
>
> At 3,000 rows the planner does one hash aggregate then a top-1 sort, cheap on any engine. If `status` is low-cardinality, a partial index on `(status) WHERE status = 'deployed'` saves the filter scan once the table grows past memory.

> **Interviewers Watch For**
>
> Whether you notice the prompt says "author" but the schema has no author column. Flagging that mismatch out loud, then proceeding with `mdl_name` as the stated grain, is the right move. Silently swapping nouns is not.

> **Common Pitfall**
>
> Grouping by `framework` alone collapses different models that share a framework, inflating the count and answering a different question. Same model name across two frameworks is two rows in the CTE, not one.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- What changes if there is a real `author_id` column and the question is genuinely per-author? _(Probes whether grain shifts cleanly when the GROUP BY column changes.)_
- How would you return the top framework per status tier (`deployed`, `staging`, `archived`)? _(Tests window functions versus repeated CTEs for per-group leaders.)_
- If two `(mdl_name, framework)` pairs tie and the prompt wants both, how does the query change? _(Forces the move from `LIMIT 1` to `RANK()` with a `WHERE rnk = 1` filter.)_

> **Trick**
>
> `LIMIT 1` only works because the prompt names an explicit tiebreaker. Without one, `ORDER BY deploy_count DESC LIMIT 1` is nondeterministic across ties and your answer flickers between runs on the same data.

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/top_framework_by_deployments)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.