# Last Migration Record

> The most recent migration. Is it the last?

Canonical URL: <https://datadriven.io/problems/last_migration_record>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

We need to verify the latest schema migration applied to the database. Find the most recently applied migration, identified by the largest migration ID. Show all available fields: migration ID, version, status, applied date, rollback info, duration, author, and database name.

## Worked solution and explanation

### The mental model

**Finding the row where a column hits its MAX is a different problem than finding the MAX value itself.** `SELECT MAX(migr_id) FROM migrations` hands you a single scalar: the largest id. What the prompt actually wants is the entire migration record (version, status, applied, rollback, dur_ms, author, db_name) belonging to that id. Three idiomatic shapes solve this: a scalar subquery in `WHERE` (the canonical answer here), `ORDER BY migr_id DESC LIMIT 1`, or `ROW_NUMBER() OVER (ORDER BY migr_id DESC) = 1`. A correlated self-join is a fourth, but it is strictly worse. Pick the subquery form when ties matter and you want every row sharing the max.

---

### The traps

#### Step 1: You cannot aggregate over all columns

`SELECT MAX(*) FROM migrations` is not valid SQL. `MAX` takes one expression. Beginners sometimes try `SELECT MAX(migr_id), version, status, ... FROM migrations` thinking the non-aggregated columns will track the max row. In MySQL with `ONLY_FULL_GROUP_BY` off you get **arbitrary values** from unrelated rows; in Postgres and modern SQLite (strict mode) you get an error. Always re-fetch the row by filtering on the max.

#### Step 2: Ties: subquery vs LIMIT

`WHERE migr_id = (SELECT MAX(migr_id) FROM migrations)` returns every row tied at the max. `ORDER BY migr_id DESC LIMIT 1` returns exactly one row, with the tiebreaker decided by storage order. For a primary key like `migr_id` ties are impossible, so both are correct. The instant you swap the key for `applied` (a timestamp), the choice becomes load-bearing.

#### Step 3: Largest id is not always latest applied

The prompt equates **largest `migr_id`** with **most recently applied**. That holds when ids are assigned monotonically at apply time, which is true for serial / autoincrement columns. It breaks if migrations are inserted out of order (backfill, cherry-picked hotfixes applied before older queued ones, or distributed id generators like UUIDv7). Read the prompt literally: it asks for the largest id, so trust the id. If the interviewer asks for **latest applied**, order by `applied` instead.

---

### The solution

**Filter by the scalar max**

```sql
SELECT *
FROM migrations
WHERE migr_id = (SELECT MAX(migr_id) FROM migrations);
```

> **Cost shape**
>
> At 4,000 rows everything is instant. At scale the inner `SELECT MAX(migr_id)` is one B-tree rightmost descent if `migr_id` is the primary key (O(log n), microseconds), or one full scan without an index. The outer `WHERE migr_id = ?` is a single index seek. Total: two index touches. `ORDER BY migr_id DESC LIMIT 1` is also one rightmost descent on an indexed column, so the two forms tie on cost. The pitfall version (`SELECT MAX(migr_id), ...` with no GROUP BY) forces a full scan plus the wrong answer.

> **What to say out loud**
>
> Senior signal is asking before coding: **"Are `migr_id` values monotonic with apply time, or could a backfill insert a higher id earlier? Do you want latest by id or latest by `applied`? If two rows tie at the max id, do you want both or just one?"** Then pick the shape that matches. Mentioning that `WHERE col = (SELECT MAX(col) ...)` returns all ties while `LIMIT 1` does not is a cheap way to show you read past the happy path.

> **The seductive wrong answer**
>
> `SELECT MAX(migr_id), version, status, applied, rollback, dur_ms, author, db_name FROM migrations` looks right and runs in some engines. MySQL (legacy mode) returns the max id alongside columns from an unrelated row picked by the engine, not the row that owns the max. You get a real id paired with stale `status` and `author` values. Invisible until production, when the dashboard claims migration 4000 was authored by whoever wrote some other migration entirely.

> **When ORDER BY + LIMIT wins**
>
> If ties cannot exist (primary key) and you want the cleaner one-pass shape, `SELECT * FROM migrations ORDER BY migr_id DESC LIMIT 1` is more readable and avoids the double reference to the table. The subquery form generalizes better when you later need top-N per group via a window function.

---

## Common follow-up questions

- Two migrations end up with the same `migr_id` (imagine the id column is not the primary key, just a version label). How does your answer change? _(Probes whether you know the subquery form returns all ties while `ORDER BY ... LIMIT 1` silently picks one. The right answer depends on whether the consumer wants every tied row or any one.)_
- How would you return the **second** most recently applied migration? _(Forces you off the `MAX(...)` crutch. Clean answers: `ORDER BY migr_id DESC LIMIT 1 OFFSET 1`, or `WHERE migr_id = (SELECT MAX(migr_id) FROM migrations WHERE migr_id < (SELECT MAX(migr_id) FROM migrations))`, or `ROW_NUMBER() = 2`.)_
- The DBA confirms a backfill inserted `migr_id = 5000` last week, but the `applied` timestamp on it is from 2019. Which row does your query return, and is that the right behavior? _(Tests whether you distinguish id ordering from temporal ordering. The query returns id 5000 because the prompt asked for largest id. If the real intent is most recent apply, you need `ORDER BY applied DESC LIMIT 1` instead.)_
- There is no index on `migr_id`. How does that change your choice between the subquery form and `ORDER BY ... LIMIT 1`? _(Both degrade to a full scan, but with comparable cost. The subquery form does one scan to compute `MAX` and a second filter pass; some optimizers fuse them. `ORDER BY DESC LIMIT 1` is one scan tracking the running max. Adding an index on `migr_id` collapses both to O(log n).)_
- How would you adapt this to return the most recent migration **per `db_name`** instead of one global row? _(Shape pivots from global MAX to per-group MAX. Clean answer: `ROW_NUMBER() OVER (PARTITION BY db_name ORDER BY migr_id DESC) = 1`, or a join against `(SELECT db_name, MAX(migr_id) AS m FROM migrations GROUP BY db_name)`.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/last_migration_record)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.