# Last Five Batch Jobs

> The last five. A quick tail check.

Canonical URL: <https://datadriven.io/problems/last_five_batch_jobs>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

Quick tail check on recent pipeline runs. Pull the five most recent batch job records by job ID, from newest to oldest. For each job, show the job ID, name, status, rows completed, start time, end time, priority, and retries.

## Worked solution and explanation

### Why this problem exists in real interviews

The batch_jobs table contains job_name and status values that must be processed with top-N selection. This appears as a fundamentals check to probe whether you reason about the correct aggregation grain before writing any window or GROUP BY clause.

> **Trick to Solving**
>
> Keeping the most recent row per group is a classic `ROW_NUMBER` pattern.
> 
> 1. `ROW_NUMBER() OVER (PARTITION BY group_col ORDER BY ts DESC)` assigns 1 to the latest row
> 2. Wrap in a subquery or CTE
> 3. Filter to `rn = 1`

---

### Break down the requirements

#### Step 1: Read from `batch_jobs`

The query targets `batch_jobs` with 8 columns. Identify which columns are needed for the output.

#### Step 2: Order and limit the output

Sort by the target metric and apply `LIMIT` to return the requested number of rows. Ensure the sort is deterministic to produce reproducible results.

#### Step 3: Return the result set

Select the required columns with any necessary aliasing or formatting.

---

### The solution

**ORDER BY timestamp DESC with LIMIT**

```sql
SELECT job_id, job_name, status, rows_done, started, ended, priority, retries
FROM batch_jobs
ORDER BY started DESC
LIMIT 5
```

> **Cost Analysis**
>
> The query scans 250K rows from `batch_jobs`.

> **Interviewers Watch For**
>
> Explaining why `ROW_NUMBER` is preferred over `DISTINCT` for deduplication shows you understand the difference between collapsing and selecting.

> **Common Pitfall**
>
> Comparing dates stored as TEXT without casting can produce lexicographic instead of chronological ordering. Always confirm the column type.

---

## Common follow-up questions

- What happens to your result if batch_jobs.ended contains NULLs for some rows? _(Tests whether the candidate accounts for NULL behavior in aggregates and comparisons on ended.)_
- How would you verify that your aggregation on batch_jobs.job_id is not double-counting due to duplicate rows? _(Tests data quality awareness and deduplication strategies.)_
- With millions of distinct values in batch_jobs.job_id, what index strategy would you use to keep this query performant? _(Tests indexing knowledge specific to high-cardinality columns like job_id.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/last_five_batch_jobs)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.