# Mid-Tier Batch Jobs

> Not the biggest, not the smallest. The overlooked middle.

Canonical URL: <https://datadriven.io/problems/mid_tier_batch_jobs>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

Find all batch jobs that rank between positions 8 and 10 by rows processed. If jobs tie, they share the same rank without gaps. Show each job's name and rank, by rank ascending.

## Worked solution and explanation

### Why this problem exists in real interviews

The interviewer wants to see you apply dense ranking to batch_jobs.job_name while accounting for the distribution of status. This surfaces as a fundamentals check because small logic errors produce results that look correct at a glance.

---

### Break down the requirements

#### Step 1: Filter to the target rows

Use `BETWEEN` in the `WHERE` clause to select the target range. This is both readable and optimizable by the query planner.

#### Step 2: Rank with DENSE_RANK for tie inclusion

`DENSE_RANK()` assigns the same rank to tied values and never skips numbers. This ensures all tied rows appear in the result.

#### Step 3: Order the final output

Apply `ORDER BY` as specified to produce the expected row sequence. When tied values exist, add a secondary sort column for determinism.

---

### The solution

**DENSE_RANK with band filter**

```sql
SELECT job_name, rnk
FROM (
    SELECT job_name, DENSE_RANK() OVER (ORDER BY rows_done DESC) AS rnk
    FROM batch_jobs
) ranked
WHERE rnk BETWEEN 8 AND 10
ORDER BY rnk
```

> **Cost Analysis**
>
> The query scans 350K rows from `batch_jobs`. The window function requires a sort, which is O(n log n). Pre-aggregating reduces the sort input. CTEs in most engines are optimization fences. For production workloads, consider inlining or materializing the intermediate results.

> **Interviewers Watch For**
>
> Strong candidates explain their choice of window function (`ROW_NUMBER` vs `RANK` vs `DENSE_RANK`) and why it matches the tie semantics. Walking through comparison logic step by step, rather than writing it in one pass, demonstrates structured thinking.

> **Common Pitfall**
>
> Using `ROWS` vs `RANGE` in the window frame produces different results when ties exist. Default to `ROWS` unless you specifically need tie grouping.

---

## Common follow-up questions

- What happens to your result if batch_jobs.ended contains NULLs for some rows? _(Tests whether the candidate accounts for NULL behavior in aggregates and comparisons on ended.)_
- If two rows in batch_jobs have identical values in the ORDER BY columns, how does your ranking handle the tie? _(Tests understanding of RANK vs DENSE_RANK vs ROW_NUMBER tie-breaking behavior.)_
- With millions of distinct values in batch_jobs.job_id, what index strategy would you use to keep this query performant? _(Tests indexing knowledge specific to high-cardinality columns like job_id.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/mid_tier_batch_jobs)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.