# Most Common Export Job Status

> The most common job status.

Canonical URL: <https://datadriven.io/problems/most_common_export_job_status>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

Export jobs have been flaky lately, and the data platform team wants to know the baseline: across all batch jobs whose name contains 'export', what is the single most common completion status and how many times did it occur?

## Worked solution and explanation

### Why this problem exists in real interviews

Working against `batch_jobs`, this problem tests grouping and top-N selection on the `job_name` and `status` columns. Interviewers use it as a fundamentals check because a subtle mis-grouping or misplaced filter changes the output without raising an error.

---

### Break down the requirements

#### Step 1: Filter to the target rows

Filter with `job_name LIKE '%export%'` in the `WHERE` clause. The filter runs before grouping, so only jobs whose name contains 'export' are counted.
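A minimal sketch of the filter on its own. The inline rows stand in for `batch_jobs` and are hypothetical, made up for illustration; the real table may have more columns and different values. In Postgres, `LIKE` is case-sensitive, so a name such as `'Export_Invoices'` does not match `'%export%'`; `ILIKE` is the case-insensitive variant.

```sql
-- Hypothetical sample rows standing in for the real batch_jobs table.
WITH batch_jobs (job_id, job_name, status) AS (
    VALUES
        (1, 'daily_export_orders', 'success'),
        (2, 'export_users',        'failed'),
        (3, 'import_orders',       'success'),
        (4, 'Export_Invoices',     'success')
)
-- '%export%' matches 'export' anywhere in the name.
-- Job 3 has no 'export' substring; job 4 differs in case.
SELECT job_id, job_name, status
FROM batch_jobs
WHERE job_name LIKE '%export%'
ORDER BY job_id;
-- Returns jobs 1 and 2.
```

Whether capitalized names should count is a requirements question worth raising with the interviewer before choosing between `LIKE` and `ILIKE`.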

#### Step 2: Aggregate with COUNT

Group by `status` and apply `COUNT(*)` to count the jobs in each group. The `GROUP BY` key must match the output grain: one row per distinct status.
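The grouping step can be sketched in isolation as follows. The inline rows are hypothetical sample data, not taken from the real `batch_jobs` table. The example includes a job with a `NULL` status to show that `GROUP BY` puts NULLs in a group of their own and that `COUNT(*)` still counts those rows.

```sql
-- Hypothetical sample rows standing in for the real batch_jobs table.
WITH batch_jobs (job_id, job_name, status) AS (
    VALUES
        (1, 'daily_export_orders', 'success'),
        (2, 'export_users',        'failed'),
        (3, 'export_invoices',     'success'),
        (4, 'import_orders',       'failed'),
        (5, 'export_payments',     NULL)
)
-- One output row per distinct status among the export jobs.
SELECT status, COUNT(*) AS occurrences
FROM batch_jobs
WHERE job_name LIKE '%export%'
GROUP BY status;
-- Counts: 'success' = 2, 'failed' = 1, NULL = 1.
-- Job 4 is filtered out before grouping, so it does not add to 'failed'.
```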

#### Step 3: Order and limit the output

Sort by the count in descending order and apply `LIMIT 1` to keep only the most common status. If two statuses can tie for first place, add a tie-breaker column to the `ORDER BY` so the result is reproducible.
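A sketch of why the tie-breaker matters, using hypothetical sample rows in which two statuses tie at two jobs each. Without a second sort key, the database may return either status; sorting by `status` as well makes the choice stable. Breaking the tie alphabetically is an assumption made for illustration, not a stated requirement of the problem.

```sql
-- Hypothetical sample rows: 'success' and 'failed' tie at 2 each.
WITH batch_jobs (job_id, job_name, status) AS (
    VALUES
        (1, 'export_a', 'success'),
        (2, 'export_b', 'failed'),
        (3, 'export_c', 'success'),
        (4, 'export_d', 'failed')
),
status_counts AS (
    SELECT status, COUNT(*) AS occurrences
    FROM batch_jobs
    WHERE job_name LIKE '%export%'
    GROUP BY status
)
-- The second sort key resolves the tie the same way on every run.
SELECT status, occurrences
FROM status_counts
ORDER BY occurrences DESC, status ASC
LIMIT 1;
-- Returns ('failed', 2): 'failed' sorts before 'success'.
```

If every tied status should be returned instead of just one, a common alternative is to compute `RANK() OVER (ORDER BY occurrences DESC)` and keep the rows with rank 1.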

---

### The solution

**LIKE filter with mode via ORDER BY DESC LIMIT 1**

```sql
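-- Count export jobs per status and keep the most frequent one.
SELECT status, COUNT(*) AS occurrences
FROM batch_jobs
WHERE job_name LIKE '%export%'  -- case-sensitive substring match
GROUP BY status
ORDER BY occurrences DESC, status  -- status breaks ties deterministically
LIMIT 1;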
```

> **Cost Analysis**
>
> The query scans 300K rows from `batch_jobs`.
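>
> One way to check the plan yourself, sketched under the assumption that you can run `EXPLAIN` against the sandbox's `batch_jobs` table. A pattern with a leading wildcard such as `'%export%'` cannot use a plain B-tree index on `job_name`, so a sequential scan is the likely plan. The trigram index in the comments is a hypothetical option: it requires the `pg_trgm` extension, the index name is made up for illustration, and the planner may still prefer a sequential scan on a table of this size.
>
> ```sql
> -- Show the plan without executing the query.
> EXPLAIN
> SELECT status, COUNT(*) AS occurrences
> FROM batch_jobs
> WHERE job_name LIKE '%export%'
> GROUP BY status
> ORDER BY occurrences DESC
> LIMIT 1;
>
> -- Hypothetical alternative, assuming you have privileges to install
> -- extensions and create indexes (uncomment to try):
> -- CREATE EXTENSION IF NOT EXISTS pg_trgm;
> -- CREATE INDEX idx_batch_jobs_job_name_trgm
> --     ON batch_jobs USING gin (job_name gin_trgm_ops);
> ```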

> **Interviewers Watch For**
>
> Candidates who verbalize their approach before typing, naming the output columns and expected row count, consistently perform better.

> **Common Pitfall**
>
> Comparing dates stored as TEXT without casting can produce lexicographic instead of chronological ordering. Always confirm the column type.

---

## Common follow-up questions

- What happens to your result if `batch_jobs.ended` contains NULLs for some rows? _(Tests whether the candidate accounts for NULL behavior in aggregates and comparisons on `ended`.)_
- How would you verify that your aggregation on `batch_jobs.job_id` is not double-counting due to duplicate rows? _(Tests data quality awareness and deduplication strategies.)_
- With millions of distinct values in `batch_jobs.job_id`, what index strategy would you use to keep this query performant? _(Tests indexing knowledge specific to high-cardinality columns like `job_id`.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/most_common_export_job_status)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.