# Pending Batch Jobs

> Stuck jobs. Still pending.

Canonical URL: <https://datadriven.io/problems/pending_batch_jobs>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

How many batch jobs that started in December 2025 are still pending, meaning they have no ended timestamp and their status indicates no completion? Return a single count.

## Worked solution and explanation

### Why this problem exists in real interviews

This problem targets pattern matching with LIKE across the `batch_jobs` table. You need to work with columns like `started`, `status`, and `ended` to satisfy the requirements.

---

### Break down the requirements

#### Step 1: Filter to December of the target year

`WHERE started LIKE '2025-12%'` (or equivalent date extraction) restricts to jobs started in December.

#### Step 2: Filter to pending status

`AND status = 'pending'` keeps only incomplete jobs.

#### Step 3: Filter to no end timestamp

`AND ended IS NULL` ensures the job has not finished.

#### Step 4: Count the result

`SELECT COUNT(*)` produces the single pending count.

---

### The solution

**Pattern-match for pending batch jobs**

```sql
SELECT COUNT(*) AS pending_count
FROM batch_jobs
WHERE started LIKE '2025-12%'
  AND status = 'pending'
  AND ended IS NULL
```

> **Cost Analysis**
>
> The query scans `batch_jobs` (400,000 rows). A covering index on the filter and group columns would reduce I/O. At this scale, the full scan is acceptable but becomes costly if the table grows 10x.

> **Interviewers Watch For**
>
> Interviewers evaluate whether you translate the English requirements into the correct SQL clauses on the first attempt. They watch for clean syntax, correct column references, and whether you verify edge cases before declaring the query complete.

> **Common Pitfall**
>
> The most common mistake is misreading the prompt's filtering or grouping requirements. Double-check which columns to group by, which to aggregate, and whether the output should be filtered with `WHERE` (before grouping) or `HAVING` (after grouping).

---

## Common follow-up questions

- If `ended` in `batch_jobs` is NULL for some rows, how would your aggregation or join logic be affected? _(Probes understanding of NULL propagation through joins and aggregate functions on `batch_jobs.ended`.)_
- If `batch_jobs` grew to contain billions of rows, which part of your query would become the bottleneck given the cardinality of `started`? _(Tests ability to identify performance hotspots related to `batch_jobs.started` at scale.)_
- Your LIKE pattern may prevent index usage on `batch_jobs`. How would you restructure the filter to be index-friendly? _(Tests understanding of leading-wildcard LIKE and its impact on index scans.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/pending_batch_jobs)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.