# The Publishing Audit

> Published years ago. Still generating views?

Canonical URL: <https://datadriven.io/problems/the_publishing_audit>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The content analytics team is auditing the 2026 publishing pipeline. Pull all fields for content items published during that year.

## Worked solution and explanation

### What this is really asking

`publish_date` on 4M rows is the only filter. Pull one calendar year as a contiguous range against the column the planner can actually index.

---

### Break down the requirements

#### Step 1: All fields

`SELECT *` is fine. The audit wants the raw row shape, not a curated projection.

#### Step 2: Bound the year

Jan 1 through Dec 31. Keep the predicate on the bare column so the index handles it.

---

### The solution

**PUBLISHING AUDIT FOR ONE YEAR**

```sql
SELECT *
FROM content_items
WHERE publish_date BETWEEN '2026-01-01' AND '2026-12-31'
```

> **Cost Analysis**
>
> A range predicate on `publish_date` is index-friendly: one seek to Jan 1, scan forward to Dec 31. Wrapping the column in a function forces a full scan of all 4M rows.

> **Interviewers Watch For**
>
> Whether you keep the predicate on the bare column, and whether you ask if `publish_date` is DATE or TIMESTAMP, since that flips the upper bound shape.

> **Common Pitfall**
>
> If `publish_date` is a TIMESTAMP, `BETWEEN ... AND '2024-12-31'` truncates to `2024-12-31 00:00:00` and drops every row published later that day. Switch to `>= '2024-01-01' AND < '2025-01-01'`.

> **The False Start**
>
> First instinct is `WHERE YEAR(publish_date) = 2024`. Reads cleanly, but the function call kills the index and forces a full scan of 4M rows. Pivot to a bare-column range so the planner can seek.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- How would the query change if `publish_date` were a TIMESTAMP? _(Use a half-open `>= '2024-01-01' AND < '2025-01-01'` so Dec 31 afternoon rows survive.)_
- What index makes this fast? _(A B-tree on `publish_date`. If audits always pull one year, partitioning by year is cleaner.)_
- How would you handle multi-year audits? _(Parameterize the year and `UNION ALL`, or widen the range and add `EXTRACT(YEAR FROM publish_date)` to the projection.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_publishing_audit)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.