# Non-Draft Content

> Everything except drafts.

Canonical URL: <https://datadriven.io/problems/non_draft_content>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The publishing pipeline automatically appends '[DRAFT]' to titles that haven't been finalized. Find all content items whose title does not end with that suffix, since those are ready for syndication.

## Worked solution and explanation

### Why this problem exists in real interviews

Querying content_items for title data using pattern matching tests whether you can translate a business requirement into the right column references and filter sequence. It shows up as a fundamentals check to verify practical fluency.

---

### Break down the requirements

#### Step 1: Filter out draft titles

`WHERE title NOT LIKE '%[DRAFT]'` excludes titles ending with the draft suffix.

#### Step 2: Return all content items

`SELECT *` returns all fields for matching rows.

---

### The solution

**NOT LIKE suffix filter**

```sql
SELECT *
FROM content_items
WHERE title NOT LIKE '%[DRAFT]'
```

> **Cost Analysis**
>
> The query scans `content_items` (3,000,000 rows). A covering index on the filter and group columns would reduce I/O. At this scale, the full scan is acceptable but becomes costly if the table grows 10x.

> **Interviewers Watch For**
>
> Interviewers evaluate whether you translate the English requirements into the correct SQL clauses on the first attempt. They watch for clean syntax, correct column references, and whether you verify edge cases before declaring the query complete.

> **Common Pitfall**
>
> The most common mistake is misreading the prompt's filtering or grouping requirements. Double-check which columns to group by, which to aggregate, and whether the output should be filtered with `WHERE` (before grouping) or `HAVING` (after grouping).

---

## Common follow-up questions

- What happens to your result if content_items.duration_seconds contains NULLs for some rows? _(Tests whether the candidate accounts for NULL behavior in aggregates and comparisons on duration_seconds.)_
- How would you verify that your aggregation on content_items.content_id is not double-counting due to duplicate rows? _(Tests data quality awareness and deduplication strategies.)_
- With millions of distinct values in content_items.content_id, what index strategy would you use to keep this query performant? _(Tests indexing knowledge specific to high-cardinality columns like content_id.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/non_draft_content)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.