# Yearly Output

> Publishing velocity for the board deck.

Canonical URL: <https://datadriven.io/problems/yearly_output>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The editorial director is building the annual content report and wants to see publishing velocity broken down by year. For each calendar year, show the number of items published and the average duration in seconds. Exclude any items that have no publish date on file. Present the years in chronological order.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests date extraction, NULL filtering, and multi-aggregate GROUP BY. Interviewers check whether you can derive a time dimension from a date column, exclude incomplete records, and compute multiple aggregates.

---

### Break down the requirements

#### Step 1: Exclude items without a publish date

`WHERE publish_date IS NOT NULL` removes the 5% of items with no publish date.

#### Step 2: Extract year and aggregate

`GROUP BY STRFTIME('%Y', publish_date)` with `COUNT(*)` and `AVG(duration_seconds)` computes items published and average duration per year.

#### Step 3: Order chronologically

`ORDER BY year` presents the years from earliest to most recent.

---

### The solution

**Yearly aggregation with NULL exclusion**

```sql
SELECT
    STRFTIME('%Y', publish_date) AS year,
    COUNT(*) AS items_published,
    AVG(duration_seconds) AS avg_duration
FROM content_items
WHERE publish_date IS NOT NULL
GROUP BY STRFTIME('%Y', publish_date)
ORDER BY year
```

> **Cost Analysis**
>
> Full scan of 500K rows minus 5% NULLs = 475K rows. The GROUP BY produces a handful of yearly buckets. Trivially fast.

> **Interviewers Watch For**
>
> Whether you handle the NULL publish_date filter. The prompt says "exclude any items that have no publish date on file," making the WHERE clause mandatory.

> **Common Pitfall**
>
> Forgetting that `duration_seconds` has 10% NULL fraction. AVG ignores NULLs, so the average is computed only over items with recorded durations. Clarify if the business wants to treat NULL durations as 0.

---

## Common follow-up questions

- How would you show the year-over-year growth in items published? _(Use LAG to access the previous year's count and compute the difference.)_
- What if you needed to break this down by content_type as well? _(Add content_type to the GROUP BY for a two-dimensional breakdown.)_
- How would you handle items published on the same day with different durations? _(The aggregation handles this naturally; each item is counted individually.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/yearly_output)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.