# 30-Day Page View Counts

> Thirty days of engagement. Quick snapshot.

Canonical URL: <https://datadriven.io/problems/30_day_page_view_counts>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The product analytics team needs a 30-day engagement snapshot ending on December 28, 2026 (inclusive). For each user who visited the site during that window, report their user ID and total page view count.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests **date-range filtering and aggregation with GROUP BY**. Interviewers use straightforward problems like this to verify fluency with core SQL mechanics before moving to harder rounds.

---

### Break down the requirements

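#### Step 1: Filter to the 30-day window

The window ends on December 28, 2026 (inclusive), so it starts 29 days earlier, on November 29, 2026. Filter with a half-open range, `viewed_at >= '2026-11-29' AND viewed_at < '2026-12-29'`, so views at any time on December 28 are included. `BETWEEN '2026-11-29' AND '2026-12-28'` on a timestamp column stops at midnight on December 28 and silently drops the rest of that day.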
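A quick way to check the boundaries is to run both predicates over a few probe timestamps. This is a sketch assuming `viewed_at` values compare like ISO-8601 text, which is how the hypothetical `probes` CTE stores them; on a real `TIMESTAMP` column the literal `'2026-12-28'` is read as midnight, so the comparison behaves the same way.

```sql
-- Hypothetical probe rows stored as ISO-8601 text (an assumption).
WITH probes (viewed_at) AS (
    VALUES
        ('2026-11-28 23:59:59'),  -- last second before the window
        ('2026-11-29 00:00:00'),  -- first second of the window
        ('2026-12-28 15:30:00'),  -- afternoon of the final day
        ('2026-12-29 00:00:00')   -- first second after the window
)
SELECT
    viewed_at,
    -- Half-open range: start inclusive, day after the end exclusive.
    CASE WHEN viewed_at >= '2026-11-29' AND viewed_at < '2026-12-29'
         THEN 'in' ELSE 'out' END AS half_open,
    -- Naive BETWEEN: the upper bound acts as 2026-12-28 00:00:00.
    CASE WHEN viewed_at BETWEEN '2026-11-29' AND '2026-12-28'
         THEN 'in' ELSE 'out' END AS naive_between
FROM probes
ORDER BY viewed_at;
```

Only the half-open range keeps the `2026-12-28 15:30:00` row; both predicates exclude the two probes that fall outside the window.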

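#### Step 2: Group by `user_id` and count with `COUNT(*)`

`GROUP BY user_id` produces one row per user, and `COUNT(*)` counts that user's rows inside the window. Assuming each row in `page_views` is one page view, that count is the user's total. Users with no views in the window have no rows to group, so they are omitted, which matches "each user who visited the site during that window".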

#### Step 3: Order by the metric

Sort by `page_view_count DESC` for readability, with `user_id` as a tiebreaker so the output order is deterministic.

---

### The solution

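**Filter to the window, then group and count**

```sql
-- One row per user with at least one page view between
-- 2026-11-29 00:00:00 and 2026-12-28 23:59:59 (30 days, inclusive).
-- Assumes one row per page view and a user_id column on page_views.
SELECT
    user_id,
    COUNT(*) AS page_view_count
FROM page_views
WHERE viewed_at >= '2026-11-29'   -- window start (inclusive)
  AND viewed_at <  '2026-12-29'   -- day after window end (exclusive)
GROUP BY user_id
ORDER BY page_view_count DESC, user_id;
```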
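To sanity-check the query end to end, the same statement can be run against a small inline fixture. The CTE named `page_views` and its rows are hypothetical stand-ins for the real table (including the assumed `user_id` column); a CTE with that name takes precedence over any real table of the same name for this one statement.

```sql
-- Hypothetical fixture standing in for page_views (user_id, viewed_at).
WITH page_views (user_id, viewed_at) AS (
    VALUES
        (1, '2026-11-28 23:59:59'),  -- before the window: filtered out
        (1, '2026-11-29 08:00:00'),
        (1, '2026-12-28 22:15:00'),  -- late on the last day: kept
        (2, '2026-12-01 12:00:00'),
        (2, '2026-12-02 12:00:00'),
        (3, '2026-12-29 00:00:00')   -- after the window: filtered out
)
SELECT
    user_id,
    COUNT(*) AS page_view_count
FROM page_views
WHERE viewed_at >= '2026-11-29'
  AND viewed_at <  '2026-12-29'
GROUP BY user_id
ORDER BY page_view_count DESC, user_id;
```

Users 1 and 2 each come back with 2 views, so the `user_id` tiebreaker fixes their order. User 3's only view falls on December 29, so user 3 does not appear at all.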

> **Cost Analysis**
>
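> The main table has 1.2B rows (154 GB) and is partitioned on `viewed_at`, so the range filter on that column lets the engine skip partitions outside the 30-day window. Comparing `viewed_at` directly against constants, rather than wrapping it in a function such as `STRFTIME`, keeps that pruning possible in most engines. `GROUP BY user_id` then collapses the surviving rows to one per active user, keeping downstream operations cheap.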

> **Interviewers Watch For**
>
> Strong candidates state the output grain (one row per user) and the exact window boundaries before writing any SQL, showing they think about the output shape first.

> **Common Pitfall**
>
> Selecting a non-aggregated column without including it in `GROUP BY` is the most common error. PostgreSQL rejects the query; SQLite, and MySQL with `ONLY_FULL_GROUP_BY` disabled, silently return a value from an arbitrary row in each group.
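Here is a sketch of the pitfall and one fix, using a hypothetical inline fixture with the `device` column from the table. The invalid form is left as a comment; the runnable query adds `device` to `GROUP BY`, which changes the grain to one row per user and device.

```sql
-- Invalid in PostgreSQL: device is neither grouped nor aggregated.
--   SELECT user_id, device, COUNT(*) FROM page_views GROUP BY user_id;
-- SQLite accepts it but takes device from an arbitrary row in each group.

-- Hypothetical fixture standing in for page_views.
WITH page_views (user_id, device) AS (
    VALUES (1, 'web'), (1, 'ios'), (2, 'web')
)
-- Fix: group by device too. The output grain is now (user_id, device).
SELECT
    user_id,
    device,
    COUNT(*) AS page_view_count
FROM page_views
GROUP BY user_id, device
ORDER BY user_id, device;
```

If the report must stay at one row per user, aggregate the extra column instead of grouping by it.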

---

## Common follow-up questions

- What happens if the table is empty? _(Tests awareness of edge cases: with `GROUP BY user_id` the query returns zero rows; without `GROUP BY`, COUNT returns 0 while SUM/AVG/MIN/MAX return NULL on empty input.)_
- How would you verify the output is correct with a quick spot check? _(Tests whether the candidate can validate results against known data or row counts.)_
- What index would speed this query up the most? _(Tests basic indexing intuition: filter and join columns are the top candidates.)_
- How would the results change if you used COUNT(DISTINCT col) instead of COUNT(*)? _(Tests understanding of distinct vs total counting and when deduplication matters.)_
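The difference between the two counts shows up as soon as a user repeats a value. This sketch uses a hypothetical inline fixture and counts distinct `device` values, chosen only because `device` is a column the table already has.

```sql
-- Hypothetical fixture: user 1 has three views from two devices.
WITH page_views (user_id, device) AS (
    VALUES (1, 'web'), (1, 'web'), (1, 'ios'), (2, 'android')
)
SELECT
    user_id,
    COUNT(*)               AS page_view_count,  -- every row counts
    COUNT(DISTINCT device) AS distinct_devices  -- repeated values collapse
FROM page_views
GROUP BY user_id
ORDER BY user_id;
```

User 1 has 3 views but only 2 distinct devices; user 2 has 1 of each.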

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/30_day_page_view_counts)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.