# Top 3 First-View Pages

> The first three pages new users see.

Canonical URL: <https://datadriven.io/problems/top_3_first_view_pages>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

Look at each user's first 3 page views after signup. Count how often each page appears across all users and surface the top 3. If pages are tied at the cutoff, include all of them.

## Worked solution and explanation

### Why this problem exists in real interviews

Onboarding and growth teams care about which pages dominate the first session because those pages drive activation, not lifetime traffic. Interviewers use this prompt to see whether you can scope to a per-user prefix (the first 3 views), aggregate across users, and handle ties at the cutoff with a no-gaps rank instead of a hard LIMIT.

---

### Break down the requirements

#### Step 1: Number views per user by viewed_at

Use ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY viewed_at ASC) so each user gets a 1, 2, 3, ... sequence independent of every other user. The PARTITION BY is what makes this a per-user first-3, not a global first-3.

#### Step 2: Filter to view_order <= 3 before aggregating

Apply the cutoff in a CTE that pulls only the prefix rows, then GROUP BY page_url and COUNT(*) to get appearance_count across all users. Filtering before the aggregate avoids counting late-funnel pages that should not be in the leaderboard.

#### Step 3: Use DENSE_RANK to keep ties at the cutoff

DENSE_RANK() OVER (ORDER BY COUNT(*) DESC) lets you keep every page that ties at rank 3. A LIMIT 3 would silently drop one of the tied pages and the prompt explicitly says to include all of them.

---

### The solution

**First-3 prefix, then dense rank with tie inclusion**

```sql
WITH ranked_views AS (
  SELECT user_id, page_url, viewed_at,
         ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY viewed_at ASC) AS view_order
  FROM page_views
),
page_counts AS (
  SELECT page_url, COUNT(*) AS appearance_count,
         DENSE_RANK() OVER (ORDER BY COUNT(*) DESC) AS rnk
  FROM ranked_views
  WHERE view_order <= 3
  GROUP BY page_url
)
SELECT page_url, appearance_count
FROM page_counts
WHERE rnk <= 3
ORDER BY appearance_count DESC
```

> **Cost Analysis**
>
> page_views has 600M rows and 10M users with zipf skew. The PARTITION BY user_id window is the dominant cost; an index on (user_id, viewed_at) lets the planner stream rows already sorted and avoids a sort-per-partition. The second CTE collapses to about 40k page_url groups, so the rank step is cheap.

> **Interviewers Watch For**
>
> Did you partition by user_id (not globally rank), use DENSE_RANK (not ROW_NUMBER) for ties, and filter to view_order <= 3 before aggregating? Candidates who LIMIT 3 at the end fail the explicit tie rule in the prompt.

> **Common Pitfall**
>
> Ordering by viewed_at without PARTITION BY user_id ranks views globally, so the top of the list is just the earliest registered users. The per-user PARTITION BY is non-negotiable for a first-view metric.

---

## Common follow-up questions

- How would you handle users who only ever viewed one page? _(They contribute exactly one row to the first-3 prefix; the COUNT still works. No special case needed, but interviewers may probe whether you noticed.)_
- What changes if 'first 3' should be 'first 3 distinct pages'? _(Switch ROW_NUMBER to operate on a deduped per-user CTE (DISTINCT user_id, page_url, MIN(viewed_at)) before ranking, so a user who reloads the same page doesn't burn their prefix slots.)_
- Would you recompute this nightly or maintain it incrementally? _(First-N-per-user is order-dependent on early rows, so an incremental aggregate has to track each user's current prefix. Most teams just recompute against a partitioned table.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/top_3_first_view_pages)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.