# Unique Stream Topics

> A clean inventory of streaming topics.

Canonical URL: <https://datadriven.io/problems/unique_stream_topics>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The streaming platform team is building a topic registry. List every unique topic currently in the message catalog.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests `SELECT DISTINCT` on a single column from a very large table. Interviewers check whether you can extract unique values efficiently from a high-volume streaming dataset.

---

### Break down the requirements

#### Step 1: Select distinct topics

`SELECT DISTINCT topic` extracts all unique topic values from the 500M row `stream_msgs` table, producing at most 50 rows.

---

### The solution

**Distinct values from a high-volume table**

```sql
SELECT DISTINCT topic
FROM stream_msgs
```

> **Cost Analysis**
>
> Full scan of 500M rows. Despite the table size, only 50 distinct topics exist, so the hash set for deduplication is tiny. An index on `topic` could enable a loose index scan that skips directly between distinct values.

> **Interviewers Watch For**
>
> At scale, strong candidates mention that a loose index scan (or index skip scan) can find 50 distinct values without scanning all 500M rows.

> **Common Pitfall**
>
> Using `GROUP BY topic` without any aggregate. While it produces the same result as DISTINCT, it is less readable and implies an aggregation intent that does not exist here.

---

## Common follow-up questions

- How would you count messages per topic? _(Add COUNT(*) with GROUP BY topic.)_
- What if topics could be NULL? _(DISTINCT includes one NULL row; decide if that is desired or filter with WHERE topic IS NOT NULL.)_
- How would you efficiently find distinct topics in a partitioned table? _(Tests knowledge of partition pruning and loose index scans across partitions.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/unique_stream_topics)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.