# Product Name Prefix

> Just the first three characters. That is all.

Canonical URL: <https://datadriven.io/problems/product_name_prefix>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

Show each product's ID alongside just the first three characters of its name, by product ID.

## Worked solution and explanation

### Why this problem exists in real interviews

This challenge asks you to apply string manipulation to the `products` table, simulating a real product analytics workflow. Pay attention to the `product_name` and `product_id` columns as they drive the aggregation and output.

---

### Break down the requirements

#### Step 1: Extract first three characters

`SUBSTR(product_name, 1, 3)` returns the first three characters of each product name.

#### Step 2: Order by product_id

`ORDER BY product_id` maintains the requested sequence.

---

### The solution

**Extract first three characters to find product name prefix**

```sql
SELECT product_id, SUBSTR(product_name, 1, 3) AS name_prefix
FROM products
ORDER BY product_id
```

> **Cost Analysis**
>
> With `products` (8,000 rows), this query scans a small dataset. No indexing is needed for this volume. At production scale, an index on the primary filter column would improve performance.

> **Interviewers Watch For**
>
> Interviewers evaluate whether you translate the English requirements into the correct SQL clauses on the first attempt. They watch for clean syntax, correct column references, and whether you verify edge cases before declaring the query complete.

> **Common Pitfall**
>
> The most common mistake is misreading the prompt's filtering or grouping requirements. Double-check which columns to group by, which to aggregate, and whether the output should be filtered with `WHERE` (before grouping) or `HAVING` (after grouping).

---

## Common follow-up questions

- If `rating` in `products` is NULL for some rows, how would your aggregation or join logic be affected? _(Probes understanding of NULL propagation through joins and aggregate functions on `products.rating`.)_
- The `price` column in `products` is heavily skewed toward a few popular values. How would data skew affect parallel execution of your query? _(Tests understanding of skew in `products.price` and its impact on distributed query performance.)_
- If this query ran as a scheduled job, how would you add monitoring to detect when the result set is suspiciously empty? _(Tests operational awareness around scheduled query jobs.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/product_name_prefix)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.