# Product Page Sale Searches

> They searched from the product page.

Canonical URL: <https://datadriven.io/problems/product_page_sale_searches>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The marketing team wants to gauge how much organic demand exists for sales and discounts. How many search queries contain the word 'sale'?

## Worked solution and explanation

### Why this problem exists in real interviews

Drawn from a marketing analytics domain, this question centers on pattern matching with LIKE over the `search_queries` table. The tricky part is handling the `search_term` column correctly under the given constraints.

---

### Break down the requirements

#### Step 1: Filter to searches containing 'sale'

`WHERE search_term LIKE '%sale%'` matches any search term containing the word.

#### Step 2: Count the matches

`SELECT COUNT(*)` returns the total number of matching queries.

---

### The solution

**Pattern-match for product page sale searches**

```sql
SELECT COUNT(*) AS sale_query_count
FROM search_queries
WHERE search_term LIKE '%sale%'
```

> **Cost Analysis**
>
> With `search_queries` (50,000,000 rows), the full scan reads significant data. A composite index on the filter columns pushes the predicate into the index layer. Pre-aggregation in a materialized view is worth considering at this scale.

> **Interviewers Watch For**
>
> Interviewers evaluate whether you translate the English requirements into the correct SQL clauses on the first attempt. They watch for clean syntax, correct column references, and whether you verify edge cases before declaring the query complete.

> **Common Pitfall**
>
> The most common mistake is misreading the prompt's filtering or grouping requirements. Double-check which columns to group by, which to aggregate, and whether the output should be filtered with `WHERE` (before grouping) or `HAVING` (after grouping).

---

## Common follow-up questions

- What would happen to your result if `search_queries.results_count` contained duplicate values that you did not expect? _(Tests whether the candidate considers data quality issues in `results_count` and uses DISTINCT or deduplication where needed.)_
- `search_queries.query_time` has roughly 31,536,000 distinct values. What index strategy would you use to avoid a full scan on `search_queries`? _(Tests indexing knowledge specific to the high-cardinality `query_time` column in `search_queries`.)_
- Your LIKE pattern may prevent index usage on `search_queries`. How would you restructure the filter to be index-friendly? _(Tests understanding of leading-wildcard LIKE and its impact on index scans.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/product_page_sale_searches)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.