# Long Searches Containing 'er'

> Long queries with 'er'. A pattern?

Canonical URL: <https://datadriven.io/problems/long_searches_containing_er>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The search analytics team is investigating long-tail query patterns. Pull all search queries where the term is longer than 12 characters and ends with the letter 'r' (case-insensitive). Return all fields, ordered by query ID.

## Worked solution and explanation

### Why this problem exists in real interviews

This problem tests string filtering on the `search_term` column of `search_queries`: a length check combined with a case-insensitive suffix match. Interviewers present it as a fundamentals check because the edge cases around NULL values and boundary conditions (strictly longer than 12 characters, not 12 or longer) reveal depth of understanding.

---

### Break down the requirements

#### Step 1: Filter to the target rows

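Combine two conditions in the `WHERE` clause: `LENGTH(search_term) > 12` for the length check and `LOWER(search_term) LIKE '%r'` for the case-insensitive suffix match. The `%` wildcard matches any sequence of characters (including none), so `'%r'` matches any term whose last character is `r`. No grouping or aggregation is involved, so the filter is the whole of the row selection.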
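
To see how the length check and the suffix match behave on their own, here is a small sketch that evaluates both predicates against a handful of made-up terms. The `samples` alias and the sample strings are illustrative assumptions, not data from `search_queries`; the query is written for Postgres, where untyped `VALUES` literals resolve to `text`.

```sql
-- Evaluate each predicate separately on hypothetical sample terms
SELECT
    term,
    LENGTH(term)          AS len,
    LENGTH(term) > 12     AS is_long,      -- strict: exactly 12 does not qualify
    LOWER(term) LIKE '%r' AS ends_with_r   -- LOWER makes the match case-insensitive
FROM (VALUES
    ('wireless charger'),        -- 16 chars, ends in r: passes both checks
    ('ROBOT VACUUM CLEANER'),    -- 20 chars, ends in R: passes both checks
    ('bluetooth speaker set'),   -- 21 chars, ends in t: fails the suffix check
    ('phone holder'),            -- 12 chars, ends in r: fails the length check
    ('mixer')                    -- 5 chars, ends in r: fails the length check
) AS samples(term);
```

Only rows where both boolean columns are true would survive the combined `WHERE` clause. The `'phone holder'` row is the boundary case worth calling out in an interview.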

#### Step 2: Order the final output

Apply `ORDER BY query_id` to produce the expected row sequence. A secondary sort column is needed only if `query_id` can repeat; if it is unique, as an ID column typically is, the order is already deterministic.

---

### The solution

**Compound string filter with length check**

```sql
SELECT query_id, user_id, search_term, results_count, clicked_result, query_time
FROM search_queries
WHERE LENGTH(search_term) > 12          -- strictly longer than 12 characters
    AND LOWER(search_term) LIKE '%r'    -- last character is 'r' or 'R'
ORDER BY query_id;
```
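
The case-insensitive suffix match can be written in more than one way. The sketch below compares three formulations on made-up terms; `ILIKE` is a Postgres extension rather than standard SQL, and the `samples` alias and strings are assumptions for illustration only.

```sql
-- Three ways to test "ends with r, ignoring case" (Postgres syntax)
SELECT
    term,
    LOWER(term) LIKE '%r'       AS lower_like,  -- portable form used in the solution
    term ILIKE '%r'             AS ilike,       -- Postgres-only case-insensitive LIKE
    LOWER(RIGHT(term, 1)) = 'r' AS right_eq     -- compare the last character directly
FROM (VALUES
    ('Wireless Charger'),
    ('USB-C ADAPTER'),
    ('gaming keyboard')
) AS samples(term);
```

All three columns should agree for each row. `LOWER(...) LIKE` is the safest choice when the target engine is unknown, which is why it is a reasonable default in an interview.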

> **Cost Analysis**
>
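> The query scans 40M rows from `search_queries`. Both predicates wrap `search_term` in a function, and the `LIKE` pattern starts with a wildcard, so a plain B-tree index on `search_term` cannot narrow the scan; expect a full table scan. For production workloads, consider an index built to match the predicates, or materialize the filtered rows.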
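
One possible way to avoid the full scan in Postgres is a partial index whose predicate repeats the query's filter. This is a sketch under stated assumptions, not a tuned recommendation: the scratch table `search_queries_demo`, its two-column schema, and the index name are all hypothetical, and the real `search_queries` schema may differ.

```sql
-- Hypothetical scratch table standing in for search_queries
CREATE TEMP TABLE search_queries_demo (
    query_id    integer PRIMARY KEY,
    search_term text
);

INSERT INTO search_queries_demo VALUES
    (1, 'wireless charger'),
    (2, 'laptop'),
    (3, 'NOISE CANCELLING HEADPHONES');

-- Partial index: contains only rows that satisfy the filter, keyed by query_id
CREATE INDEX search_queries_demo_long_r_idx   -- hypothetical index name
    ON search_queries_demo (query_id)
    WHERE LENGTH(search_term) > 12
      AND LOWER(search_term) LIKE '%r';

-- The WHERE clause repeats the index predicate, so the planner can consider the index
EXPLAIN
SELECT query_id, search_term
FROM search_queries_demo
WHERE LENGTH(search_term) > 12
  AND LOWER(search_term) LIKE '%r'
ORDER BY query_id;
```

On a table this small the planner will likely still choose a sequential scan; the index only pays off when the qualifying rows are a small fraction of a large table. Because the index is keyed on `query_id`, it can also supply the `ORDER BY` without a separate sort.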

> **Interviewers Watch For**
>
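> A single-table filter like this needs no CTEs; a flat `WHERE` clause is the most readable form. Interviewers watch for a correct case-insensitive match and the strict `> 12` boundary. Save named CTEs for multi-step logic, where they show you prioritize readability and debuggability.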

> **Common Pitfall**
>
> Placing a filter in `WHERE` instead of `HAVING` (or vice versa) is a common mistake. `WHERE` filters rows before aggregation; `HAVING` filters groups after. This query has no aggregation, so both conditions belong in `WHERE`.

---

## Common follow-up questions

- If search_queries.query_id could contain unexpected NULL values, how would your query behave? _(Tests NULL awareness even when the schema does not currently allow NULLs in query_id.)_
- How would you verify that your result does not return the same query more than once because of duplicate rows in search_queries? _(Tests data quality awareness and deduplication strategies.)_
- With millions of distinct values in search_queries.query_id, what index strategy would you use to keep this query performant? _(Tests indexing knowledge specific to high-cardinality columns like query_id.)_
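
The NULL and duplicate questions above can be explored with a small experiment. The sketch below runs the solution's filter over made-up rows that include a NULL `query_id`, a NULL `search_term`, and a duplicated row; the `sample_queries` name and its contents are assumptions for illustration, and the behavior described is for Postgres.

```sql
-- Hypothetical rows exercising NULLs and duplicates
WITH sample_queries(query_id, search_term) AS (
    VALUES
        (2,    'wireless charger'),
        (NULL, 'robot vacuum cleaner'),
        (1,    'portable power bank charger'),
        (2,    'wireless charger'),          -- exact duplicate of the first row
        (3,    NULL)                         -- NULL term: both predicates yield NULL
)
SELECT query_id, search_term
FROM sample_queries
WHERE LENGTH(search_term) > 12
  AND LOWER(search_term) LIKE '%r'
ORDER BY query_id;
```

The row with a NULL `search_term` is dropped, because a predicate that evaluates to NULL is not treated as true. The row with a NULL `query_id` is kept and sorts last, since Postgres places NULLs last in ascending order by default. The duplicated row appears twice; adding `GROUP BY query_id HAVING COUNT(*) > 1` to a separate check is one way to surface such duplicates.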

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/long_searches_containing_er)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.