# Search Term Length vs Click Rates

> Longer queries, more clicks?

Canonical URL: <https://datadriven.io/problems/search_term_length_vs_click_rates>

Domain: SQL · Difficulty: hard · Seniority: L4

## Problem

The search relevance team wants to know whether longer queries lead to more clicks. For each unique search term length, show the total number of queries and how many had a clicked result, listed by length.

## Worked solution and explanation

### Why this problem exists in real interviews

Interviewers use this search behavior scenario to test conditional aggregation via CASE against the `search_queries` table. The focus is on how you handle the `search_term` column when building the result.

> **Trick to Solving**
>
> When the prompt asks for multiple metrics split by a condition (e.g., resolved vs. unresolved), conditional aggregation avoids multiple passes.
> 
> 1. Spot the split: two or more categories in one output row
> 2. Use `SUM(CASE WHEN condition THEN 1 ELSE 0 END)` for each bucket
> 3. Group by the common dimension

---

### Break down the requirements

#### Step 1: Filter out null values

Exclude rows where `search_term` is NULL. This prevents nulls from polluting aggregations or creating phantom groups.

#### Step 2: Use conditional aggregation with CASE

A `CASE` expression inside the aggregate function splits rows into buckets without multiple passes over the data. Each condition maps to one output column.

#### Step 3: Aggregate by `term_length`

`GROUP BY term_length` collapses rows to one per group. The aggregate functions (`SUM`, `COUNT`, `AVG`, etc.) compute the metric for each group.

#### Step 4: Sort the final output

The `ORDER BY` clause ensures the result appears in the expected sequence. Interviewers check that the sort direction matches the prompt.

---

### The solution

**Case pivot for search term length vs click**

```sql
SELECT LENGTH(search_term) AS term_length, COUNT(*) AS query_count, SUM(CASE WHEN clicked_result = 1 THEN 1 ELSE 0 END) AS clicked_count
FROM search_queries
WHERE search_term IS NOT NULL
GROUP BY term_length
ORDER BY term_length
```

> **Cost Analysis**
>
> With ~80M rows, the GROUP BY reduces the working set before any downstream operations. An index on the filter/join columns would reduce the scan to a seek.

> **Interviewers Watch For**
>
> Interviewers watch for how you handle NULL values and whether you account for them in filters and aggregations; whether you can pivot data with conditional aggregation in a single pass instead of multiple queries.

> **Common Pitfall**
>
> Forgetting to filter NULLs creates phantom groups or inflated counts. Always check `null_fraction` in the schema before assuming columns are clean.

---

## Common follow-up questions

- What would happen to your result if `search_queries.results_count` contained duplicate values that you did not expect? _(Tests whether the candidate considers data quality issues in `results_count` and uses DISTINCT or deduplication where needed.)_
- `search_queries.query_time` has roughly 31,536,000 distinct values. What index strategy would you use to avoid a full scan on `search_queries`? _(Tests indexing knowledge specific to the high-cardinality `query_time` column in `search_queries`.)_
- Your conditional CASE logic assumes the categories are exhaustive. What happens if a row in `search_queries` falls into none of the branches? _(Tests awareness of the implicit ELSE NULL in CASE expressions.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/search_term_length_vs_click_rates)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.