# Average Search Results Per User

> How many results per searcher?

Canonical URL: <https://datadriven.io/problems/average_search_results_per_user>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The personalization team is investigating whether some users consistently see richer search results than others. For each user who has issued at least one search, show their average number of results returned.

## Worked solution and explanation

### Why this problem exists in real interviews

The core skill being tested is post-aggregation filtering with HAVING over `search_queries`. Candidates must decide how `search_term`, `results_count`, `clicked_result` interact before choosing a join strategy or aggregation level.

---

### Break down the requirements

#### Step 1: Group by `query_id`

`GROUP BY` at the correct grain produces one row per group.

#### Step 2: Compute `AVG(results_count)`

The AVG function computes the avg per group.

#### Step 3: Apply the HAVING threshold

Post-aggregation filter keeps only groups meeting the minimum.

#### Step 4: Order by the metric

Sort by `avg_results_count` desc for readability.

---

### The solution

**Having-filter for average search results user**

```sql
SELECT
    query_id,
    AVG(results_count) AS avg_results_count
FROM search_queries
GROUP BY query_id
HAVING AVG(results_count) >= 1
ORDER BY avg_results_count DESC
```

> **Cost Analysis**
>
> The main table has 20M rows (5 GB). Partitioned on `query_time`, so queries filtering on that column skip most partitions. The GROUP BY reduces the row count early, keeping downstream operations cheap.

> **Interviewers Watch For**
>
> Strong candidates state the correct `GROUP BY` grain before writing any SQL, showing they think about the output shape first.

> **Common Pitfall**
>
> Selecting a non-aggregated column without including it in `GROUP BY` is the most common error. Some engines reject it; others silently return arbitrary values.

---

## Common follow-up questions

- What happens to your results if `search_term` in `search_queries` contains trailing whitespace or mixed casing? _(Tests awareness of text normalization issues that silently fragment GROUP BY results.)_
- Could you move the HAVING condition into a WHERE clause on a subquery instead? What are the trade-offs? _(Tests understanding of filter push-down: WHERE filters before grouping, HAVING after.)_
- `query_id` in `search_queries` has ~20M distinct values. What index strategy keeps your query from doing a full table scan? _(Tests whether the candidate can design indexes for high-cardinality columns and understands selectivity.)_
- If the business definition of `search_term` changed mid-quarter (e.g., a status value was renamed), how would you handle historical consistency? _(Tests awareness of slowly changing dimensions and backward-compatible query design.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/average_search_results_per_user)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.