# Average Results for Python Searches

> Python searches. What's the click-through?

Canonical URL: <https://datadriven.io/problems/average_results_for_python_searches>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

Users searching for keyboards have been complaining about empty result pages. Compute the average number of results returned for search queries whose term contains 'keyboard', regardless of casing.

## Worked solution and explanation

### Why this problem exists in real interviews

This question pairs a case-insensitive text filter with a single aggregate on `search_queries`. It separates candidates who read the requirement closely (filter on `search_term`, average `results_count`, return one row) from those who reach for a `GROUP BY` the problem never asks for.

---

### Break down the requirements

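#### Step 1: Filter on `search_term`

Keep only rows whose `search_term` contains 'keyboard'. Lowercase the column before matching so that 'Keyboard' and 'KEYBOARD' also qualify: `LOWER(search_term) LIKE '%keyboard%'`.

#### Step 2: Compute `AVG(results_count)`

`AVG` runs over every row that survives the filter and ignores `NULL` values. No `GROUP BY` is needed because the problem asks for one overall average. If `query_id` is unique per row, grouping by it would return one row per query, each "average" equal to that row's own `results_count`.

#### Step 3: Return a single row

An aggregate without `GROUP BY` returns exactly one row, so an `ORDER BY` adds nothing.

---

### The solution

**Filtered aggregate for the average results of keyboard searches**

```sql
SELECT
    AVG(results_count) AS avg_results_count
FROM search_queries
WHERE LOWER(search_term) LIKE '%keyboard%'  -- case-insensitive "contains" match
```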
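
To see the filter-then-average approach that the problem statement asks for on concrete data, here is a minimal sketch. The temporary table name and its four rows are hypothetical sample data, the column types are assumptions, and only the columns the query needs are modeled.

```sql
-- Hypothetical sample data; the real search_queries table has more columns.
CREATE TEMPORARY TABLE search_queries_sample (
    query_id      INTEGER PRIMARY KEY,
    search_term   TEXT,
    results_count INTEGER
);

INSERT INTO search_queries_sample (query_id, search_term, results_count) VALUES
    (1, 'Mechanical Keyboard', 10),
    (2, 'KEYBOARD cover',       0),
    (3, 'wireless keyboard',   20),
    (4, 'python tutorial',    500);

-- Rows 1 to 3 match regardless of casing; row 4 is excluded.
-- The average is taken over 10, 0 and 20, which is 10.
SELECT AVG(results_count) AS avg_results_count
FROM search_queries_sample
WHERE LOWER(search_term) LIKE '%keyboard%';
```

In Postgres, `search_term ILIKE '%keyboard%'` is an engine-specific alternative to the `LOWER(...) LIKE` form; the `LOWER` version is the more portable of the two.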

> **Cost Analysis**
>
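> The main table has 100M rows (26 GB) and is partitioned on `query_time`. A query that filters on `query_time` skips most partitions, but this problem filters only on `search_term`, so every partition is scanned. A "contains" match with a leading wildcard cannot use an ordinary B-tree index either, so expect a full scan unless a specialized index (for example, a trigram index in Postgres) is available. The aggregate itself is cheap: it collapses the matching rows into a single value.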
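
As a sketch of one way to avoid a full scan for a "contains" predicate in Postgres, the `pg_trgm` extension provides trigram operator classes that let a GIN index serve `LIKE` patterns with a leading wildcard. The table below is a stand-in and the index name is hypothetical. Whether the planner actually uses the index depends on table size and statistics, so the plan should be checked with `EXPLAIN`.

```sql
-- Requires permission to create extensions in the current database.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Stand-in table; the real search_queries has more columns and is partitioned.
CREATE TABLE IF NOT EXISTS search_queries_idx_demo (
    query_id      BIGINT PRIMARY KEY,
    search_term   TEXT,
    results_count INTEGER
);

-- Expression index on the lowercased term so it matches the predicate below.
CREATE INDEX IF NOT EXISTS idx_search_term_trgm  -- hypothetical index name
    ON search_queries_idx_demo
    USING gin (LOWER(search_term) gin_trgm_ops);

-- Inspect the plan; on a tiny table the planner will likely still pick a sequential scan.
EXPLAIN
SELECT AVG(results_count) AS avg_results_count
FROM search_queries_idx_demo
WHERE LOWER(search_term) LIKE '%keyboard%';
```

The trade-off is write cost and storage: a trigram GIN index on a 100M-row text column is large and slows inserts, so it is only worth building if this kind of search runs often.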

> **Interviewers Watch For**
>
> Strong candidates state the output shape before writing any SQL: one row holding one average, which means a filter plus an aggregate and no `GROUP BY`.

> **Common Pitfall**
>
> Selecting a non-aggregated column without including it in `GROUP BY` is the most common error. Some engines reject it; others silently return arbitrary values.
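
The pitfall can be reproduced on a tiny, hypothetical table. In Postgres the first query below is rejected with an error saying the column must appear in the `GROUP BY` clause or be used in an aggregate function, so it is left commented out. The second query is one valid way to return a per-term average.

```sql
-- Hypothetical sample data; column types are assumptions.
CREATE TEMPORARY TABLE search_queries_pitfall (
    query_id      INTEGER PRIMARY KEY,
    search_term   TEXT,
    results_count INTEGER
);

INSERT INTO search_queries_pitfall (query_id, search_term, results_count) VALUES
    (1, 'keyboard', 10),
    (2, 'keyboard', 30),
    (3, 'mouse',     5);

-- Invalid in Postgres: search_term is neither aggregated nor grouped.
-- SELECT search_term, AVG(results_count) FROM search_queries_pitfall;

-- Valid: every selected column is either grouped or aggregated.
SELECT search_term, AVG(results_count) AS avg_results_count
FROM search_queries_pitfall
GROUP BY search_term;
```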

---

## Common follow-up questions

- What happens to your results if `search_term` in `search_queries` contains trailing whitespace or mixed casing? _(Tests awareness of text normalization issues that silently fragment GROUP BY results.)_
- Suppose you extend the query to `GROUP BY search_term` and sort by the average. If two groups have the same aggregate value, how is the output ordered, and is that deterministic? _(Tests awareness that ORDER BY on a non-unique value produces non-deterministic row order without a tiebreaker.)_
- `query_id` in `search_queries` has ~100M distinct values. What index strategy keeps your query from doing a full table scan? _(Tests whether the candidate can design indexes for high-cardinality columns and understands selectivity.)_
- Could you express this same logic as a single query without CTEs or subqueries? What readability trade-off does that introduce? _(Tests whether the candidate can flatten nested logic and understands when decomposition aids maintainability.)_
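
The first two follow-ups can be explored with a sketch like the one below. The sample rows are hypothetical. They show how `LOWER(TRIM(...))` merges variants that would otherwise form separate groups, and how a secondary sort key makes the order of tied groups deterministic.

```sql
-- Hypothetical sample data with stray whitespace and mixed casing.
CREATE TEMPORARY TABLE search_queries_followup (
    query_id      INTEGER PRIMARY KEY,
    search_term   TEXT,
    results_count INTEGER
);

INSERT INTO search_queries_followup (query_id, search_term, results_count) VALUES
    (1, 'Keyboard ', 10),
    (2, 'keyboard',  30),
    (3, 'Mouse',     20),
    (4, ' mouse',    20);

-- Grouping on the raw column would yield four groups.
-- Normalizing first yields two groups, both averaging 20.
SELECT
    LOWER(TRIM(search_term)) AS normalized_term,
    AVG(results_count)       AS avg_results_count
FROM search_queries_followup
GROUP BY LOWER(TRIM(search_term))
ORDER BY avg_results_count DESC, normalized_term;  -- group key breaks the tie
```

Because the tiebreaker is the group key, which is unique per output row, the two tied groups always come back in the same order.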

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/average_results_for_python_searches)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.