# Q2 Search Volume

> Q2 search volume. The numbers.

Canonical URL: <https://datadriven.io/problems/q2_search_volume>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The search team is benchmarking Q2 2026 query volume against the prior quarter. How many total search queries were received between April 1 and June 30 2026 inclusive?

## Worked solution and explanation

### Why this problem exists in real interviews

Interviewers use this search behavior scenario to test filtering and projection against the `search_queries` table. The focus is on how you handle the `query_time` column when building the result.

---

### Break down the requirements

#### Step 1: Filter to Q2 date range

`WHERE query_time >= '2026-04-01' AND query_time < '2026-07-01'` (adjusting for the appropriate year) captures April 1 through June 30 inclusive.

#### Step 2: Count the queries

`SELECT COUNT(*)` returns the total Q2 query volume.

---

### The solution

**Filter to q2 date range to find q2 search volume**

```sql
SELECT COUNT(*) AS q2_volume
FROM search_queries
WHERE query_time >= '2026-04-01'
  AND query_time < '2026-07-01'
```

> **Cost Analysis**
>
> With `search_queries` (40,000,000 rows), the full scan reads significant data. A composite index on the filter columns pushes the predicate into the index layer. Pre-aggregation in a materialized view is worth considering at this scale.

> **Interviewers Watch For**
>
> Interviewers evaluate whether you translate the English requirements into the correct SQL clauses on the first attempt. They watch for clean syntax, correct column references, and whether you verify edge cases before declaring the query complete.

> **Common Pitfall**
>
> The most common mistake is misreading the prompt's filtering or grouping requirements. Double-check which columns to group by, which to aggregate, and whether the output should be filtered with `WHERE` (before grouping) or `HAVING` (after grouping).

---

## Common follow-up questions

- What would happen to your result if `search_queries.search_term` contained duplicate values that you did not expect? _(Tests whether the candidate considers data quality issues in `search_term` and uses DISTINCT or deduplication where needed.)_
- `search_queries.search_term` has roughly 4,000,000 distinct values. What index strategy would you use to avoid a full scan on `search_queries`? _(Tests indexing knowledge specific to the high-cardinality `search_term` column in `search_queries`.)_
- How would you modify this query if the business logic required grouping by both `query_id` and `user_id` instead of just one? _(Tests ability to adapt the query structure to changing requirements.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/q2_search_volume)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.