# The Relentless Searchers

> Most users look once and leave. A few never stop looking.

Canonical URL: <https://datadriven.io/problems/rank_users_by_search_query_count>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

Rank users by total search queries, most active at rank 1. Users with the same count should fall in alphabetical sequence by user ID (treated as text), and every user gets a unique rank with no ties.

## Worked solution and explanation

### Why this problem exists in real interviews

Interviewers use this search behavior scenario to test ROW_NUMBER for sequential numbering against the `search_queries` table. The focus is on how you handle the `user_id` column when building the result.

---

### Break down the requirements

#### Step 1: Count queries per user

`GROUP BY user_id` with `COUNT(*)` tallies each user's total queries.

#### Step 2: Rank with ROW_NUMBER

`ROW_NUMBER() OVER (ORDER BY query_count DESC, CAST(user_id AS TEXT) ASC)` assigns unique ranks with alphabetical tiebreaker.

---

### The solution

**Row-number for rank users by search query**

```sql
SELECT user_id, query_count, rnk
FROM (
    SELECT
        user_id,
        COUNT(*) AS query_count,
        ROW_NUMBER() OVER (
            ORDER BY COUNT(*) DESC, CAST(user_id AS TEXT) ASC
        ) AS rnk
    FROM search_queries
    GROUP BY user_id
) ranked
```

> **Cost Analysis**
>
> With `search_queries` (60,000,000 rows), the full scan reads significant data. A composite index on the filter columns pushes the predicate into the index layer. Pre-aggregation in a materialized view is worth considering at this scale.

> **Interviewers Watch For**
>
> Interviewers evaluate whether you translate the English requirements into the correct SQL clauses on the first attempt. They watch for clean syntax, correct column references, and whether you verify edge cases before declaring the query complete.

> **Common Pitfall**
>
> The most common mistake is misreading the prompt's filtering or grouping requirements. Double-check which columns to group by, which to aggregate, and whether the output should be filtered with `WHERE` (before grouping) or `HAVING` (after grouping).

---

## Common follow-up questions

- What would happen to your result if `search_queries.query_id` contained duplicate values that you did not expect? _(Tests whether the candidate considers data quality issues in `query_id` and uses DISTINCT or deduplication where needed.)_
- `search_queries.query_id` has roughly 60,000,000 distinct values. What index strategy would you use to avoid a full scan on `search_queries`? _(Tests indexing knowledge specific to the high-cardinality `query_id` column in `search_queries`.)_
- What is the default window frame for your window function, and would explicitly setting ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW change anything? _(Tests knowledge of implicit vs explicit window frame specifications.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/rank_users_by_search_query_count)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.