# Silent Users

> Users who have never typed a query.

Canonical URL: <https://datadriven.io/problems/silent_users>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

The search relevance team is running an engagement audit and needs to identify users who have never issued a search query on the platform. Pull every user account ,  even those with no search activity on record ,  and surface only the accounts that have no associated searches. For each such user, show their username and the date they signed up. Present from the most recently joined to the earliest.

## Worked solution and explanation

### Why this problem exists in real interviews

Drawn from a search behavior domain, this question centers on self-join over the `users` and `search_queries` tables. The tricky part is handling columns like `user_id`, `username`, and `signup_date` correctly under the given constraints.

---

### Break down the requirements

#### Step 1: Left join to preserve all base rows

A `LEFT JOIN` from `users` ensures every row appears in the output even if there is no match in `search_queries`. Missing values become NULL.

#### Step 2: Apply the WHERE filter

Filter rows before any aggregation. This ensures only qualifying data enters the computation, keeping the result correct and the scan minimal.

#### Step 3: Sort the final output

The `ORDER BY` clause ensures the result appears in the expected sequence. Interviewers check that the sort direction matches the prompt.

---

### The solution

**Left join to preserve all base rows to find silent users**

```sql
SELECT u.username, u.signup_date
FROM users u
LEFT
JOIN search_queries sq ON u.user_id = sq.user_id
WHERE sq.query_id IS NULL
ORDER BY u.signup_date DESC
```

> **Cost Analysis**
>
> With ~2005M rows, the join cost depends on the smaller table's cardinality. An index on the filter/join columns would reduce the scan to a seek.

> **Interviewers Watch For**
>
> Interviewers watch for whether you choose the correct join type to avoid silently dropping rows; how you handle NULL values and whether you account for them in filters and aggregations.

> **Common Pitfall**
>
> Forgetting to filter NULLs creates phantom groups or inflated counts. Always check `null_fraction` in the schema before assuming columns are clean.

---

## Common follow-up questions

- What result would you get if every value in `search_queries.clicked_result` were NULL? Would your query return an empty set or something unexpected? _(Tests extreme NULL scenarios and whether the candidate guards against edge cases in `clicked_result`.)_
- With 500,000,000 distinct values in `search_queries.search_term`, how would a composite index on the GROUP BY columns change the execution plan? _(Probes understanding of how cardinality in `search_term` affects grouping and sort operations.)_
- `search_queries.clicked_result` only has 2 distinct values. If a new category were added, would your query automatically include it? _(Tests whether the query hard-codes values or dynamically adapts to `clicked_result` changes.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/silent_users)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.