# Active Duo

> Shoppers who also browse. The overlap is the insight.

Canonical URL: <https://datadriven.io/problems/active_duo>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

The growth team is building a cross-engagement segment of users who both make purchases and log browsing sessions on the platform. Return a deduplicated list of usernames for users with activity in both areas.

## Worked solution and explanation

### Why this problem exists in real interviews

Working with `users`, `transactions`, `user_sessions`, this problem isolates filtering and projection. The interviewer expects candidates to articulate why `username`, `email`, `signup_date` matter for correctness before touching the keyboard.

---

### Break down the requirements

#### Step 1: Identify the linking column

`user_id` links `users` to `transactions`, and `user_id` links to `user_sessions`.

#### Step 2: Inner join all three tables

INNER JOINs keep only users present in both `transactions` and `user_sessions`.

#### Step 3: Deduplicate with DISTINCT

Users may have multiple rows in each table; `DISTINCT` collapses them to one row per user.

---

### The solution

**Distinct-filter for active duo**

```sql
SELECT DISTINCT a.username
FROM users a
JOIN transactions b ON a.user_id = b.user_id
JOIN user_sessions c ON a.user_id = c.user_id
ORDER BY a.username
```

> **Cost Analysis**
>
> The main table has 30M rows (8 GB). An index on the filter or join column would improve performance at scale.

> **Interviewers Watch For**
>
> Join type selection (INNER vs LEFT) reveals whether the candidate read the requirements carefully.

> **Common Pitfall**
>
> Returning extra columns not asked for, or missing a required column, are both common mistakes that fail automated grading.

---

## Common follow-up questions

- What happens to your results if `user_id` in `users` contains trailing whitespace or mixed casing? _(Tests awareness of text normalization issues that silently fragment GROUP BY results.)_
- If `users` were partitioned by date, would your query need to scan all partitions or could it prune? How would you verify? _(Tests understanding of partition pruning and EXPLAIN output.)_
- `user_id` in `users` has ~30M distinct values. What index strategy keeps your query from doing a full table scan? _(Tests whether the candidate can design indexes for high-cardinality columns and understands selectivity.)_
- Could you express this same logic as a single query without CTEs or subqueries? What readability trade-off does that introduce? _(Tests whether the candidate can flatten nested logic and understands when decomposition aids maintainability.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/active_duo)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.