# The Tag Analyst

> Two sets of labels, one analysis.

Canonical URL: <https://datadriven.io/problems/the_tag_analyst>

Domain: Python · Difficulty: medium · Seniority: L4

## Problem

Given two lists of string tags, `a` and `b`, return a dict with three alphabetically sorted lists: `'both'` (tags in both lists), `'only_a'` (tags in `a` but not `b`), and `'only_b'` (tags in `b` but not `a`).

## Worked solution and explanation

### Why this problem exists in real interviews

This tests **set operations**: intersection, difference, and their practical application. It probes whether a candidate can use sets for efficient membership testing and produce clean, sorted results.

---

### Break down the requirements

#### Step 1: Convert both lists to sets

The `&` and `-` operators work only on sets, not lists, so both inputs must be converted. Converting from lists also eliminates duplicates within each list.
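A minimal sketch of the conversion, assuming a made-up input list (the tag values are illustrative, not from the problem's test cases):

```python
# Hypothetical sample input; the tag values are illustrative only.
tags_a = ["python", "sql", "python", "etl"]

set_a = set(tags_a)

# The duplicate "python" collapses into a single element.
unique_count = len(set_a)   # 3

# Membership tests on a set are O(1) on average.
has_sql = "sql" in set_a    # True
```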

#### Step 2: Compute intersection and differences

Use `&` for shared tags, `-` for tags exclusive to each list.
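As a rough illustration with made-up sets, the two operators behave like this. The method form `intersection()` is shown as an aside because, unlike `&`, it accepts any iterable:

```python
# Hypothetical sample sets; the tag values are illustrative only.
set_a = {"python", "sql", "etl"}
set_b = {"sql", "spark", "etl"}

both = set_a & set_b      # {"sql", "etl"}
only_a = set_a - set_b    # {"python"}
only_b = set_b - set_a    # {"spark"}

# The method form accepts a list directly, with the same result as `&`.
both_via_method = set_a.intersection(["sql", "spark", "etl"])
```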

#### Step 3: Sort each result and return as a dict

Sets are unordered, so pass each result through `sorted()`, which returns a list in deterministic order.
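A short sketch of the sorting step, using made-up values. One thing worth knowing is that Python compares strings by code point, so uppercase letters sort before lowercase ones:

```python
# Hypothetical sample sets; the tag values are illustrative only.
both = {"sql", "etl"}

# sorted() accepts any iterable and always returns a new list.
sorted_both = sorted(both)                  # ['etl', 'sql']

# Default string ordering is by code point: 'E' < 'P' < 's'.
mixed = sorted({"sql", "ETL", "Python"})    # ['ETL', 'Python', 'sql']
```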

---

### The solution

**Set intersection and difference operations**

```python
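def compare_tags(tags_a: list[str], tags_b: list[str]) -> dict[str, list[str]]:
    """Split two tag lists into shared and exclusive tags, each sorted."""
    # Sets remove duplicates and give average O(1) membership tests.
    set_a = set(tags_a)
    set_b = set(tags_b)
    both = set_a & set_b      # tags present in both lists
    only_a = set_a - set_b    # tags only in tags_a
    only_b = set_b - set_a    # tags only in tags_b
    # sorted() turns each unordered set into an alphabetically ordered list.
    return {
        "both": sorted(both),
        "only_a": sorted(only_a),
        "only_b": sorted(only_b),
    }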
```

> **Time and Space Complexity**
>
> **Time:** O(n + m) on average to build the sets and compute the intersection and differences, plus O(k log k) to sort each result, where n and m are the list lengths and k is the size of that result.
>
> **Space:** O(n + m) for the sets and the result lists.

> **Interviewers Watch For**
>
> Using set operations directly (`&`, `-`) instead of nested loops. The set approach is both faster and more readable.

> **Common Pitfall**
>
> Iterating through one list and checking membership in the other with `in` on a list (not a set) is O(n*m). Convert to sets first for average O(1) lookups.
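To make the pitfall concrete, here is a sketch that contrasts the two membership checks. The function names `shared_tags_slow` and `shared_tags_fast` are hypothetical, introduced only for this comparison:

```python
def shared_tags_slow(tags_a: list[str], tags_b: list[str]) -> list[str]:
    # `t in tags_b` scans the whole list on every check: O(n*m) overall.
    return sorted({t for t in tags_a if t in tags_b})


def shared_tags_fast(tags_a: list[str], tags_b: list[str]) -> list[str]:
    # One O(m) pass builds the set; each lookup is then O(1) on average.
    set_b = set(tags_b)
    return sorted({t for t in tags_a if t in set_b})
```

Both return the same list; only the cost of each membership check differs.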

---

## Common follow-up questions

- What if you needed to preserve the original order instead of sorting? _(Tests using a seen set while iterating the original list to filter.)_
- How would you handle case-insensitive tag comparison? _(Tests normalizing tags to lowercase before set operations.)_
- What if tags had weights and you needed weighted overlap? _(Tests using dicts with values instead of pure sets.)_
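The follow-ups above could be approached along these lines. This is a sketch under stated assumptions, not a reference answer: all three function names are hypothetical, and "weighted overlap" is taken here to mean the sum of the smaller weight for each shared tag, which is only one possible definition.

```python
def only_a_in_order(tags_a: list[str], tags_b: list[str]) -> list[str]:
    """Tags in a but not b, in first-seen order, without duplicates."""
    set_b = set(tags_b)
    seen = set()
    result = []
    for tag in tags_a:
        if tag not in set_b and tag not in seen:
            seen.add(tag)
            result.append(tag)
    return result


def shared_case_insensitive(tags_a: list[str], tags_b: list[str]) -> list[str]:
    """Shared tags after lowercasing both sides (output is lowercase)."""
    lower_a = {t.lower() for t in tags_a}
    lower_b = {t.lower() for t in tags_b}
    return sorted(lower_a & lower_b)


def weighted_overlap(weights_a: dict[str, int], weights_b: dict[str, int]) -> int:
    """Assumed definition: sum of the smaller weight for each shared tag."""
    # dict key views support set operations such as `&`.
    shared = weights_a.keys() & weights_b.keys()
    return sum(min(weights_a[t], weights_b[t]) for t in shared)
```

For non-ASCII tags, `str.casefold()` is a more aggressive alternative to `lower()` for caseless matching.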

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_tag_analyst)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.