# The Repeat Review

> The echo came back.

Canonical URL: <https://datadriven.io/problems/the_repeat_review>

Domain: Python · Difficulty: medium · Seniority: L3

## Problem

Given a dict mapping store names to lists of review strings, return the single review string that appears in the most distinct stores. Duplicates within a store count only once toward that store. If several reviews tie, return the one that sorts first alphabetically.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests **set-based deduplication within groups followed by cross-group aggregation**. It checks whether candidates can handle the 'count distinct per group, then aggregate across groups' pattern common in analytics.

---

### Break down the requirements

#### Step 1: Deduplicate reviews within each store

For each store, convert its reviews to a set so that repeated submissions count as one.

#### Step 2: Count store appearances per review

For each unique review across all stores, count how many stores submitted it.

#### Step 3: Find the review with the most stores

Return the review that appears in the highest number of distinct stores. If several reviews share that count, return the alphabetically smallest one.
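
As a rough illustration, the three steps can be traced on a small input. The store names and review strings below are made up for this sketch, and `min` with a composite key stands in for step 3.

```python
# Hypothetical sample input, invented for illustration only.
store_reviews = {
    "north": ["great service", "great service", "slow checkout"],
    "south": ["slow checkout", "great service"],
    "east": ["friendly staff"],
}

# Step 1: one set per store, so repeats inside a store collapse to one entry.
distinct_per_store = {
    store: set(reviews) for store, reviews in store_reviews.items()
}

# Step 2: count how many stores' sets contain each review.
store_counts = {}
for reviews in distinct_per_store.values():
    for review in reviews:
        store_counts[review] = store_counts.get(review, 0) + 1

# Step 3: highest store count wins; ties go to the smaller string.
winner = min(store_counts, key=lambda review: (-store_counts[review], review))
print(winner)  # great service
```

Here 'great service' and 'slow checkout' both appear in two stores, so the tie-break decides the result. Python compares strings by Unicode code point, so "alphabetical" is case-sensitive: an uppercase 'Z' sorts before a lowercase 'a'.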

---

### The solution

**Per-store dedup then cross-store counting**

```python
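def most_frequent_review(store_reviews: dict[str, list[str]]) -> str:
    # Maps each review to the number of distinct stores it appears in.
    review_store_count = {}
    for reviews in store_reviews.values():
        # set() collapses repeats so each store contributes at most 1 per review.
        for review in set(reviews):
            review_store_count[review] = review_store_count.get(review, 0) + 1
    best_review = None
    best_count = 0
    for review, count in review_store_count.items():
        # Every count is >= 1, so the first review always wins on count and
        # the string comparison never runs while best_review is still None.
        if count > best_count or (count == best_count and review < best_review):
            best_count = count
            best_review = review
    # Note: an empty input dict leaves best_review as None.
    return best_review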
```

> **Time and Space Complexity**
>
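> **Time:** O(s * r) where s is the number of stores and r is the average number of reviews per store, treating each string hash and dict update as O(1) on average.
>
> **Space:** O(u) for the count dict, where u is the total number of unique reviews across all stores, plus a temporary set no larger than the biggest store's review list.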

> **Interviewers Watch For**
>
> Do you deduplicate within each store before counting? Without the `set()` conversion, a store that submits 'great service' 100 times would inflate that review's count.

> **Common Pitfall**
>
> Counting total occurrences instead of distinct store occurrences. The problem asks which review appears in the most stores, not which review was submitted the most times.
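
A minimal sketch of how the two counting approaches diverge, using made-up data in which one store repeats the same review 100 times:

```python
from collections import Counter

# Hypothetical data: one store submits the same review many times.
store_reviews = {
    "downtown": ["great service"] * 100,
    "airport": ["long wait"],
    "harbor": ["long wait"],
}

# Pitfall: counts every submission, regardless of store.
total_counts = Counter(
    review for reviews in store_reviews.values() for review in reviews
)

# Intended: counts each review at most once per store.
store_counts = Counter(
    review for reviews in store_reviews.values() for review in set(reviews)
)

wrong_answer = max(total_counts, key=total_counts.get)
right_answer = min(store_counts, key=lambda r: (-store_counts[r], r))

print(wrong_answer)  # great service
print(right_answer)  # long wait
```

'great service' has 100 submissions but comes from a single store, while 'long wait' appears in two stores, so only the per-store count gives the answer the problem asks for.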

---

## Common follow-up questions

- What if you needed the top 3 most widespread reviews? _(Tests extracting multiple items from the count dict using a heap or sorted extraction.)_
- How would you handle ties if the rule changed? _(Tests defining a tie-breaking rule: alphabetical, first encountered, or return all.)_
- What if reviews were fuzzy (e.g., 'great' vs 'Great!' vs 'GREAT')? _(Tests text normalization before deduplication.)_
- How does this pattern relate to TF-IDF? _(Tests awareness that 'document frequency' is the same concept applied to text search.)_
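
One possible shape for the top-3 and fuzzy-matching follow-ups, as a sketch only. The names `normalize` and `top_k_reviews` are hypothetical, and the normalization rule (lowercase, trim whitespace, strip leading and trailing punctuation) is an assumption; a real system would need to agree on what counts as "the same review".

```python
import heapq
import string
from collections import Counter


def normalize(review: str) -> str:
    # Assumed rule: trim whitespace, lowercase, strip punctuation at both ends.
    return review.strip().lower().strip(string.punctuation)


def top_k_reviews(
    store_reviews: dict[str, list[str]], k: int = 3
) -> list[tuple[str, int]]:
    store_counts = Counter()
    for reviews in store_reviews.values():
        # Normalize before deduplicating so variants collapse within a store.
        store_counts.update({normalize(review) for review in reviews})
    # Smallest (-count, review) pairs: highest counts first, ties alphabetical.
    return heapq.nsmallest(
        k, store_counts.items(), key=lambda item: (-item[1], item[0])
    )


sample = {
    "a": ["great", "Great!"],
    "b": ["GREAT", "meh"],
    "c": ["meh."],
    "d": ["ok"],
}
print(top_k_reviews(sample, k=2))  # [('great', 2), ('meh', 2)]
```

The per-review store count built here plays the role of document frequency in TF-IDF, with each store acting as one document.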

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_repeat_review)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.