# Market Share

> Every category wants a bigger slice.

Canonical URL: <https://datadriven.io/problems/market_share>

Domain: SQL · Difficulty: hard · Seniority: L5

## Problem

Strategy needs each product category's share of total revenue for the investor deck. For each category, compute its percentage of total transaction revenue against the platform-wide total. Return the category and its revenue share.

## Worked solution and explanation

### Why this problem exists in real interviews

Joining products to transactions and then applying grouping and summation reveals whether you can manage grain changes across tables. Interviewers pick this in senior-level rounds because a wrong join order or missed GROUP BY silently inflates the result.

> **Trick to Solving**
>
> Share/ratio calculations divide a group total by a grand total. Use a window function for the denominator.
> 
> 1. Aggregate to the group level
> 2. Use `SUM(metric) OVER ()` to get the grand total in each row
> 3. Divide group total by grand total and multiply by 100

---

### Break down the requirements

#### Step 1: Join tables with INNER JOIN

Connect `products` and `transactions` on `product_id` to keep only matching rows.

#### Step 2: Apply the window function

The window function computes a value across a set of rows related to the current row. The `PARTITION BY` and `ORDER BY` clauses define the scope and sequence.

#### Step 3: Aggregate with SUM

Group by the output grain and apply `SUM()` to compute the metric. The `GROUP BY` must match exactly what the output needs: one row per group key.

#### Step 4: Order the final output

Apply `ORDER BY` as specified to produce the expected row sequence. When tied values exist, add a secondary sort column for determinism.

---

### The solution

**Window SUM for grand-total denominator in share calculation**

```sql
SELECT p.category,
    ROUND(SUM(CAST(t.total_amount AS DOUBLE)) * 100.0 /
        SUM(SUM(CAST(t.total_amount AS DOUBLE))) OVER (), 2) AS revenue_share_pct
FROM transactions t
JOIN products p ON t.product_id = p.product_id
GROUP BY p.category
ORDER BY revenue_share_pct DESC
```

> **Cost Analysis**
>
> The join touches `products` (40K rows) and `transactions` (200M rows). `transactions` is partitioned by `transaction_date`, which the optimizer can exploit with a partition filter.

> **Interviewers Watch For**
>
> Interviewers expect you to articulate why you chose a specific join type and what happens to unmatched rows. Naming the output grain ("one row per X") before writing the GROUP BY shows you think about data shape, not just syntax.

> **Common Pitfall**
>
> Forgetting that a JOIN can multiply rows when the relationship is one-to-many. Always check whether the join key is unique on at least one side.

---

## Common follow-up questions

- What happens to your result if products.rating contains NULLs for some rows? _(Tests whether the candidate accounts for NULL behavior in aggregates and comparisons on rating.)_
- If the join between products and transactions produces a fan-out, how does that affect your aggregate? _(Tests awareness of join cardinality and its impact on SUM, COUNT, and AVG results.)_
- With millions of distinct values in transactions.transaction_id, what index strategy would you use to keep this query performant? _(Tests indexing knowledge specific to high-cardinality columns like transaction_id.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/market_share)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.