# The Sales Architecture

> Numbers are easy. Making them queryable at scale is the real job.

Canonical URL: <https://datadriven.io/problems/sales_analytics_star_schema>

Domain: Data Modeling · Difficulty: medium · Seniority: L4

## Problem

Our BI team needs a dimensional model to analyze sales performance. We sell physical products through an online marketplace. Analysts need to slice revenue by product, seller, customer, geography, and time. The current OLTP database is normalized and queries are too slow for dashboards. Design the star schema.

## Worked solution and explanation

### Why this problem exists in real interviews

This probes whether a candidate can execute a textbook star schema without cutting corners: one atomic fact, five conformed dimensions, surrogate keys, additive measures. The interviewer is testing discipline, not creativity. The signal is how quickly the candidate locks grain and resists ad-hoc attributes on the fact.

> **Trick to Solving**
>
> The tell is the phrase "sales analytics" without further specialization. Before drawing any tables, a strong candidate asks: who are the stakeholders and what slices do they care about? If the answer covers product, customer, seller, time, and place, the five-dimension star is the canonical shape.
> 
> 1. Lock grain at the sales line
> 2. Pick conformed dimensions for each analysis slice
> 3. Surrogate keys on every dimension
> 4. Additive measures only on the fact

---

### Break down the requirements

#### Step 1: Declare grain

`fact_sales` is one row per product per transaction. Atomic grain is what makes the star answer roll-ups at any level.

#### Step 2: Conformed dimensions per slice

Each stakeholder asks for a different cut: product team wants category, sales team wants region, finance wants quarter. Five dimensions cover the common vocabulary.

#### Step 3: Surrogate keys on dimensions

Each dimension has a BIGINT surrogate. Source system keys change; surrogates do not. This is the main defense against upstream churn.

#### Step 4: Keep measures fully additive

`quantity` and `net_amount` sum cleanly across any dimension combination. Derived metrics (discount %, ASP) are computed at query time.

---

### The solution

Below is one conceptually sound star. Five conformed dimensions and one atomic fact is the canonical retail analytics shape; the discipline is in stopping there.

> **Why This Design Works**
>
> The canonical star exchanges a small amount of query-time join work for maximum evolvability and conformity across the warehouse. Because dimensions are shared, every new fact table (returns, returns, inventory) can reuse them and cross-fact reporting comes for free.

> **Interviewers Watch For**
>
> Strong candidates name the fact type (transaction fact) and explicitly call out additivity of each measure. They also note that `dim_geography` belongs on the fact, not embedded in the customer, because sales location and billing location can differ.

> **Common Pitfall**
>
> Snowflaking prematurely: putting `category` on a separate `dim_category` table for no benefit. Dimensions should be flat unless a hierarchy actually changes independently of the product.

---

### The analysis pattern

**Seller performance across product categories**

```sql
SELECT
    s.region,
    s.seller_id,
    p.category,
    SUM(f.net_amount) AS revenue,
    SUM(f.quantity) AS units,
    COUNT(DISTINCT f.customer_sk) AS unique_customers
FROM fact_sales f
JOIN dim_seller s ON s.seller_sk = f.seller_sk
JOIN dim_product p ON p.product_sk = f.product_sk
JOIN dim_date d ON d.date_sk = f.date_sk
WHERE d.fiscal_quarter = '2025-Q1'
GROUP BY s.region, s.seller_id, p.category
ORDER BY revenue DESC
```

---

### Trade-offs and alternatives

**Flat five-dimension star**

Conformity, evolvability, fast BI on columnar warehouses. Cost: joins at query time and dimension-maintenance pipelines.

**Snowflake with nested hierarchies**

Smaller dimension storage, independent hierarchy updates. Cost: more joins per query, more ETL surface area, and most BI tools prefer flat dimensions.

---

## Common follow-up questions

- A seller is reassigned to a new region mid-year. Do you Type 1 or Type 2 dim_seller? _(Tests SCD judgment and whether historical commissions should reflect old or new regions.)_
- How would you add a returns fact without breaking existing sales dashboards? _(Tests conformed dimension reuse and whether the candidate uses negative quantities or a separate fact.)_
- Analysts want same-day conversion rate by geography. What is missing from the fact? _(Tests whether the candidate recognizes the need for a pageviews or sessions fact rather than loading conversion into dim_customer.)_
- The business moves to a columnar warehouse. Does the star still make sense? _(Tests understanding of when to consider OBT versus preserving Kimball discipline.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/sales_analytics_star_schema)
- [Data Modeling Interview Questions](https://datadriven.io/data-modeling-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.