# The Carrier Moving to Azure

> Claims arrive messy. The medallion cleans them up.

Canonical URL: <https://datadriven.io/problems/the_carrier_moving_to_azure>

Domain: Pipeline Design · Difficulty: medium · Seniority: L5

## Problem

We are an insurance carrier migrating our claims and policy data platform to Azure Databricks. We have three source systems feeding claims, policy, and premium data into our warehouse, and we need a governed medallion architecture with proper access controls for actuarial, finance, and regulatory teams. Design the platform architecture including how you would configure the storage layers and enforce access policies.

## Worked solution and explanation

### Why this problem exists in real interviews

Insurance is the worst-case test for a multi-consumer governed warehouse: each team sees a different slice, freshness ranges from minutes (claims dashboard) to days (actuarial), and statutory filings have to stay unchanged after they go out. The trap is over-rotating on one of these and breaking the others. Pure streaming pays too much for actuarial; pure batch misses claims; a shared gold table without per-team access fails the access requirement.

The default shape is one nightly load into a single gold table that every team queries with their own filters. Actuarial's loss-ratio dashboards work; finance reconciles late because the load is sized for actuarial; the claims dashboard is hours stale; an adjuster's query that forgets the queue filter returns another adjuster's claims. A statutory filing goes out, a correction lands the next week and the underlying gold table is updated, and the regulator's next inquiry sees a different number than what was filed.

> **Trick to Solving**
>
> Per-team gold tables with row and column policies, multiple cadences off one source, statutory filings written to immutable storage with amendments alongside.
> 
> 1. A streaming path serves the claims dashboard and high-severity alerts in minutes; a batch path lands the warehouse-grade gold for actuarial and finance on a slower cadence.
> 2. Per-team gold layers (or per-team views over a shared silver) carry row and column policies enforced at the warehouse: actuarial reads loss ratios without claimant identifiers, finance reads premium aggregates without claim detail.
> 3. Statutory filings write to an immutable archive; corrections are amendments linked to the original record, not overwrites of the live warehouse.

---

### Walk the requirements

#### Step 1: Per-team row and column policies enforced at the warehouse

Each team's view exposes only the columns and rows they're allowed to see. Actuarial reads loss ratios but the claimant identifier column is masked; finance reads premium aggregates without claim detail; regulators read statutory fields; adjusters see only their own queue, enforced by a row-level policy on the user. The platform owns the policy regardless of how the query is written. Without a governed warehouse / lakehouse there's nowhere for the policies to live.

#### Step 2: Three freshness targets off one source

The claims dashboard reads from a streaming-fed store within minutes; high-severity claims trigger an alert through the same streaming path; finance and actuarial read warehouse-grade gold tables on a daily cadence. One source, two paths, three consumer cadences. Forcing actuarial onto the streaming tier wastes compute; forcing the claims dashboard onto the daily batch misses its named freshness target.

#### Step 3: Statutory filings immutable; corrections are amendments

Each filing writes to an immutable archive at filing time; once written, no overwrite. A post-filing correction is a new amendment record alongside the original, both queryable, with the link between them recorded. The regulator's next inquiry sees what was filed and what was amended, separately. A 'we'll update the warehouse and refile' approach is what makes regulators escalate; the live warehouse can change, the archive cannot.

---

### The shape that fits

> **What this design gives up**
>
> Per-team gold layers (or row/column policies over a shared silver) is more configuration than one big table. Two paths off one source means two pieces of operational machinery instead of one. An immutable filing archive doubles storage for filed periods. Warehouse simplicity is the cost; in return, an access boundary that survives an audit, freshness that fits each team, and filings that don't quietly change.

> **What reviewers check**
>
> A reviewer looks at the canvas for these properties:
> - A streaming path serves claims dashboards and high-severity alerts within minutes; a batch path builds the gold warehouse for actuarial and finance.
> - Per-team row and column policies are enforced at the warehouse regardless of how the query is written.
> - Filed statutory snapshots write to immutable storage; corrections are amendments alongside the original.

> **The mistake that ships**
>
> The team's first cut uses a single nightly load into one shared gold table with per-team filters in the BI tool. The claims dashboard is stale, finance is late, and an adjuster's query without the queue filter returns another adjuster's claims. A statutory filing goes out, a correction lands the next week and the warehouse is updated; the regulator's next inquiry sees a different number than was filed. The team rebuilds with two paths, per-team policies, and an immutable filing archive. The regulator's amended-filing question is the one that costs the most: it's not a code change, it's an audit response.

---

## Common follow-up questions

- An adjuster moves to a supervisor role and needs to see her team's queue, not just her own. What changes in the design, and where? _(Tests whether the candidate sees row-level security as a property of the role: changing the role flips the row filter to 'team queue' instead of 'own queue.' No query rewrites, no exports to redistribute.)_
- A high-severity claim arrives, the streaming path is degraded, and the alert is delayed. What does this design surface, and what does it not? _(Tests whether the candidate has a degraded-mode plan: lag alerts on the streaming path fire before customer-facing impact; the dashboard reads from the streaming store and shows staleness; the high-severity alert path either keeps trying or routes to a backup notification channel until the streaming path recovers.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_carrier_moving_to_azure)
- [System Design Interview Questions](https://datadriven.io/data-engineering-system-design)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.