# Telecom Network Connectivity Warehouse

> One device goes down. The ripple keeps going.

Canonical URL: <https://datadriven.io/problems/telecom_network_connectivity_warehouse>

Domain: Data Modeling · Difficulty: hard · Seniority: L6

## Problem

We're a telecom provider building a new data warehouse. We need to model our network infrastructure: towers, switches, fiber links, and the connections between them. We also need to track outages and capacity utilization. Can you design this?

## Worked solution and explanation

### Why this problem exists in real interviews

Telecom modeling probes whether a candidate can handle a self-referential topology (elements connect to elements) alongside a high-cardinality time-series and a lower-cardinality incident log. The signal is whether the raw utilization samples and the rolled-up daily fact coexist deliberately, with a clear justification for the pre-aggregation.

> **Trick to Solving**
>
> The tell is "network connectivity". Any network is a graph, and any graph demands a self-referential junction, not two copies of the node table. Before drawing any tables, a strong candidate asks: what is the finest utilization granularity, and what is the query frequency at that grain? Raw samples at 5-minute resolution plus a daily roll-up is the standard answer.
> 
> 1. Model network_elements as nodes with a self-reference via connections
> 2. Keep utilization_raw at sample grain and utilization_daily pre-aggregated
> 3. Use outage_affected_elements as a junction for the many-to-many
> 4. Anchor everything to conformed regions

---

### Break down the requirements

#### Step 1: Nodes and edges

`network_elements` is the node table. `connections` is the edge table with `source_element_id` and `target_element_id`. This is the canonical graph-in-SQL shape.

#### Step 2: Time-series at two grains

`utilization_raw` holds per-sample readings for forensic investigation. `utilization_daily` is a pre-aggregated snapshot for dashboards and capacity planning. Both are rebuildable from the raw table.

#### Step 3: Outages as incidents, not mutations

An outage is its own entity with a lifecycle (`started_at`, `ended_at`). `outage_affected_elements` is the junction that captures which elements were affected and by how much.

#### Step 4: Regions as the conforming dimension

`regions` is shared by elements and outages. This is what lets "minutes of downtime per region per month" be a one-query answer.

#### Step 5: Indexing strategy matters here

`utilization_raw` needs a composite index on (`element_id`, `sampled_at`) for time-range scans. `outage_affected_elements` needs indexes in both directions for "outages per element" and "elements per outage".

---

### The solution

Below is one defensible model. The split between raw and daily utilization is the anchor; it is the pre-aggregation decision that makes this model scale.

> **Why This Design Works**
>
> Graph topology plus dual-grain telemetry plus incident-as-entity gives operators three independent query patterns from one coherent model: topology traversal for blast radius, daily trends for capacity planning, and raw samples for forensic debugging. The cost is storage and ETL complexity, both justified by the operational stakes.

> **Interviewers Watch For**
>
> Strong candidates explicitly treat connections as a graph edge table and call out the need for directionality in link_type. They also justify the daily roll-up with a storage-versus-query-frequency argument. Weaker candidates put utilization columns on network_elements and lose the ability to look back.

> **Common Pitfall**
>
> Mutating element rows to reflect current utilization. This destroys history and makes any SLA report impossible to reconstruct after an incident.

---

### The analysis pattern

**Region downtime minutes with element impact**

```sql
SELECT
    r.region_name,
    DATE_TRUNC('month', o.started_at) AS month,
    SUM(EXTRACT(EPOCH FROM (o.ended_at - o.started_at)) / 60 * oae.impact_pct / 100) AS weighted_downtime_minutes,
    COUNT(DISTINCT o.outage_id) AS incidents,
    AVG(ud.peak_utilization_pct) AS avg_peak_utilization
FROM outages o
JOIN outage_affected_elements oae ON oae.outage_id = o.outage_id
JOIN network_elements e ON e.element_id = oae.element_id
JOIN regions r ON r.region_id = e.region_id
LEFT JOIN utilization_daily ud
    ON ud.element_id = e.element_id
   AND ud.usage_date = o.started_at::date
WHERE o.started_at >= '2025-01-01'
GROUP BY r.region_name, month
ORDER BY weighted_downtime_minutes DESC
```

---

### Trade-offs and alternatives

**Graph topology with dual-grain telemetry**

Three clean query patterns, idempotent daily rebuild, forensic raw data available. Cost: storage footprint for raw samples, ETL pipelines for the daily roll-up, and careful indexing on hot tables.

**Graph database plus time-series store**

Native topology traversal in Neo4j, optimized time-series ingestion in InfluxDB or Timescale. Cost: two storage systems to operate, cross-system joins for incident analysis, and two query languages.

---

## Common follow-up questions

- Elements form a directed graph with redundant paths. How do you compute blast radius for an outage? _(Tests recursive CTEs over connections and whether the candidate proposes a separate topology snapshot.)_
- Samples arrive out of order from 100 regional collectors. How does utilization_raw handle that? _(Tests event time versus ingestion time and idempotent upsert by (element_id, sampled_at).)_
- Daily utilization for the last 24 months must be retained, but raw samples only for 90 days. How is retention enforced? _(Tests partitioning by sampled_at with a cold-storage drop policy and daily rebuild before raw data ages out.)_
- A new element type has a 10x sample rate. How does the schema absorb the cardinality change? _(Tests whether the candidate proposes partitioning by element_type or a separate fact for the new grain.)_
- A cross-region failover must be simulated. What does the model need that it does not have? _(Tests whether a time-dimensioned connections table (effective-dated edges) is warranted.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/telecom_network_connectivity_warehouse)
- [Data Modeling Interview Questions](https://datadriven.io/data-modeling-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.