Referential Integrity in DW

Concepts covered: Data Quality

Operational databases enforce referential integrity through foreign key constraints. A row in the orders table cannot reference a customer_id that does not exist in the customers table, because the database refuses to write it. Analytical pipelines do not get this protection for free. Warehouses like Snowflake and BigQuery either do not enforce foreign keys at all or treat them as informational hints, so the pipeline becomes responsible for the integrity that the operational database used to enforce automatically. The cost of skipping this responsibility is orphan keys: rows in a fact table that reference dimension keys with no matching row in the dimension table. The cost is not always visible. INNER JOINs silently drop orphan rows, so downstream metrics are quietly understated. LEFT JOINs preserve the row
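
The effect of an orphan key can be sketched with a small, self-contained example. This is a minimal illustration, not a production check: it uses Python's built-in sqlite3 module as a stand-in for a warehouse that does not enforce foreign keys, and the table names (dim_customers, fact_orders), columns, and values are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for a warehouse. No foreign key is
# declared, so nothing stops an order from pointing at a missing customer.
# All table names and values below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        amount INTEGER
    );
    INSERT INTO dim_customers VALUES (1, 'Ada'), (2, 'Grace');
    -- Order 102 references customer_id 99, which has no dimension row.
    INSERT INTO fact_orders VALUES (100, 1, 50), (101, 2, 30), (102, 99, 20);
    """
)

# Revenue computed from the fact table alone.
true_total = conn.execute("SELECT SUM(amount) FROM fact_orders").fetchone()[0]

# Revenue computed through an INNER JOIN: the orphan row drops out silently.
inner_total = conn.execute(
    """
    SELECT SUM(o.amount)
    FROM fact_orders AS o
    INNER JOIN dim_customers AS c ON c.customer_id = o.customer_id
    """
).fetchone()[0]

# Orphan check: a LEFT JOIN keeps every fact row, and rows with no
# dimension match come back with NULL on the dimension side.
orphans = conn.execute(
    """
    SELECT o.order_id, o.customer_id
    FROM fact_orders AS o
    LEFT JOIN dim_customers AS c ON c.customer_id = o.customer_id
    WHERE c.customer_id IS NULL
    ORDER BY o.order_id
    """
).fetchall()

print("total from fact table:", true_total)   # 100
print("total after INNER JOIN:", inner_total)  # 80
print("orphan rows:", orphans)                 # [(102, 99)]
```

The gap between the two totals is the understatement described above. A pipeline could run a query like the orphan check as a test after each load and fail or alert when it returns any rows; the right threshold and response are design choices that depend on the pipeline.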

About This Interactive Section

This section is part of the Data Quality and Contracts: Intermediate lesson on DataDriven, a free data engineering interview prep platform.