DataDriven

© 2026 DataDriven


Schema Evolution

Schemas change constantly; pipelines that break on a new column are not production-ready

Category: Pipeline Architecture
Difficulty: advanced
Duration: 25 minutes
Challenges: 0 hands-on challenges

Topics covered: "What Happens When the Source Adds a Column?", Forward and Backward Compatibility, Schema Registries and Contracts, Handling Breaking Changes, Schema Evolution in Table Formats

Lesson Sections

  1. "What Happens When the Source Adds a Column?"

    What They're Really Testing
    The Unlock
    Schema evolution is not about handling every possible change. It is about defining which changes are safe (non-breaking) and which require coordination (breaking). A new nullable column is safe. A type change from string to integer is breaking. The pipeline should handle the first automatically and alert on the second.
    The 60-Second Framework
    Why Companies Care
    At Spotify, a Protobuf field type change from int32 to int64 in the listening events schema broke
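The safe-versus-breaking split described above can be sketched as a small schema diff. This is a minimal illustration, not the lesson's own implementation: the `Column` type and `classify_changes` function are hypothetical names, and treating an added required column or a removed column as breaking is an assumption beyond the two cases the text names (nullable add is safe, type change is breaking).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical column description used only for this sketch."""
    name: str
    type: str
    nullable: bool = True


def classify_changes(old: list[Column], new: list[Column]) -> dict[str, list[str]]:
    """Split the differences between two schemas into safe and breaking."""
    old_by_name = {c.name: c for c in old}
    new_by_name = {c.name: c for c in new}
    safe: list[str] = []
    breaking: list[str] = []

    for name, col in new_by_name.items():
        if name not in old_by_name:
            if col.nullable:
                # Named as safe in the text: a new nullable column.
                safe.append(f"added nullable column {name}")
            else:
                # Assumption: a new required column needs coordination.
                breaking.append(f"added required column {name}")
        elif col.type != old_by_name[name].type:
            # Named as breaking in the text: a type change.
            breaking.append(
                f"type change on {name}: {old_by_name[name].type} -> {col.type}"
            )

    for name in old_by_name:
        if name not in new_by_name:
            # Assumption: removing a column is treated as breaking.
            breaking.append(f"removed column {name}")

    return {"safe": safe, "breaking": breaking}
```

A pipeline built on this idea would apply the `safe` list automatically and raise an alert whenever `breaking` is non-empty.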

  2. Forward and Backward Compatibility

    Compatibility defines which schema changes are safe. The interview tests whether you know the four compatibility modes and can apply them to a specific scenario.
    The Four Compatibility Modes
    Safe vs Breaking Changes
    Format Comparison
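The teaser does not list the four modes, so the sketch below assumes they are the ones commonly named BACKWARD, FORWARD, FULL, and NONE (the names used by Confluent Schema Registry). The `Field` type and both functions are hypothetical, and the reader/writer rule is a simplified version of Avro-style resolution: it ignores type promotions and checks only one schema version back.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical field description: a type name and whether a default exists."""
    type: str
    has_default: bool = False


def can_read(reader: dict[str, Field], writer: dict[str, Field]) -> bool:
    """True if code using `reader` can decode data written with `writer`."""
    for name, reader_field in reader.items():
        writer_field = writer.get(name)
        if writer_field is None:
            # The data lacks this field, so the reader needs a default.
            if not reader_field.has_default:
                return False
        elif writer_field.type != reader_field.type:
            # Simplification: any type difference counts as incompatible.
            return False
    return True


def is_compatible(mode: str, old: dict[str, Field], new: dict[str, Field]) -> bool:
    """Check a proposed schema against the previous one under a given mode."""
    if mode == "NONE":
        return True
    backward = can_read(reader=new, writer=old)  # new code reads old data
    forward = can_read(reader=old, writer=new)   # old code reads new data
    if mode == "BACKWARD":
        return backward
    if mode == "FORWARD":
        return forward
    if mode == "FULL":
        return backward and forward
    raise ValueError(f"unknown compatibility mode: {mode}")
```

Under these rules, adding a field with a default passes every mode, while adding a field without a default passes FORWARD but fails BACKWARD, because new readers cannot fill the missing value in old data.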

  3. Schema Registries and Contracts

    A schema registry is the enforcement layer. Without it, compatibility rules are documented but unenforced. The producer publishes a new schema, the registry checks compatibility, and rejects the change if it would break consumers. This catches breaking changes at publish time, not at 3 AM when the pipeline crashes.
    How It Works
    Schema Contracts: The Organizational Layer
    A schema registry enforces technical compatibility. A schema contract enforces organizational agreements. 'Team A guarantees th
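The publish-check-reject flow can be illustrated with a toy in-memory registry. This is a sketch under stated assumptions, not the API of any real registry client: `InMemorySchemaRegistry`, `IncompatibleSchemaError`, and the dict-based schema shape are all hypothetical, and the only rule enforced is a simplified backward check against the latest version.

```python
from __future__ import annotations


class IncompatibleSchemaError(Exception):
    """Raised when a proposed schema would break existing consumers."""


def _backward_problems(old: dict[str, dict], new: dict[str, dict]) -> list[str]:
    """List reasons the new schema cannot read data written with the old one."""
    problems = []
    for name, spec in new.items():
        if name not in old:
            if "default" not in spec:
                problems.append(f"new field {name!r} has no default")
        elif old[name]["type"] != spec["type"]:
            problems.append(
                f"field {name!r} changed type {old[name]['type']} -> {spec['type']}"
            )
    return problems


class InMemorySchemaRegistry:
    """Toy registry: stores schema versions per subject and gates new ones."""

    def __init__(self) -> None:
        self._versions: dict[str, list[dict[str, dict]]] = {}

    def register(self, subject: str, schema: dict[str, dict]) -> int:
        """Store the schema and return its version, or reject it at publish time."""
        versions = self._versions.setdefault(subject, [])
        if versions:
            problems = _backward_problems(versions[-1], schema)
            if problems:
                raise IncompatibleSchemaError("; ".join(problems))
        versions.append(schema)
        return len(versions)

    def latest_version(self, subject: str) -> int:
        return len(self._versions.get(subject, []))
```

The point of the design is where the failure happens: the producer's `register` call raises before any incompatible data is written, so consumers never see it.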

  4. Handling Breaking Changes

    Non-breaking changes are handled automatically. Breaking changes require a migration strategy. The interview tests whether you have one.
    The Expand-Contract Pattern
    This is the zero-downtime migration pattern. At no point are producers and consumers on incompatible schemas. The cost is a temporary period of schema duplication where both old and new fields coexist.
    Other Strategies for Breaking Changes
    Red Flag Phrases
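A minimal sketch of expand-contract, assuming the breaking change is a field rename. The field names `user` and `user_id`, the phase labels, and all three functions are hypothetical and exist only to show why both fields coexist during the middle phase.

```python
def produce(user_value: str, phase: str) -> dict:
    """Emit a record in the shape the producer uses during each phase."""
    if phase == "before":
        return {"user": user_value}
    if phase == "expand":
        # Duplication period: old and new fields are both written.
        return {"user": user_value, "user_id": user_value}
    if phase == "contract":
        # Only safe once every consumer reads the new field.
        return {"user_id": user_value}
    raise ValueError(f"unknown phase: {phase}")


def old_consumer(record: dict) -> str:
    """Consumer that has not been migrated yet."""
    return record["user"]


def new_consumer(record: dict) -> str:
    """Migrated consumer: prefers the new field, falls back to the old one."""
    return record["user_id"] if "user_id" in record else record["user"]
```

Unmigrated consumers keep working through the expand phase, and migrated consumers work in every phase, so the contract step can wait until no `old_consumer` remains.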

  5. Schema Evolution in Table Formats

    Modern table formats (Iceberg, Delta Lake, Hudi) handle schema evolution natively, eliminating many of the problems that older Hive/Parquet-based pipelines suffer from. Knowing how each format handles evolution is a differentiator in interviews at companies using lakehouse architectures.
    Table Format Comparison
    Iceberg uses column IDs (integers) instead of column names for field matching. This means renaming a column in Iceberg is a metadata-only operation that does not require rewriting any dat
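The ID-based matching idea can be modeled in a few lines of plain Python. This is a conceptual sketch only and does not use the Iceberg API: the `Table` class is hypothetical, data files are modeled as lists of rows keyed by integer field ID, and the table metadata is a single ID-to-name mapping.

```python
from __future__ import annotations


class Table:
    """Toy table whose data files store values by field ID, not by name."""

    def __init__(self, columns: dict[int, str]) -> None:
        self.columns = dict(columns)  # metadata: field ID -> current name
        self.data_files: list[list[dict[int, object]]] = []

    def append_file(self, rows: list[dict[str, object]]) -> None:
        """Write a new data file, translating names to IDs at write time."""
        name_to_id = {name: fid for fid, name in self.columns.items()}
        self.data_files.append(
            [{name_to_id[k]: v for k, v in row.items()} for row in rows]
        )

    def rename_column(self, old: str, new: str) -> None:
        """Metadata-only rename: data_files are never touched."""
        for fid, name in self.columns.items():
            if name == old:
                self.columns[fid] = new
                return
        raise KeyError(old)

    def scan(self) -> list[dict[str, object]]:
        """Read all rows, resolving IDs to the current column names."""
        return [
            {self.columns[fid]: value for fid, value in row.items()}
            for data_file in self.data_files
            for row in data_file
        ]
```

Because readers resolve names through the metadata at scan time, files written before a rename are returned under the new name without being rewritten.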
