Data Model: Types and Why It Matters
The word 'model' means three different things: a conceptual sketch for alignment, a logical blueprint for design, and a physical implementation for the database. Understanding which level you are working at, and when to move between them, is what separates engineers who design schemas from engineers who just write tables.
What Is a Data Model?
If you are new to data modeling, start here. The word 'model' means three different things depending on who is talking. A conceptual model is a sketch for a product manager. A logical model is the entity contract your engineering team argues over. A physical model is what actually runs on the database. You will need all three eventually.
Three Types of Data Models
Conceptual Data Model
A conceptual data model defines the high-level entities and relationships in a system without any implementation detail. It answers 'what entities exist and how do they relate?' An e-commerce system has Customers, Orders, Products, and Payments. A Customer places an Order. An Order contains Products. A Payment settles an Order. No data types, no primary keys, no table names. Just entities and relationships. Audience: Business stakeholders, product managers, and architects use conceptual models to align on scope before anyone writes DDL. Interview note: Interviewers sometimes hand you a business scenario and ask you to 'draw the data model.' Start here. Sketch entities as boxes and relationships as lines. This shows structured thinking before diving into table definitions.
Logical Data Model
A logical data model adds structure to the conceptual model. It defines attributes (columns), primary keys, foreign keys, and data types at an abstract level. It is database-agnostic: no indexes, no partitions, no engine-specific syntax. The logical model for an Order might include order_id (PK), customer_id (FK), order_date, and total_amount. It specifies that order_id is an integer and order_date is a date, but it does not specify whether the table is partitioned or what index strategy to use. Audience: Data architects and senior engineers use logical models to communicate schema design without committing to a specific database engine. Interview note: When an interviewer asks you to 'design a schema,' a logical data model is usually the expected deliverable. Define entities, attributes, keys, and relationships. Mention data types but skip physical details unless asked.
Physical Data Model
A physical data model is the actual implementation in a specific database engine. It includes CREATE TABLE statements, index definitions, partitioning strategies, storage formats, and engine-specific features. The physical model for an Order in Snowflake looks different from the same Order in PostgreSQL. Snowflake uses clustering keys instead of indexes. PostgreSQL uses B-tree indexes and table partitioning. The logical model is the same; the physical model adapts to the engine. Audience: Data engineers build physical models. You write the DDL, choose the partitioning strategy, define the indexes, and optimize for the query workload. Interview note: Physical model questions come up in senior interviews. The interviewer wants to hear about partitioning (fact tables by date), clustering (Snowflake cluster keys on high-cardinality columns), and storage formats (Parquet vs Delta vs Iceberg). Show that you think about how the model performs, not just what it contains.
Why Data Models Matter
Communication
A data model is a shared language between engineers, analysts, and business stakeholders. When everyone agrees that a 'customer' entity has these attributes and relates to 'orders' this way, there is less ambiguity in requirements, reports, and pipeline logic. Without a shared model, the marketing team's definition of 'customer' might differ from the finance team's, and your pipeline produces numbers that nobody trusts.
Schema Design
Data models are the blueprint for your database schemas. Whether you are building a normalized OLTP schema, a star schema for analytics, or a wide table for a feature store, the data model defines the structure. Starting from a model prevents ad-hoc table creation that leads to duplicated data, inconsistent naming, and queries that require 15 joins because nobody planned the relationships.
Pipeline Contracts
Your data model defines the contract between upstream sources and downstream consumers. When an analyst builds a dashboard on top of fact_orders, they depend on the model having specific columns, data types, and grain. When the source system changes, the pipeline needs to map the change back to the model. Without an explicit model, every pipeline change risks breaking downstream consumers in ways nobody anticipated.
Interview Performance
Data modeling questions appear in roughly 40% of data engineering interview loops. The interviewer hands you a business scenario and expects a data model. Candidates who start with entities and relationships, then move to attributes and keys, then discuss physical trade-offs demonstrate the structured thinking that hiring teams look for. Jumping straight to column names without a model signals inexperience.
Common Data Model Patterns
Star Schema
A central fact table surrounded by denormalized dimension tables. The most common pattern for analytics warehouses. Optimized for read-heavy workloads with few joins.
Snowflake Schema
Similar to a star schema but with normalized dimension tables. Dimensions are split into sub-tables linked by foreign keys. Uses less storage but requires more joins.
Data Vault
A modeling methodology for enterprise data warehouses. Uses Hubs (business keys), Links (relationships), and Satellites (descriptive attributes). Designed for auditability and parallel loading.
Wide / Flat Table
A single denormalized table with many columns. Common in data lakes and feature stores where query simplicity and scan performance matter more than storage efficiency. Often materialized from a star schema for specific use cases.
Normalized (3NF)
Tables decomposed to eliminate redundancy. Each non-key attribute depends on the key, the whole key, and nothing but the key. The standard for OLTP systems. Data engineers extract from 3NF sources and transform into star schemas or wide tables.
How to Build a Data Model
- 01
Identify Business Entities
List the core things the business cares about: customers, orders, products, events, employees. These become your entities. Do not start with columns or tables. Start with what the business tracks.
- 02
Define Relationships
How do entities connect? A customer places orders (one-to-many). An order contains products (many-to-many through a junction table). A product belongs to a category (many-to-one). Cardinality matters: one-to-one, one-to-many, and many-to-many each have different physical implementations.
- 03
Add Attributes and Keys
For each entity, list the attributes. Identify the primary key (natural or surrogate) and foreign keys. Define data types at the logical level. This is the logical data model. It should be database-agnostic at this point.
- 04
Choose a Physical Pattern
Based on the workload (OLTP vs OLAP), choose a physical pattern. OLTP: normalized 3NF. OLAP: star schema or wide table. Data lake: Parquet with partition keys. The logical model stays the same; the physical implementation adapts to the engine and query patterns.
- 05
Implement and Validate
Write the DDL, load sample data, run representative queries, and validate that the model supports the required analytics. Check query performance. Verify that joins produce correct results. Ensure the grain is correct by comparing row counts against expectations.
Frequently Asked Questions
What is a data model?+
What are the three types of data models?+
Why do data engineers need data models?+
What is the difference between a data model and a schema?+
You'll Get There Faster By Drawing Than By Reading
- 01
Active recall beats re-reading by 50%
Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom
- 02
76% of hiring managers reject on the coding task, not the resume
From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice
- 03
Five problem shapes cover 80% of data engineer loops
Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition