If you're new to data modeling, start here. You'll find the word "model" means three different things depending on who's talking. A conceptual model is a sketch for a product manager. A logical model is the entity contract your engineering team argues over. A physical model is what actually runs on the database. You'll need all three eventually. This page walks you through them in the order that'll make sense to you.
Data modeling makes up about 18% of the interview corpus, but it punches above that weight: candidates who can talk cleanly about the three levels stand out. It's one of the best areas to study because the ideas are stable, the vocabulary is small, and practice pays off fast.
Source: DataDriven analysis of 1,042 verified data engineering interview rounds.
You'll hear the word "model" used loosely, and that's fine until it costs you an interview answer. The three levels below are the ones hiring managers expect you to separate cleanly. It's a small vocabulary. Once you can tell a product manager's sketch apart from a DBA's DDL, you'll sound a lot more senior than candidates who treat them all as the same thing.
A conceptual data model defines the high-level entities and relationships in a system without any implementation detail. It answers 'what entities exist and how do they relate?' An e-commerce system has Customers, Orders, Products, and Payments. A Customer places an Order. An Order contains Products. A Payment settles an Order. No data types, no primary keys, no table names. Just entities and relationships.
Audience: Business stakeholders, product managers, and architects use conceptual models to align on scope before anyone writes DDL.
Entities: Customer, Order, Product, Payment
Relationships:
Customer --places--> Order
Order --contains--> Product (many-to-many via OrderItem)
Payment --settles--> Order

Interview note: Interviewers sometimes hand you a business scenario and ask you to 'draw the data model.' Start here. Sketch entities as boxes and relationships as lines. This shows structured thinking before diving into table definitions.
A logical data model adds structure to the conceptual model. It defines attributes (columns), primary keys, foreign keys, and data types at an abstract level. It is database-agnostic: no indexes, no partitions, no engine-specific syntax. The logical model for an Order might include order_id (PK), customer_id (FK), order_date, and total_amount. It specifies that order_id is an integer and order_date is a date, but it does not specify whether the table is partitioned or what index strategy to use.
Audience: Data architects and senior engineers use logical models to communicate schema design without committing to a specific database engine.
Order (logical)
order_id INTEGER PK
customer_id INTEGER FK -> Customer
order_date DATE
total_amount DECIMAL(10,2)
status VARCHAR(20)

Interview note: When an interviewer asks you to 'design a schema,' a logical data model is usually the expected deliverable. Define entities, attributes, keys, and relationships. Mention data types but skip physical details unless asked.
A physical data model is the actual implementation in a specific database engine. It includes CREATE TABLE statements, index definitions, partitioning strategies, storage formats, and engine-specific features. The physical model for an Order in Snowflake looks different from the same Order in PostgreSQL. Snowflake uses clustering keys instead of indexes. PostgreSQL uses B-tree indexes and table partitioning. The logical model is the same; the physical model adapts to the engine.
Audience: Data engineers build physical models. You write the DDL, choose the partitioning strategy, define the indexes, and optimize for the query workload.
-- Physical model in Snowflake
CREATE TABLE fact_orders (
order_id NUMBER(38,0),
customer_key NUMBER(38,0),
order_date DATE,
total_amount NUMBER(10,2)
)
CLUSTER BY (order_date);
-- Clustered by date for time-range query performance

Interview note: Physical model questions come up in senior interviews. The interviewer wants to hear about partitioning (fact tables by date), clustering (Snowflake cluster keys on high-cardinality columns), and storage formats (Parquet vs Delta vs Iceberg). Show that you think about how the model performs, not just what it contains.
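The same logical Order model lands differently in PostgreSQL, which uses declarative range partitioning and B-tree indexes rather than clustering keys. A minimal sketch for contrast (table and column names mirror the Snowflake example above; the partition bounds and index name are illustrative assumptions):

```sql
-- Hypothetical physical model of the same fact table in PostgreSQL
CREATE TABLE fact_orders (
    order_id     BIGINT,
    customer_key BIGINT,
    order_date   DATE NOT NULL,
    total_amount NUMERIC(10,2)
) PARTITION BY RANGE (order_date);

-- One partition per year; bounds here are illustrative
CREATE TABLE fact_orders_2024 PARTITION OF fact_orders
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

-- B-tree index where Snowflake would use a clustering key
CREATE INDEX idx_fact_orders_customer ON fact_orders_2024 (customer_key);
```

Same logical model, same grain; only the physical choices change with the engine.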
Four reasons data models are not optional for data engineers. Skipping the model leads to ad-hoc schemas, broken pipelines, and difficult interviews.
A data model is a shared language between engineers, analysts, and business stakeholders. When everyone agrees that a 'customer' entity has these attributes and relates to 'orders' this way, there is less ambiguity in requirements, reports, and pipeline logic. Without a shared model, the marketing team's definition of 'customer' might differ from the finance team's, and your pipeline produces numbers that nobody trusts.
Data models are the blueprint for your database schemas. Whether you are building a normalized OLTP schema, a star schema for analytics, or a wide table for a feature store, the data model defines the structure. Starting from a model prevents ad-hoc table creation that leads to duplicated data, inconsistent naming, and queries that require 15 joins because nobody planned the relationships.
Your data model defines the contract between upstream sources and downstream consumers. When an analyst builds a dashboard on top of fact_orders, they depend on the model having specific columns, data types, and grain. When the source system changes, the pipeline needs to map the change back to the model. Without an explicit model, every pipeline change risks breaking downstream consumers in ways nobody anticipated.
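One lightweight way to guard that contract is to inspect the live schema before promoting a change. A hedged sketch using the standard information_schema catalog (available in PostgreSQL and Snowflake; the table name is taken from the examples on this page):

```sql
-- Hypothetical contract check: list fact_orders' actual columns and types
-- so a pipeline change can be diffed against what consumers expect
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'fact_orders'
ORDER BY ordinal_position;
```

Comparing this output against the documented model turns "did we break the dashboard?" into a query instead of a guess.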
Data modeling questions appear in roughly 40% of data engineering interview loops. The interviewer hands you a business scenario and expects a data model. Candidates who start with entities and relationships, then move to attributes and keys, then discuss physical trade-offs demonstrate the structured thinking that hiring teams look for. Jumping straight to column names without a model signals inexperience.
Five patterns you will encounter in data engineering work and interviews. Each is a physical data model pattern optimized for a specific workload.
A central fact table surrounded by denormalized dimension tables. The most common pattern for analytics warehouses. Optimized for read-heavy workloads with few joins.
Star Schema Guide →

Similar to a star schema but with normalized dimension tables. Dimensions are split into sub-tables linked by foreign keys. Uses less storage but requires more joins.

Snowflake Schema Guide →

A modeling methodology for enterprise data warehouses. Uses Hubs (business keys), Links (relationships), and Satellites (descriptive attributes). Designed for auditability and parallel loading.

Data Vault Guide →

A single denormalized table with many columns. Common in data lakes and feature stores where query simplicity and scan performance matter more than storage efficiency. Often materialized from a star schema for specific use cases.

Dimensional Modeling Guide →

Tables decomposed to eliminate redundancy. Each non-key attribute depends on the key, the whole key, and nothing but the key. The standard for OLTP systems. Data engineers extract from 3NF sources and transform into star schemas or wide tables.

Relational Data Model Guide →

A five-step process that works for interviews and real projects. Start abstract, add detail progressively, and validate against the actual workload.
List the core things the business cares about: customers, orders, products, events, employees. These become your entities. Do not start with columns or tables. Start with what the business tracks.
How do entities connect? A customer places orders (one-to-many). An order contains products (many-to-many through a junction table). A product belongs to a category (many-to-one). Cardinality matters: one-to-one, one-to-many, and many-to-many each have different physical implementations.
For each entity, list the attributes. Identify the primary key (natural or surrogate) and foreign keys. Define data types at the logical level. This is the logical data model. It should be database-agnostic at this point.
Based on the workload (OLTP vs OLAP), choose a physical pattern. OLTP: normalized 3NF. OLAP: star schema or wide table. Data lake: Parquet with partition keys. The logical model stays the same; the physical implementation adapts to the engine and query patterns.
Write the DDL, load sample data, run representative queries, and validate that the model supports the required analytics. Check query performance. Verify that joins produce correct results. Ensure the grain is correct by comparing row counts against expectations.
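As a compact illustration of steps 3 through 5, here is a sketch in generic SQL: a logical Order/OrderItem pair (the junction table that resolves the many-to-many from step 2), plus a grain check of the kind step 5 calls for. All table and column names are assumptions for illustration:

```sql
-- Step 3: logical model written as engine-neutral DDL (names illustrative)
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers (customer_id),
    order_date  DATE NOT NULL
);

-- Junction table from step 2: resolves the Order <-> Product many-to-many
CREATE TABLE order_items (
    order_id   INTEGER REFERENCES orders (order_id),
    product_id INTEGER REFERENCES products (product_id),
    quantity   INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)
);

-- Step 5: verify the grain -- one row per (order, product).
-- Any rows returned mean the grain is broken.
SELECT order_id, product_id, COUNT(*) AS dupes
FROM order_items
GROUP BY order_id, product_id
HAVING COUNT(*) > 1;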
Pick a scenario on DataDriven, sketch the three levels for it, and compare your version to a working solution. You'll remember it longer than any article you read twice.
Start Practicing