Logical Data Model: Design Guide for Data Engineer...

The logical model is the contract every downstream system signs against. Pipelines, warehouses, APIs, dashboards, and ML feature stores all read from it directly or inherit its shape. Get it wrong and the damage compounds for years across every team that touches the data. Get it right and the physical layer becomes a straightforward translation job.

What Is a Logical Data Model?

A logical data model is a detailed, technology-independent blueprint of an organization's data. It defines what data exists, how it is structured, and what rules govern it, without specifying how it will be physically stored on any particular database platform.

The logical model sits between two other levels of data modeling. Above it is the conceptual model: a high-level view showing only entity names and relationships, designed for business stakeholders who do not care about attributes or data types. Below it is the physical model: a platform-specific implementation with exact column types (VARCHAR(255), BIGINT), indexes, partitions, and storage parameters.

Think of the logical model as the interface contract for the data platform. It sits above storage and below product, and it is the layer where a transactional system, a warehouse, a streaming pipeline, and an ML feature store can all agree on what a Customer is. Rename an entity here and six downstream schemas change next quarter.

Prepare for the interview
01 / Open invite
02min.

Know Data Modeling the way the interviewer who asks it knows it.

a Data Modeling query, the same shape a screen would give you.
The diff against expected. Where ties broke. What you missed.
sandbox
1fact_orders
2 order_id bigint PK
3 customer_sk bigint FK
4 order_date date SCD2
5
Execute your solution0.4s avg.
SpotifyInterview question
Solve a Data Modeling problem

Three Levels of Data Modeling

Conceptual Data Model

Focus: Business concepts and high-level relationships Audience: Business stakeholders, project sponsors Detail level: Entity names and relationships only. No attributes, no data types, no keys. Example: Customer places Order. Order contains Product. Three boxes with labeled lines between them. Interview relevance: Rarely tested directly. But if an interviewer asks you to start from scratch, beginning with a conceptual model shows methodical thinking.

Logical Data Model

Focus: Entities, attributes, relationships, and constraints Audience: Data architects, data engineers, analysts Detail level: All attributes listed. Primary keys and foreign keys defined. Cardinality specified. Data types are logical (text, number, date) not platform-specific. Example: Customer (customer_id PK, name, email, city). Order (order_id PK, customer_id FK, order_date, total_amount). One customer to many orders. Interview relevance: Frequently tested. Interviewers ask you to design a logical model for a business scenario. They want to see entities, attributes, keys, and cardinality.

Physical Data Model

Focus: Platform-specific implementation details Audience: Database administrators, platform engineers Detail level: Column data types are platform-specific (VARCHAR(255), BIGINT, TIMESTAMP WITH TIME ZONE). Indexes defined. Partitioning strategy. Storage parameters. Example: customers (customer_id BIGSERIAL PRIMARY KEY, name VARCHAR(200) NOT NULL, email VARCHAR(255) UNIQUE, city VARCHAR(100)). Partition orders by order_date RANGE. Interview relevance: Sometimes tested in senior roles. The interviewer may ask you to take a logical model and make implementation decisions for a specific platform.

Components of a Logical Data Model

Entities

An entity represents a distinct business concept that the organization needs to track. Customer, Order, Product, Employee, Account, Transaction. Each entity becomes a table in the physical model. The entity name should be a singular noun that business stakeholders recognize immediately. If stakeholders do not use the term in daily conversation, it is probably not a real entity. Choosing the right entities is the first and most consequential decision in logical data modeling. Too few entities and you end up with wide tables that mix unrelated concepts (a single 'Person' table for customers, employees, and vendors). Too many entities and you fragment the model into dozens of tiny tables that require excessive joins. A mid-size e-commerce logical model typically has 15 to 25 entities. A financial services model might have 40 to 60. Interview note: When an interviewer asks you to design a logical model, start by listing entities before touching attributes. Say them out loud: 'The core entities are Customer, Order, OrderLineItem, Product, and Payment.' Then pause for feedback. This is the same pattern as declaring the grain in dimensional modeling: establish the big picture before diving into details.

Attributes

Attributes are the properties of an entity. A Customer entity has name, email, phone, city, state, country, signup_date. An Order entity has order_date, status, total_amount, shipping_address. Attributes in a logical model use logical data types (text, number, date, boolean) rather than platform-specific types (VARCHAR(255), DECIMAL(10,2)). Every attribute should pass the 'single value' test: does this attribute store exactly one value per entity instance? If a customer can have multiple phone numbers, phone_number should not be an attribute on Customer. Instead, create a CustomerPhone entity (or, in a denormalized warehouse model, store the primary phone only). The single-value test prevents the most common logical modeling mistake: cramming multi-valued data into a single column. Interview note: Interviewers pay attention to attribute granularity. Storing 'full_address' as a single text column is weaker than separating it into street, city, state, postal_code, and country. The separated version enables filtering and grouping by geography. If you combine them, an analyst cannot easily answer 'how many customers are in Texas?' without string parsing.

Relationships

Relationships define how entities connect. A Customer places many Orders (one-to-many). An Order contains many Products and a Product appears in many Orders (many-to-many, resolved through an OrderLineItem junction entity). Relationships have cardinality (one-to-one, one-to-many, many-to-many) and optionality (required or optional). Many-to-many relationships always require a junction (associative) entity in a logical model. Student enrolls in Course becomes Student -> Enrollment -> Course, where Enrollment has student_id and course_id as a composite key, plus its own attributes like enrollment_date and grade. Skipping the junction entity is a modeling error that leads to repeating groups or comma-separated values in a single column. Interview note: Draw relationships with cardinality notation (1:M, M:N) on the whiteboard. If the interviewer gives you a scenario like 'a doctor can see many patients, and a patient can see many doctors,' they are testing whether you create the junction entity (Appointment) with its own attributes (appointment_date, diagnosis, duration_minutes). The junction entity often has the most interesting attributes in the model.

Constraints

Constraints enforce data integrity rules in the logical model. Primary key constraints ensure every entity instance is uniquely identifiable. Foreign key constraints ensure relationships reference valid entities. Uniqueness constraints prevent duplicates (email must be unique per customer). NOT NULL constraints identify required attributes. Check constraints enforce value ranges (order_total >= 0). In a logical model, constraints are expressed in business terms, not SQL syntax. 'Every order must reference exactly one customer' (foreign key + NOT NULL). 'No two products can share the same SKU' (unique constraint). 'Account balance cannot be negative' (check constraint). These rules document business requirements that the physical model must enforce. Missing constraints in the logical model almost always become data quality bugs in production. Interview note: Mentioning constraints unprompted shows the interviewer you think about data quality at the design stage, not as an afterthought. When listing attributes, note which ones are required (NOT NULL) and which have uniqueness requirements. This takes 10 seconds and signals experience with real production systems where missing constraints cause silent data corruption.

How to Build a Logical Data Model

  1. 01

    Identify Business Processes and Stakeholders

    Start with the business processes the model must support. An e-commerce platform needs to support order placement, fulfillment, returns, and customer management. A SaaS product needs to support user signups, subscription management, feature usage tracking, and billing. Each process generates entities. List the processes first, then extract the entities that participate in each one.

  2. 02

    Define Entities and Their Boundaries

    Name each entity as a singular noun. Decide what each entity represents at the instance level. A Customer is one person or company with one account. An Order is one purchase event. An OrderLineItem is one product within one order. Be explicit about boundaries: does a 'Product' represent a specific item (serial number level) or a product type (SKU level)? This decision affects every downstream attribute and relationship.

  3. 03

    List Attributes and Assign Logical Types

    For each entity, list every attribute the business needs to track. Use logical types: text, integer, decimal, date, datetime, boolean. Mark required attributes (NOT NULL) and unique attributes. Apply the single-value test to each attribute. If an attribute can have multiple values per entity instance, it needs its own entity or a junction table.

  4. 04

    Define Relationships and Cardinality

    Connect entities with relationships. Specify cardinality: one-to-one, one-to-many, or many-to-many. Resolve many-to-many relationships into junction entities. Specify optionality: can an Order exist without a Customer? (No. Required relationship.) Can a Customer exist without any Orders? (Yes. Optional relationship.) Draw the relationships on the diagram with cardinality notation.

  5. 05

    Validate Against Business Questions

    Test the logical model against real business questions. 'What is the total revenue per product category last quarter?' Can the model answer this? 'Which customers have not placed an order in 90 days?' Can the model identify them? 'What is the average time between order placement and shipment?' Does the model capture both dates? If a business question cannot be answered by the model, an entity or attribute is missing. This validation step catches gaps before the physical implementation begins.

4 Logical Data Model Interview Questions

What is a logical data model and how does it differ from a physical data model?

A logical data model defines entities, attributes, relationships, and constraints using logical data types (text, number, date) without any platform-specific implementation details. A physical data model translates the logical model into platform-specific structures: VARCHAR(255) instead of text, BIGINT instead of number, plus indexes, partitions, and storage parameters. The logical model is technology-agnostic and serves as a communication tool between business and technical teams. The physical model is tied to a specific database platform.

Walk me through designing a logical data model for a hotel booking system.

Start with entities: Hotel, Room, Guest, Booking, Payment, RoomType. Define key attributes: Hotel (hotel_id, name, city, star_rating), Room (room_id, hotel_id FK, room_type_id FK, floor, room_number), Guest (guest_id, name, email, phone, loyalty_tier), Booking (booking_id, guest_id FK, room_id FK, check_in_date, check_out_date, status, total_amount), Payment (payment_id, booking_id FK, amount, payment_method, payment_date), RoomType (room_type_id, name, base_rate, max_occupancy). Relationships: Hotel 1:M Room. RoomType 1:M Room. Guest 1:M Booking. Room 1:M Booking. Booking 1:M Payment. Validate: can we answer 'average occupancy rate per hotel per month'? Yes. Can we answer 'revenue by room type'? Yes.

When would you skip the logical model and go straight to physical?

For quick prototypes, small analytics tables, or throwaway ETL jobs, going straight to physical is fine. But for any system that will be maintained by a team, serve multiple consumers, or operate in a regulated industry, skipping the logical model is a mistake. The logical model forces you to agree on entity definitions, relationship cardinality, and data quality constraints before committing to a platform. Changing an entity boundary after 6 months of data accumulation is 10x harder than getting it right in the logical model stage.

How do you handle a many-to-many relationship in a logical data model?

Create a junction (associative) entity. Student and Course have a many-to-many relationship. The junction entity is Enrollment, with student_id and course_id as a composite key. The junction entity almost always has its own attributes: enrollment_date, grade, status, dropped_date. These attributes belong to the relationship itself, not to either participating entity. If the junction entity has no attributes beyond the two foreign keys, it is still correct to create it because many-to-many relationships cannot be directly represented in a relational schema without one. Name the junction entity with a business-meaningful name (Enrollment, not StudentCourse) whenever possible.

Frequently Asked Questions

What is a logical data model?+
A logical data model is a technology-independent representation of an organization's data. It defines entities (things the business tracks), attributes (properties of those things), relationships (how things connect), and constraints (rules the data must follow). It uses logical data types like text, number, and date rather than platform-specific types like VARCHAR or BIGINT. The logical model sits between the conceptual model (high-level business concepts) and the physical model (platform-specific implementation).
What is the difference between conceptual, logical, and physical data models?+
Conceptual models show entities and relationships with no attributes or data types. They communicate with business stakeholders. Logical models add all attributes, primary and foreign keys, cardinality, and constraints using logical data types. They serve as the detailed blueprint agreed upon by business and technical teams. Physical models translate the logical model into platform-specific structures: exact column types, indexes, partitions, and storage settings. Each level adds implementation detail while the business meaning stays constant.
Who creates a logical data model?+
Data architects, senior data engineers, or analytics engineers typically create logical data models. The process involves collaboration with business stakeholders (to validate entity definitions and business rules), application developers (to understand data sources), and data analysts (to ensure the model supports analytical queries). In smaller teams, a senior data engineer may handle all three modeling levels. In larger organizations, a dedicated data architect owns the logical model.
Do I need a logical data model for a data warehouse?+
For a production data warehouse that serves multiple teams, yes. The logical model forces alignment on entity definitions (what is a 'customer'?), relationship cardinality (can an order have multiple shipping addresses?), and data quality constraints (is email required?). Without this alignment, each team builds its own definitions, leading to inconsistent metrics. For a personal analytics project or a prototype, you can skip it and go straight to the physical model. The cost of skipping increases with team size and data complexity.
02 / Why practice

Design the Contract Every Pipeline Reads

  1. 01

    Active recall beats re-reading by 50%

    Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom

  2. 02

    76% of hiring managers reject on the coding task, not the resume

    From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice

  3. 03

    Five problem shapes cover 80% of data engineer loops

    Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition

Related Guides