Data Modeling

Logical Data Model: Design Guide for Data Engineer Interviews

The logical model is the contract every downstream system signs against. Pipelines, warehouses, APIs, dashboards, and ML feature stores all read from it directly or inherit its shape. Get it wrong and the damage compounds for years across every team that touches the data. Get it right and the physical layer becomes a straightforward translation job.

Interviewers at the L5 and L6 level give you a business scenario and watch you draw the entity boundaries. What they're really testing: can you think about the system before you think about the table?

18%

DE Interviews: Modeling

61%

Rounds at L5 (Senior)

49

Onsite Modeling Rounds

191

Modeling Qs Analyzed

Source: DataDriven analysis of 1,042 verified data engineering interview rounds.

What Is a Logical Data Model?

A logical data model is a detailed, technology-independent blueprint of an organization's data. It defines what data exists, how it is structured, and what rules govern it, without specifying how it will be physically stored on any particular database platform.

The logical model sits between two other levels of data modeling. Above it is the conceptual model: a high-level view showing only entity names and relationships, designed for business stakeholders who do not care about attributes or data types. Below it is the physical model: a platform-specific implementation with exact column types (VARCHAR(255), BIGINT), indexes, partitions, and storage parameters.

Think of the logical model as the interface contract for the data platform. It sits above storage and below product, and it's the layer where a transactional system, a warehouse, a streaming pipeline, and an ML feature store can all agree on what a Customer is. If you only see it as a Visio diagram, you're missing the point. It's the blueprint every other subsystem derives from. Rename an entity here and six downstream schemas change next quarter.

Three Levels of Data Modeling

Conceptual Data Model

Focus: Business concepts and high-level relationships

Audience: Business stakeholders, project sponsors

Detail level: Entity names and relationships only. No attributes, no data types, no keys.

Example: Customer places Order. Order contains Product. Three boxes with labeled lines between them.

Interview relevance

Rarely tested directly. But if an interviewer asks you to start from scratch, beginning with a conceptual model shows methodical thinking.

Logical Data Model

Focus: Entities, attributes, relationships, and constraints

Audience: Data architects, data engineers, analysts

Detail level: All attributes listed. Primary keys and foreign keys defined. Cardinality specified. Data types are logical (text, number, date) not platform-specific.

Example: Customer (customer_id PK, name, email, city). Order (order_id PK, customer_id FK, order_date, total_amount). One customer to many orders.

Interview relevance

Frequently tested. Interviewers ask you to design a logical model for a business scenario. They want to see entities, attributes, keys, and cardinality.

Physical Data Model

Focus: Platform-specific implementation details

Audience: Database administrators, platform engineers

Detail level: Column data types are platform-specific (VARCHAR(255), BIGINT, TIMESTAMP WITH TIME ZONE). Indexes defined. Partitioning strategy. Storage parameters.

Example: customers (customer_id BIGSERIAL PRIMARY KEY, name VARCHAR(200) NOT NULL, email VARCHAR(255) UNIQUE, city VARCHAR(100)). Partition orders by order_date RANGE.

Interview relevance

Sometimes tested in senior roles. The interviewer may ask you to take a logical model and make implementation decisions for a specific platform.

Components of a Logical Data Model

Four building blocks make up every logical data model. Interviewers expect you to address all four when designing a model on a whiteboard.

Entities

An entity represents a distinct business concept that the organization needs to track. Customer, Order, Product, Employee, Account, Transaction. Each entity becomes a table in the physical model. The entity name should be a singular noun that business stakeholders recognize immediately. If stakeholders do not use the term in daily conversation, it is probably not a real entity.

Choosing the right entities is the first and most consequential decision in logical data modeling. Too few entities and you end up with wide tables that mix unrelated concepts (a single 'Person' table for customers, employees, and vendors). Too many entities and you fragment the model into dozens of tiny tables that require excessive joins. A mid-size e-commerce logical model typically has 15 to 25 entities. A financial services model might have 40 to 60.

Interview note

When an interviewer asks you to design a logical model, start by listing entities before touching attributes. Say them out loud: 'The core entities are Customer, Order, OrderLineItem, Product, and Payment.' Then pause for feedback. This is the same pattern as declaring the grain in dimensional modeling: establish the big picture before diving into details.

Attributes

Attributes are the properties of an entity. A Customer entity has name, email, phone, city, state, country, signup_date. An Order entity has order_date, status, total_amount, shipping_address. Attributes in a logical model use logical data types (text, number, date, boolean) rather than platform-specific types (VARCHAR(255), DECIMAL(10,2)).

Every attribute should pass the 'single value' test: does this attribute store exactly one value per entity instance? If a customer can have multiple phone numbers, phone_number should not be an attribute on Customer. Instead, create a CustomerPhone entity (or, in a denormalized warehouse model, store the primary phone only). The single-value test prevents the most common logical modeling mistake: cramming multi-valued data into a single column.

Interview note

Interviewers pay attention to attribute granularity. Storing 'full_address' as a single text column is weaker than separating it into street, city, state, postal_code, and country. The separated version enables filtering and grouping by geography. If you combine them, an analyst cannot easily answer 'how many customers are in Texas?' without string parsing.

Relationships

Relationships define how entities connect. A Customer places many Orders (one-to-many). An Order contains many Products and a Product appears in many Orders (many-to-many, resolved through an OrderLineItem junction entity). Relationships have cardinality (one-to-one, one-to-many, many-to-many) and optionality (required or optional).

Many-to-many relationships always require a junction (associative) entity in a logical model. Student enrolls in Course becomes Student -> Enrollment -> Course, where Enrollment has student_id and course_id as a composite key, plus its own attributes like enrollment_date and grade. Skipping the junction entity is a modeling error that leads to repeating groups or comma-separated values in a single column.

Interview note

Draw relationships with cardinality notation (1:M, M:N) on the whiteboard. If the interviewer gives you a scenario like 'a doctor can see many patients, and a patient can see many doctors,' they are testing whether you create the junction entity (Appointment) with its own attributes (appointment_date, diagnosis, duration_minutes). The junction entity often has the most interesting attributes in the model.

Constraints

Constraints enforce data integrity rules in the logical model. Primary key constraints ensure every entity instance is uniquely identifiable. Foreign key constraints ensure relationships reference valid entities. Uniqueness constraints prevent duplicates (email must be unique per customer). NOT NULL constraints identify required attributes. Check constraints enforce value ranges (order_total >= 0).

In a logical model, constraints are expressed in business terms, not SQL syntax. 'Every order must reference exactly one customer' (foreign key + NOT NULL). 'No two products can share the same SKU' (unique constraint). 'Account balance cannot be negative' (check constraint). These rules document business requirements that the physical model must enforce. Missing constraints in the logical model almost always become data quality bugs in production.

Interview note

Mentioning constraints unprompted shows the interviewer you think about data quality at the design stage, not as an afterthought. When listing attributes, note which ones are required (NOT NULL) and which have uniqueness requirements. This takes 10 seconds and signals experience with real production systems where missing constraints cause silent data corruption.

How to Build a Logical Data Model

Follow this five-step process in interview design exercises. It gives you a structured approach that prevents the most common mistake: jumping straight to column names without establishing entity boundaries and relationships.

1. Identify Business Processes and Stakeholders

Start with the business processes the model must support. An e-commerce platform needs to support order placement, fulfillment, returns, and customer management. A SaaS product needs to support user signups, subscription management, feature usage tracking, and billing. Each process generates entities. List the processes first, then extract the entities that participate in each one.

2. Define Entities and Their Boundaries

Name each entity as a singular noun. Decide what each entity represents at the instance level. A Customer is one person or company with one account. An Order is one purchase event. An OrderLineItem is one product within one order. Be explicit about boundaries: does a 'Product' represent a specific item (serial number level) or a product type (SKU level)? This decision affects every downstream attribute and relationship.

3. List Attributes and Assign Logical Types

For each entity, list every attribute the business needs to track. Use logical types: text, integer, decimal, date, datetime, boolean. Mark required attributes (NOT NULL) and unique attributes. Apply the single-value test to each attribute. If an attribute can have multiple values per entity instance, it needs its own entity or a junction table.

4. Define Relationships and Cardinality

Connect entities with relationships. Specify cardinality: one-to-one, one-to-many, or many-to-many. Resolve many-to-many relationships into junction entities. Specify optionality: can an Order exist without a Customer? (No. Required relationship.) Can a Customer exist without any Orders? (Yes. Optional relationship.) Draw the relationships on the diagram with cardinality notation.

5. Validate Against Business Questions

Test the logical model against real business questions. 'What is the total revenue per product category last quarter?' Can the model answer this? 'Which customers have not placed an order in 90 days?' Can the model identify them? 'What is the average time between order placement and shipment?' Does the model capture both dates? If a business question cannot be answered by the model, an entity or attribute is missing. This validation step catches gaps before the physical implementation begins.

4 Logical Data Model Interview Questions

These questions test your ability to think at the logical level before diving into platform-specific implementation.

What is a logical data model and how does it differ from a physical data model?

A logical data model defines entities, attributes, relationships, and constraints using logical data types (text, number, date) without any platform-specific implementation details. A physical data model translates the logical model into platform-specific structures: VARCHAR(255) instead of text, BIGINT instead of number, plus indexes, partitions, and storage parameters. The logical model is technology-agnostic and serves as a communication tool between business and technical teams. The physical model is tied to a specific database platform. In an interview, emphasize that the logical model captures 'what data to store and how it relates' while the physical model captures 'how to store it on a specific platform.'

Walk me through designing a logical data model for a hotel booking system.

Start with entities: Hotel, Room, Guest, Booking, Payment, RoomType. Define key attributes: Hotel (hotel_id, name, city, star_rating), Room (room_id, hotel_id FK, room_type_id FK, floor, room_number), Guest (guest_id, name, email, phone, loyalty_tier), Booking (booking_id, guest_id FK, room_id FK, check_in_date, check_out_date, status, total_amount), Payment (payment_id, booking_id FK, amount, payment_method, payment_date), RoomType (room_type_id, name, base_rate, max_occupancy). Relationships: Hotel 1:M Room. RoomType 1:M Room. Guest 1:M Booking. Room 1:M Booking. Booking 1:M Payment. Validate: can we answer 'average occupancy rate per hotel per month'? Yes, using Booking dates and Room counts. Can we answer 'revenue by room type'? Yes, using Booking total_amount joined through Room to RoomType.

When would you skip the logical model and go straight to physical?

For quick prototypes, small analytics tables, or throwaway ETL jobs, going straight to physical is fine. But for any system that will be maintained by a team, serve multiple consumers, or operate in a regulated industry, skipping the logical model is a mistake. The logical model forces you to agree on entity definitions, relationship cardinality, and data quality constraints before committing to a platform. Changing an entity boundary after 6 months of data accumulation is 10x harder than getting it right in the logical model stage. State this trade-off clearly: speed vs correctness, and the cost of change escalates over time.

How do you handle a many-to-many relationship in a logical data model?

Create a junction (associative) entity. Student and Course have a many-to-many relationship. The junction entity is Enrollment, with student_id and course_id as a composite key. The junction entity almost always has its own attributes: enrollment_date, grade, status, dropped_date. These attributes belong to the relationship itself, not to either participating entity. If the junction entity has no attributes beyond the two foreign keys, it is still correct to create it because many-to-many relationships cannot be directly represented in a relational schema without one. Name the junction entity with a business-meaningful name (Enrollment, not StudentCourse) whenever possible.

Frequently Asked Questions

What is a logical data model?+
A logical data model is a technology-independent representation of an organization's data. It defines entities (things the business tracks), attributes (properties of those things), relationships (how things connect), and constraints (rules the data must follow). It uses logical data types like text, number, and date rather than platform-specific types like VARCHAR or BIGINT. The logical model sits between the conceptual model (high-level business concepts) and the physical model (platform-specific implementation).
What is the difference between conceptual, logical, and physical data models?+
Conceptual models show entities and relationships with no attributes or data types. They communicate with business stakeholders. Logical models add all attributes, primary and foreign keys, cardinality, and constraints using logical data types. They serve as the detailed blueprint agreed upon by business and technical teams. Physical models translate the logical model into platform-specific structures: exact column types, indexes, partitions, and storage settings. Each level adds implementation detail while the business meaning stays constant.
Who creates a logical data model?+
Data architects, senior data engineers, or analytics engineers typically create logical data models. The process involves collaboration with business stakeholders (to validate entity definitions and business rules), application developers (to understand data sources), and data analysts (to ensure the model supports analytical queries). In smaller teams, a senior data engineer may handle all three modeling levels. In larger organizations, a dedicated data architect owns the logical model.
Do I need a logical data model for a data warehouse?+
For a production data warehouse that serves multiple teams, yes. The logical model forces alignment on entity definitions (what is a 'customer'?), relationship cardinality (can an order have multiple shipping addresses?), and data quality constraints (is email required?). Without this alignment, each team builds its own definitions, leading to inconsistent metrics. For a personal analytics project or a prototype, you can skip it and go straight to the physical model. The cost of skipping increases with team size and data complexity.

Design the Contract Every Pipeline Reads

Practice on business scenarios where the logical model has to survive a warehouse, an API, and an analytics pipeline at the same time. That's the senior bar.

Start Practicing