The logical model is the contract every downstream system signs against. Pipelines, warehouses, APIs, dashboards, and ML feature stores all read from it directly or inherit its shape. Get it wrong and the damage compounds for years across every team that touches the data. Get it right and the physical layer becomes a straightforward translation job.
Interviewers at the L5 and L6 level give you a business scenario and watch you draw the entity boundaries. What they're really testing: can you think about the system before you think about the table?
DE Interviews: Modeling
Rounds at L5 (Senior)
Onsite Modeling Rounds
Modeling Qs Analyzed
Source: DataDriven analysis of 1,042 verified data engineering interview rounds.
A logical data model is a detailed, technology-independent blueprint of an organization's data. It defines what data exists, how it is structured, and what rules govern it, without specifying how it will be physically stored on any particular database platform.
The logical model sits between two other levels of data modeling. Above it is the conceptual model: a high-level view showing only entity names and relationships, designed for business stakeholders who do not care about attributes or data types. Below it is the physical model: a platform-specific implementation with exact column types (VARCHAR(255), BIGINT), indexes, partitions, and storage parameters.
Think of the logical model as the interface contract for the data platform. It sits above storage and below product, and it's the layer where a transactional system, a warehouse, a streaming pipeline, and an ML feature store can all agree on what a Customer is. If you only see it as a Visio diagram, you're missing the point. It's the blueprint every other subsystem derives from. Rename an entity here and six downstream schemas change next quarter.
Focus: Business concepts and high-level relationships
Audience: Business stakeholders, project sponsors
Detail level: Entity names and relationships only. No attributes, no data types, no keys.
Example: Customer places Order. Order contains Product. Three boxes with labeled lines between them.
Rarely tested directly. But if an interviewer asks you to start from scratch, beginning with a conceptual model shows methodical thinking.
Focus: Entities, attributes, relationships, and constraints
Audience: Data architects, data engineers, analysts
Detail level: All attributes listed. Primary keys and foreign keys defined. Cardinality specified. Data types are logical (text, number, date) not platform-specific.
Example: Customer (customer_id PK, name, email, city). Order (order_id PK, customer_id FK, order_date, total_amount). One customer to many orders.
Frequently tested. Interviewers ask you to design a logical model for a business scenario. They want to see entities, attributes, keys, and cardinality.
Focus: Platform-specific implementation details
Audience: Database administrators, platform engineers
Detail level: Column data types are platform-specific (VARCHAR(255), BIGINT, TIMESTAMP WITH TIME ZONE). Indexes defined. Partitioning strategy. Storage parameters.
Example: customers (customer_id BIGSERIAL PRIMARY KEY, name VARCHAR(200) NOT NULL, email VARCHAR(255) UNIQUE, city VARCHAR(100)). Partition orders by order_date RANGE.
Sometimes tested in senior roles. The interviewer may ask you to take a logical model and make implementation decisions for a specific platform.
Four building blocks make up every logical data model. Interviewers expect you to address all four when designing a model on a whiteboard.
An entity represents a distinct business concept that the organization needs to track. Customer, Order, Product, Employee, Account, Transaction. Each entity becomes a table in the physical model. The entity name should be a singular noun that business stakeholders recognize immediately. If stakeholders do not use the term in daily conversation, it is probably not a real entity.
Choosing the right entities is the first and most consequential decision in logical data modeling. Too few entities and you end up with wide tables that mix unrelated concepts (a single 'Person' table for customers, employees, and vendors). Too many entities and you fragment the model into dozens of tiny tables that require excessive joins. A mid-size e-commerce logical model typically has 15 to 25 entities. A financial services model might have 40 to 60.
When an interviewer asks you to design a logical model, start by listing entities before touching attributes. Say them out loud: 'The core entities are Customer, Order, OrderLineItem, Product, and Payment.' Then pause for feedback. This is the same pattern as declaring the grain in dimensional modeling: establish the big picture before diving into details.
Attributes are the properties of an entity. A Customer entity has name, email, phone, city, state, country, signup_date. An Order entity has order_date, status, total_amount, shipping_address. Attributes in a logical model use logical data types (text, number, date, boolean) rather than platform-specific types (VARCHAR(255), DECIMAL(10,2)).
Every attribute should pass the 'single value' test: does this attribute store exactly one value per entity instance? If a customer can have multiple phone numbers, phone_number should not be an attribute on Customer. Instead, create a CustomerPhone entity (or, in a denormalized warehouse model, store the primary phone only). The single-value test prevents the most common logical modeling mistake: cramming multi-valued data into a single column.
Interviewers pay attention to attribute granularity. Storing 'full_address' as a single text column is weaker than separating it into street, city, state, postal_code, and country. The separated version enables filtering and grouping by geography. If you combine them, an analyst cannot easily answer 'how many customers are in Texas?' without string parsing.
Relationships define how entities connect. A Customer places many Orders (one-to-many). An Order contains many Products and a Product appears in many Orders (many-to-many, resolved through an OrderLineItem junction entity). Relationships have cardinality (one-to-one, one-to-many, many-to-many) and optionality (required or optional).
Many-to-many relationships always require a junction (associative) entity in a logical model. Student enrolls in Course becomes Student -> Enrollment -> Course, where Enrollment has student_id and course_id as a composite key, plus its own attributes like enrollment_date and grade. Skipping the junction entity is a modeling error that leads to repeating groups or comma-separated values in a single column.
Draw relationships with cardinality notation (1:M, M:N) on the whiteboard. If the interviewer gives you a scenario like 'a doctor can see many patients, and a patient can see many doctors,' they are testing whether you create the junction entity (Appointment) with its own attributes (appointment_date, diagnosis, duration_minutes). The junction entity often has the most interesting attributes in the model.
Constraints enforce data integrity rules in the logical model. Primary key constraints ensure every entity instance is uniquely identifiable. Foreign key constraints ensure relationships reference valid entities. Uniqueness constraints prevent duplicates (email must be unique per customer). NOT NULL constraints identify required attributes. Check constraints enforce value ranges (order_total >= 0).
In a logical model, constraints are expressed in business terms, not SQL syntax. 'Every order must reference exactly one customer' (foreign key + NOT NULL). 'No two products can share the same SKU' (unique constraint). 'Account balance cannot be negative' (check constraint). These rules document business requirements that the physical model must enforce. Missing constraints in the logical model almost always become data quality bugs in production.
Mentioning constraints unprompted shows the interviewer you think about data quality at the design stage, not as an afterthought. When listing attributes, note which ones are required (NOT NULL) and which have uniqueness requirements. This takes 10 seconds and signals experience with real production systems where missing constraints cause silent data corruption.
Follow this five-step process in interview design exercises. It gives you a structured approach that prevents the most common mistake: jumping straight to column names without establishing entity boundaries and relationships.
Start with the business processes the model must support. An e-commerce platform needs to support order placement, fulfillment, returns, and customer management. A SaaS product needs to support user signups, subscription management, feature usage tracking, and billing. Each process generates entities. List the processes first, then extract the entities that participate in each one.
Name each entity as a singular noun. Decide what each entity represents at the instance level. A Customer is one person or company with one account. An Order is one purchase event. An OrderLineItem is one product within one order. Be explicit about boundaries: does a 'Product' represent a specific item (serial number level) or a product type (SKU level)? This decision affects every downstream attribute and relationship.
For each entity, list every attribute the business needs to track. Use logical types: text, integer, decimal, date, datetime, boolean. Mark required attributes (NOT NULL) and unique attributes. Apply the single-value test to each attribute. If an attribute can have multiple values per entity instance, it needs its own entity or a junction table.
Connect entities with relationships. Specify cardinality: one-to-one, one-to-many, or many-to-many. Resolve many-to-many relationships into junction entities. Specify optionality: can an Order exist without a Customer? (No. Required relationship.) Can a Customer exist without any Orders? (Yes. Optional relationship.) Draw the relationships on the diagram with cardinality notation.
Test the logical model against real business questions. 'What is the total revenue per product category last quarter?' Can the model answer this? 'Which customers have not placed an order in 90 days?' Can the model identify them? 'What is the average time between order placement and shipment?' Does the model capture both dates? If a business question cannot be answered by the model, an entity or attribute is missing. This validation step catches gaps before the physical implementation begins.
These questions test your ability to think at the logical level before diving into platform-specific implementation.
A logical data model defines entities, attributes, relationships, and constraints using logical data types (text, number, date) without any platform-specific implementation details. A physical data model translates the logical model into platform-specific structures: VARCHAR(255) instead of text, BIGINT instead of number, plus indexes, partitions, and storage parameters. The logical model is technology-agnostic and serves as a communication tool between business and technical teams. The physical model is tied to a specific database platform. In an interview, emphasize that the logical model captures 'what data to store and how it relates' while the physical model captures 'how to store it on a specific platform.'
Start with entities: Hotel, Room, Guest, Booking, Payment, RoomType. Define key attributes: Hotel (hotel_id, name, city, star_rating), Room (room_id, hotel_id FK, room_type_id FK, floor, room_number), Guest (guest_id, name, email, phone, loyalty_tier), Booking (booking_id, guest_id FK, room_id FK, check_in_date, check_out_date, status, total_amount), Payment (payment_id, booking_id FK, amount, payment_method, payment_date), RoomType (room_type_id, name, base_rate, max_occupancy). Relationships: Hotel 1:M Room. RoomType 1:M Room. Guest 1:M Booking. Room 1:M Booking. Booking 1:M Payment. Validate: can we answer 'average occupancy rate per hotel per month'? Yes, using Booking dates and Room counts. Can we answer 'revenue by room type'? Yes, using Booking total_amount joined through Room to RoomType.
For quick prototypes, small analytics tables, or throwaway ETL jobs, going straight to physical is fine. But for any system that will be maintained by a team, serve multiple consumers, or operate in a regulated industry, skipping the logical model is a mistake. The logical model forces you to agree on entity definitions, relationship cardinality, and data quality constraints before committing to a platform. Changing an entity boundary after 6 months of data accumulation is 10x harder than getting it right in the logical model stage. State this trade-off clearly: speed vs correctness, and the cost of change escalates over time.
Create a junction (associative) entity. Student and Course have a many-to-many relationship. The junction entity is Enrollment, with student_id and course_id as a composite key. The junction entity almost always has its own attributes: enrollment_date, grade, status, dropped_date. These attributes belong to the relationship itself, not to either participating entity. If the junction entity has no attributes beyond the two foreign keys, it is still correct to create it because many-to-many relationships cannot be directly represented in a relational schema without one. Name the junction entity with a business-meaningful name (Enrollment, not StudentCourse) whenever possible.
Practice on business scenarios where the logical model has to survive a warehouse, an API, and an analytics pipeline at the same time. That's the senior bar.
Start Practicing