Concepts

OLAP vs OLTP: What Interviewers Test

Baseline knowledge for DE interviews. Understand why warehouses exist separately from application databases and how storage layout affects query performance.

Detailed Comparison

Purpose

OLAP

Analytical processing. Complex queries across large datasets: 'What was revenue by region for Q4?' Read-heavy, scans millions of rows.

OLTP

Transactional processing. Individual operations: 'Insert this order,' 'Update this address.' Many small, fast operations.

Storage Layout

OLAP

Columnar. Each column stored together. A query needing 3 of 200 columns reads only those 3. Analytical scans are fast.

OLTP

Row-based. All columns for a row stored together. Fetching a full record by ID is one disk read. Point lookups are fast.

Query Patterns

OLAP

Few queries scanning millions of rows. GROUP BY, aggregation, window functions. Seconds to minutes. Throughput-optimized.

OLTP

Millions of queries touching few rows each. INSERT, UPDATE, SELECT by key. Milliseconds. Latency-optimized.

Schema

OLAP

Denormalized (star schema, wide tables). Fewer joins, faster queries. Redundancy is acceptable.

OLTP

Normalized (3NF). Minimize redundancy, prevent update anomalies. Designed for consistent transactional access.

Examples

OLAP

Snowflake, BigQuery, Redshift, ClickHouse, DuckDB.

OLTP

PostgreSQL, MySQL, Oracle, SQL Server (application use).

Concurrency

OLAP

Low. Dozens to hundreds of concurrent queries, each resource-heavy.

OLTP

High. Thousands to millions of concurrent transactions, each lightweight.

Row vs Columnar Storage

The physical difference between row and columnar storage is the single most important concept behind OLAP vs OLTP. Everything else follows from it.

How Row Storage Works

In a row-oriented database, all column values for a single row are stored together on disk. Imagine a table with columns (user_id, name, email, city, signup_date, last_login, plan_type, revenue). Row storage writes each complete row as a contiguous block. When you SELECT * FROM users WHERE user_id = 42, the database finds that one block and reads everything in a single disk operation. This is ideal for transactional access where you need the full record. But when you run SELECT city, SUM(revenue) FROM users GROUP BY city, the database must read every byte of every row, including name, email, and all the other columns you do not need. For a table with 200 columns and 100 million rows, that is a massive waste of I/O.

How Columnar Storage Works

In a columnar database, all values for a single column are stored together. All 100 million user_id values sit in one contiguous block. All 100 million city values sit in another. When you run that GROUP BY city query, the database reads only the city column and the revenue column, skipping the other 198 columns entirely. On a 200-column table, that can mean reading 1% of the data. Columnar storage also compresses dramatically better because values in the same column tend to be similar. A column of country codes (US, US, US, CA, CA, US) compresses far better than a row containing (42, 'Jane', 'jane@co.com', 'US', '2024-01-15'). Compression ratios of 10:1 are common, further reducing I/O.
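The I/O difference described above can be made concrete with a toy simulation. This is a pure-Python sketch, not how any real engine is implemented: the table shape (10,000 rows, 200 columns) and the two "needed" columns are invented for illustration.

```python
# Toy model of row vs columnar layout: count how many values must be
# touched to aggregate 2 columns out of 200 on a 10,000-row table.

NUM_ROWS, NUM_COLS = 10_000, 200

# Row storage: each row is a contiguous record holding all 200 values.
row_store = [[f"r{r}c{c}" for c in range(NUM_COLS)] for r in range(NUM_ROWS)]

# Columnar storage: each column is a contiguous list of all 10,000 values.
col_store = {c: [f"r{r}c{c}" for r in range(NUM_ROWS)] for c in range(NUM_COLS)}

# Suppose the query needs only columns 3 ("city") and 7 ("revenue").
needed = [3, 7]

# A row engine must scan every value of every row to extract the two fields.
row_values_read = sum(len(row) for row in row_store)      # 2,000,000 values

# A columnar engine reads just the two column blocks.
col_values_read = sum(len(col_store[c]) for c in needed)  # 20,000 values

print(row_values_read, col_values_read)  # columnar touches 1% of the data
```

The 100:1 ratio here is exactly the "3 of 200 columns" argument from the comparison table, just made countable.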

Why This Matters for Interviews

Interviewers ask about columnar vs row storage to test whether you understand the physical reality beneath SQL. Knowing that Snowflake is columnar is table stakes. Explaining why a GROUP BY on a columnar system skips irrelevant columns, and why that matters for cost and speed, shows real understanding. The best answers connect storage layout to practical decisions: partitioning strategy, column pruning, and why SELECT * is expensive on wide tables in OLAP systems.

Real Systems Mapped

Knowing which real databases fall into which category is expected in interviews. Here is how the major systems classify.

PostgreSQL (OLTP)

Row-oriented relational database. The default choice for application backends. Excellent for transactional workloads with strong ACID guarantees.

MySQL (OLTP)

Row-oriented. Powers most web applications. InnoDB engine provides ACID transactions. Optimized for high-concurrency point lookups and writes.

Snowflake (OLAP)

Cloud-native columnar warehouse. Separates storage and compute. Pay per query. Dominant in enterprise analytics.

BigQuery (OLAP)

Google's serverless columnar warehouse. No infrastructure to manage. Charges by data scanned, making column selection directly affect cost.

Redshift (OLAP)

AWS columnar warehouse. Cluster-based with provisioned or serverless options. Tightly integrated with the AWS ecosystem.

ClickHouse (OLAP)

Open-source columnar database built for real-time analytics. Extremely fast aggregation queries. Used for event analytics and log analysis.

DuckDB (OLAP)

In-process columnar database. Think 'SQLite for analytics.' Runs on a laptop with no server. Reads Parquet, CSV, and JSON natively.

Oracle (OLTP, with OLAP features)

Enterprise row-oriented database with some columnar features (In-Memory Column Store). Primarily OLTP but can handle mixed workloads.

CockroachDB (OLTP)

Distributed SQL database. Row-oriented with strong consistency across regions. Built for globally distributed transactional applications.

Apache Druid (OLAP)

Real-time columnar analytics database. Sub-second queries on streaming and batch data. Used for user-facing analytics dashboards.

Query Pattern Examples

The same SQL runs at very different speeds depending on the underlying storage engine. These examples show why.

Query That Runs Fast on OLAP, Slow on OLTP

SELECT
  region,
  product_category,
  DATE_TRUNC('month', order_date) AS month,
  SUM(revenue) AS total_revenue,
  COUNT(DISTINCT customer_id) AS unique_customers
FROM orders
WHERE order_date >= '2025-01-01'
GROUP BY region, product_category, DATE_TRUNC('month', order_date)
ORDER BY total_revenue DESC;

This query scans millions of rows but only needs 4 columns out of a potentially wide orders table. On a columnar OLAP system, it reads only region, product_category, order_date, revenue, and customer_id. On a row-oriented OLTP system, it reads every column of every matching row. The OLAP system also benefits from columnar compression and vectorized execution, processing batches of values in a single CPU instruction. On a 500M-row table, this might take 3 seconds on Snowflake and 45 minutes on PostgreSQL.

Query That Runs Fast on OLTP, Slow on OLAP

SELECT *
FROM orders
WHERE order_id = 'ord_8f3a2b1c';

A point lookup by primary key. On a row-oriented OLTP system with a B-tree index, this is one index lookup plus one disk read to fetch the entire row. Sub-millisecond. On a columnar OLAP system, the database must reconstruct the full row by reading from every column file and stitching the values together. Even with metadata pruning, this is far more I/O than necessary for a single row. OLAP systems are not designed for this access pattern.

Query That Exposes the Trade-Off

SELECT
  customer_id,
  order_id,
  order_date,
  status,
  total_amount
FROM orders
WHERE customer_id = 'cust_42'
ORDER BY order_date DESC
LIMIT 20;

Fetching recent orders for a single customer. On OLTP, this hits an index on customer_id and returns 20 rows instantly. On OLAP, this might still be fast if the system supports efficient filtering, but it is doing more work than necessary because columnar storage is optimized for scanning many rows, not fetching a few. In an interview, this is a good example to show you understand that the 'right' database depends on the access pattern.

Window Function on Large Dataset

SELECT
  customer_id,
  order_date,
  revenue,
  SUM(revenue) OVER (
    PARTITION BY customer_id
    ORDER BY order_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS cumulative_revenue
FROM orders
WHERE order_date >= '2024-01-01';

Running totals per customer across a year of data. On OLAP, the engine reads only three columns (customer_id, order_date, revenue), sorts within partitions using columnar execution, and computes the window in a single pass. On OLTP, the same query reads all columns, sorts in memory or on disk, and processes row by row. Window functions over millions of rows are a core OLAP use case and rarely belong on a transactional database.

How Interviewers Test This

Interviewers rarely ask "define OLAP." They create scenarios that require you to apply the concept. Here are the patterns they use.

They describe a workload and ask which system to use

The interviewer says: 'We have an application with 10,000 transactions per second and a reporting dashboard that runs hourly aggregations. How would you architect the data layer?' They want you to separate OLTP (application database) from OLAP (warehouse). The application writes to PostgreSQL. A pipeline replicates data to Snowflake. Dashboards query Snowflake. This is the standard pattern and they want to hear you articulate it clearly.

They ask why you cannot just use PostgreSQL for everything

This tests whether you understand the physical storage difference. The answer is that PostgreSQL is row-oriented and scans all columns even when you only need a few. For small datasets (under 50M rows) it works fine for analytics. Beyond that, the I/O cost makes analytical queries impractical. Knowing the inflection point shows practical experience.

They ask about denormalization trade-offs

OLTP schemas are normalized to prevent update anomalies. OLAP schemas are denormalized for query speed. The interviewer wants you to explain why denormalization is acceptable in a warehouse (data is loaded in bulk, not updated row-by-row) and dangerous in a transactional system (concurrent updates to redundant data cause inconsistencies).

They ask you to optimize a slow query

Sometimes the answer is not 'add an index' but 'this query belongs on a different system.' A GROUP BY scanning 2 billion rows on an OLTP database is a design problem, not a tuning problem. Recognizing when to move a workload from OLTP to OLAP is a signal of engineering maturity.

How Indexing Differs Between OLAP and OLTP

OLTP and OLAP systems use fundamentally different indexing strategies. Understanding this shows interviewers you know what is happening beneath the query optimizer.

B-Tree Index (OLTP)

The workhorse of row-oriented databases. B-trees organize data in a balanced tree structure that allows point lookups in O(log n) time. PostgreSQL and MySQL use B-trees as their default index type. Ideal for equality checks (WHERE id = 42) and range scans (WHERE date BETWEEN x AND y). Not useful for full-column aggregations because the index does not store the column values in a scannable format.
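As a rough stand-in for a B-tree's O(log n) point lookup, binary search over a sorted key array captures the same complexity argument. This is a sketch: real B-trees are shallow trees of disk pages, and the keys and row data below are invented.

```python
import bisect

keys = sorted([5, 12, 42, 77, 103, 250])  # sorted primary-key column
heap = {42: ("ord_42", "shipped"), 77: ("ord_77", "pending")}  # toy row data

def btree_lookup(key):
    """O(log n) position search over sorted keys, then one fetch of the row."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return heap.get(key)
    return None  # key not present in the index

print(btree_lookup(42))  # ('ord_42', 'shipped')
```

The point is the access pattern: one logarithmic search plus one row fetch, regardless of table size — which is why this shines for point lookups and does nothing for full-column aggregations.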

Zone Maps / Min-Max Indexes (OLAP)

Columnar databases store min and max values for each data block (micro-partition in Snowflake, row group in Parquet). When a query filters WHERE order_date > '2025-01-01', the engine checks each block's max value. If max < '2025-01-01', the entire block is skipped. No explicit index creation needed. This works automatically on sorted or clustered data. Poor clustering (random insertion order) degrades zone map effectiveness.
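The pruning decision can be sketched in a few lines. The block boundaries and row counts below are invented; real systems track these statistics per micro-partition or row group automatically.

```python
# Sketch of min-max (zone map) pruning: per-block min/max on order_date
# lets the engine discard whole blocks without reading any row data.
# ISO date strings compare correctly as plain strings.

blocks = [
    {"min": "2024-01-01", "max": "2024-06-30", "rows": 1_000_000},
    {"min": "2024-07-01", "max": "2024-12-31", "rows": 1_000_000},
    {"min": "2025-01-02", "max": "2025-06-30", "rows": 1_000_000},
]

def blocks_to_scan(predicate_min):
    # For WHERE order_date > predicate_min, a block survives only if its
    # max value could satisfy the predicate.
    return [b for b in blocks if b["max"] > predicate_min]

survivors = blocks_to_scan("2025-01-01")
print(len(survivors))  # 1 of 3 blocks scanned
```

Note what happens if the data is not clustered by date: min/max ranges widen to cover almost everything, no block can be skipped, and the zone map stops helping — the degradation mentioned above.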

Bitmap Index (OLAP)

Used for low-cardinality columns (status, country, gender). Creates a bit vector for each distinct value. Extremely fast for combining multiple filters with AND/OR. ClickHouse and Oracle use bitmap indexes. Not suitable for high-cardinality columns (user_id) because the number of bit vectors becomes impractical.
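A minimal illustration of the idea, using Python integers as bit vectors (bit i set means row i holds that value). The column data is invented.

```python
# Sketch of a bitmap index on two low-cardinality columns, and why
# combining filters is a single bitwise AND.

rows_status  = ["paid", "paid", "refunded", "paid", "refunded"]
rows_country = ["US",   "CA",   "US",       "US",   "CA"]

def build_bitmaps(column):
    bitmaps = {}
    for i, v in enumerate(column):
        bitmaps[v] = bitmaps.get(v, 0) | (1 << i)  # set bit i for value v
    return bitmaps

status_idx  = build_bitmaps(rows_status)
country_idx = build_bitmaps(rows_country)

# WHERE status = 'paid' AND country = 'US' combines two bit vectors.
match = status_idx["paid"] & country_idx["US"]
matching_rows = [i for i in range(len(rows_status)) if match >> i & 1]
print(matching_rows)  # [0, 3]
```

This also makes the high-cardinality problem obvious: a bitmap index on user_id would need one bit vector per distinct user.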

Inverted Index (OLAP / Search)

Maps values back to the rows containing them. Druid and Elasticsearch use inverted indexes for fast filtering on string columns. Useful for interactive dashboards where users filter by category, region, or tag. Different from B-trees because they are optimized for filtering, not sorting.

Interview Questions with Guidance

What is the difference between OLAP and OLTP?

OLAP: analytical queries aggregating large datasets. OLTP: transactional operations on individual records. Key differences: storage layout (columnar vs row), query patterns (scans vs lookups), schema (denormalized vs normalized), concurrency model.

Why are columnar databases faster for analytics?

Columnar storage reads only needed columns, skipping the rest. For aggregating one column from a 200-column table, columnar reads 1/200th of the data. Columnar also compresses better because similar values are stored together.

Can PostgreSQL work as an OLAP database?

PostgreSQL is row-oriented and not optimized for analytics. But for datasets in the tens of millions of rows, it works adequately with proper indexing. For large-scale analytics, use a dedicated OLAP system. Knowing this boundary shows practical judgment.

When would you denormalize an OLTP database?

When a critical query path requires multiple joins creating unacceptable latency. Add a denormalized summary table updated asynchronously. Trade-off: faster reads, more complex writes, potential inconsistency.

How does a data warehouse relate to OLAP?

A data warehouse IS an OLAP system. Columnar storage, denormalized schemas, designed for analytical queries. It receives data from OLTP source systems, transforms it, and serves analytics.

How would you move data from an OLTP system to an OLAP system?

Change Data Capture (CDC) for near-real-time, or scheduled batch extraction. Tools like Debezium capture row-level changes from the OLTP write-ahead log and stream them to a warehouse or lake. Batch ETL tools like Fivetran or Airbyte run on a schedule. The choice depends on latency requirements and data volume.

What is a materialized view and how does it relate to OLAP vs OLTP?

A materialized view precomputes and stores query results. In OLTP, it can speed up expensive analytical queries without moving data to a separate system. In OLAP, materialized views accelerate frequently run dashboards. The trade-off is storage cost and refresh latency. Good to mention when an interviewer asks about performance optimization.

Explain vectorized execution in columnar databases

Columnar engines process data in batches (vectors) rather than row-by-row. A batch of 1,024 integer values can be summed in a single CPU SIMD instruction. Row-oriented engines process one row at a time with per-row function call overhead. This is why columnar databases can aggregate billions of values in seconds. Mentioning this shows you understand why columnar is fast at the CPU level, not just the I/O level.
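Pure Python cannot show real SIMD, but the batching idea can be sketched as an analogy: the built-in sum over a slice plays the role of a tight batch kernel, versus paying per-row overhead in an explicit loop. The batch size of 1,024 mirrors the figure above; everything else is illustrative.

```python
# Analogy for vectorized execution: hand the engine whole batches instead
# of processing one row at a time.

values = list(range(1_000_000))
BATCH = 1024

def rowwise_total(vals):
    total = 0
    for v in vals:       # per-row overhead on every single value
        total += v
    return total

def vectorized_total(vals):
    total = 0
    for i in range(0, len(vals), BATCH):
        total += sum(vals[i:i + BATCH])  # one call per 1,024-value batch
    return total

print(vectorized_total(values) == rowwise_total(values))  # True
```

In a real columnar engine the batch kernel is compiled code operating on a contiguous column vector, which is where the SIMD win comes from.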

How does partitioning differ between OLAP and OLTP systems?

OLTP partitions by key range or hash to distribute writes evenly and keep indexes small. OLAP partitions by time (date columns) to enable partition pruning on analytical queries. A query for 'last 30 days' skips all older partitions. Different goals: OLTP partitions for write throughput, OLAP partitions for read efficiency.
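The OLAP side of this — time-based pruning — can be sketched with a dictionary of monthly partitions. Partition names and row counts here are invented; real planners compare the filter against partition bounds in metadata before touching any data.

```python
# Sketch of time-based partition pruning: a table split by month, where a
# recent-date filter touches only the matching partition.

partitions = {
    "2025-01": ["row"] * 3,
    "2025-02": ["row"] * 4,
    "2025-03": ["row"] * 5,
}

def scan(month_filter):
    # The planner keeps only partitions that can satisfy month >= filter;
    # everything else is skipped without being read.
    return {m: rows for m, rows in partitions.items() if m >= month_filter}

touched = scan("2025-03")
print(list(touched))  # ['2025-03'] — two of three partitions skipped
```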

How Compression Works in Columnar vs Row Storage

Compression is a major reason columnar databases are faster and cheaper for analytics. Understanding these techniques helps you explain why during interviews.

Run-Length Encoding (RLE)

OLAP

Replaces consecutive repeated values with the value and a count. A column with ['US', 'US', 'US', 'US', 'CA', 'CA'] becomes [('US', 4), ('CA', 2)]. Extremely effective on sorted, low-cardinality columns. If your country column has 10 distinct values across 1 billion rows and the data is sorted by country, RLE can compress it to almost nothing.
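The transformation described above is small enough to write out directly. A minimal sketch using the example column from the text:

```python
from itertools import groupby

# Run-length encoding: collapse consecutive repeats into (value, count).

def rle_encode(column):
    return [(v, len(list(g))) for v, g in groupby(column)]

def rle_decode(pairs):
    return [v for v, n in pairs for _ in range(n)]

col = ["US", "US", "US", "US", "CA", "CA"]
encoded = rle_encode(col)
print(encoded)  # [('US', 4), ('CA', 2)]
assert rle_decode(encoded) == col  # round-trips losslessly
```

Note that RLE only wins on *consecutive* repeats — which is exactly why sorting or clustering the column first matters so much.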

Dictionary Encoding

OLAP

Builds a dictionary mapping each distinct value to a small integer. Instead of storing 'United States' 500 million times, the column stores the integer 3 and looks up the string when needed. Reduces storage dramatically for string columns. Parquet and ORC use dictionary encoding by default when the number of distinct values is low enough.

Delta Encoding

OLAP

Stores the difference between consecutive values instead of absolute values. Timestamps like [1000, 1001, 1002, 1003] become [1000, 1, 1, 1]. Works well on sorted numeric columns where values increase gradually. Combined with bit-packing, delta encoding can represent millions of timestamps in a few kilobytes.

Page-Level Compression (General)

Both

Row-oriented databases compress entire pages (8KB in PostgreSQL) using algorithms like LZ4 or zstd. This helps, but mixed data types on the same page limit compression ratios. A page containing integers, strings, dates, and booleans does not compress as well as a page containing only integers. This is why columnar databases achieve 5x to 10x better compression than row databases on the same data.

Common Interview Mistakes

These are the answers that make interviewers wince. Avoid them.

Saying 'OLAP is for big data and OLTP is for small data'

Size is not the distinction. An OLTP system can hold billions of rows and handle millions of transactions. The difference is the access pattern: OLAP reads many rows to produce aggregates, OLTP reads or writes a few rows at a time. Some OLTP databases are larger than some OLAP databases.

Confusing OLAP with a specific product

OLAP is a workload pattern, not a product. Snowflake is an OLAP system. BigQuery is an OLAP system. A Parquet file queried by DuckDB is OLAP. The term describes the access pattern (analytical queries), not the vendor.

Claiming columnar storage is always better

Columnar storage is better for analytical queries. It is worse for transactional workloads. Inserting a single row into a columnar table writes to every column file separately. Point lookups require reading from every column file. The right storage layout depends on the workload, and strong candidates say this explicitly.

Forgetting about hybrid workloads

In practice, most organizations need both OLAP and OLTP. The architecture question is how data flows between them: CDC, batch ETL, or streaming. Interviewers reward candidates who think about the full system, not just one side.

Frequently Asked Questions

What is OLAP?
Online Analytical Processing. Systems optimized for complex queries scanning large datasets. Data warehouses like Snowflake, BigQuery, and Redshift are OLAP systems using columnar storage.

What is OLTP?
Online Transaction Processing. Systems handling individual operations at high concurrency and low latency. Application databases like PostgreSQL and MySQL are OLTP systems using row storage.

Do interviews ask about OLAP vs OLTP?
Yes. Foundational concept in data modeling and system design interviews. Interviewers want to hear you understand why different systems exist and when to use each.

What is a columnar database?
A database storing data by column rather than by row. All values for a column are stored together, making analytical queries fast and enabling better compression. Examples: Snowflake, BigQuery, ClickHouse, DuckDB.

Can one database handle both OLAP and OLTP workloads?
Some databases attempt this (called HTAP: Hybrid Transactional/Analytical Processing). TiDB and SingleStore are examples. In practice, most teams separate the systems because optimizing for both simultaneously creates compromises in each. For interviews, the safe answer is to separate them and explain why.

What is the relationship between OLAP and data warehouses?
A data warehouse is an OLAP system. It uses columnar storage, denormalized schemas, and is designed for analytical queries. The terms are closely related but not identical: OLAP describes the workload pattern, and 'data warehouse' describes the system built to handle that pattern.

How does DuckDB fit into the OLAP vs OLTP picture?
DuckDB is an in-process columnar (OLAP) database. It runs inside your application or notebook with no server. It is designed for analytical queries on datasets that fit on one machine (up to hundreds of GBs). Think of it as SQLite for analytics: embedded, zero-config, and columnar.

What is partition pruning and why does it matter for OLAP?
Partition pruning means the query engine skips entire partitions of data that cannot contain relevant rows. If your data is partitioned by month and you query the last 7 days, the engine reads only the current month's partition. This can reduce data scanned by 90%+ and directly reduces query time and cost on usage-based systems like BigQuery.
