Concepts

ACID Properties in Databases

In 1983, two German researchers, Theo Härder and Andreas Reuter, published "Principles of Transaction-Oriented Database Recovery" in ACM Computing Surveys. That paper coined the acronym ACID. Härder and Reuter were describing what IBM System R and its contemporaries already did internally; their contribution was giving the guarantees names that engineers could argue about. Forty-three years later, those four letters still frame every discussion about transactional correctness, and interviewers still ask you to define them because the failure modes haven't changed.

1983

ACID Coined by Härder

3%

DE Rounds: System Design

27

System Design Qs Analyzed

17%

L6 Staff Rounds

Source: DataDriven analysis of 1,042 verified data engineering interview rounds.

The Four ACID Properties

Härder and Reuter structured the paper around failure modes, not abstractions. Each of the four properties is the answer to a specific disaster the database engineers of the 1970s kept running into. Crashes mid-transaction became atomicity. Interleaving writes became isolation. Power loss became durability. The acronym stuck because it maps one-to-one onto the problems that were actually breaking production.

A

Atomicity

All or nothing.

A transaction is atomic if it either completes entirely or has no effect at all. There is no partial state. If a bank transfer debits $500 from Account A and credits $500 to Account B, atomicity guarantees that both operations succeed or both are rolled back. You never end up with money deducted from A but not credited to B.

Database example:

In PostgreSQL, wrapping two UPDATE statements in a BEGIN/COMMIT block makes them atomic. If the second UPDATE fails (constraint violation, disk error, connection drop), the first UPDATE is rolled back. The database returns to the state before the transaction started.

Pipeline example:

In a data pipeline, atomicity means a batch load either writes all rows or writes none. If you INSERT 1 million rows and the job fails at row 600,000, an atomic operation rolls back all 600,000 rows. Without atomicity, you have 600,000 partial rows in the table and no way to know which ones made it without reprocessing.

BEGIN;
  UPDATE accounts SET balance = balance - 500 WHERE id = 1;
  UPDATE accounts SET balance = balance + 500 WHERE id = 2;
COMMIT;
-- If either UPDATE fails, both are rolled back
C

Consistency

Valid state to valid state.

A transaction moves the database from one valid state to another valid state. All constraints (primary keys, foreign keys, CHECK constraints, triggers) must be satisfied before and after the transaction. If a constraint would be violated, the transaction is rejected entirely.

Database example:

If a foreign key constraint requires that every order references a valid customer_id, you cannot insert an order for customer 999 if that customer does not exist. The database rejects the INSERT and the table stays consistent. This applies even within a transaction: the constraint is checked when the transaction commits.

Pipeline example:

Data warehouses often relax constraint enforcement for performance. Snowflake declares foreign keys but does not enforce them. This means your pipeline must enforce consistency through data quality checks: validate that all foreign keys have matching primary keys before marking a load as complete. Consistency shifts from the database to the pipeline code.

-- Consistency enforced by constraints
ALTER TABLE orders ADD CONSTRAINT fk_customer
  FOREIGN KEY (customer_id) REFERENCES customers(id);

-- This INSERT fails if customer 999 does not exist
INSERT INTO orders (order_id, customer_id, amount)
VALUES (1001, 999, 49.99);
-- ERROR: insert or update on table "orders" violates
-- foreign key constraint "fk_customer"
I

Isolation

Transactions do not interfere with each other.

Concurrent transactions behave as if they ran sequentially. One transaction cannot see the intermediate state of another transaction. The isolation level determines exactly how strictly this rule is enforced. Higher isolation means fewer anomalies but more locking and lower throughput.

Database example:

Transaction A reads a row, then Transaction B updates the same row and commits. If Transaction A reads the row again, what does it see? Under READ COMMITTED, it sees the new value. Under REPEATABLE READ, it sees the old value. Under SERIALIZABLE, the database behaves as if one transaction ran completely before the other started.

Pipeline example:

Isolation matters when a pipeline reads from a table while another process writes to it. If your pipeline reads customer data while the application updates customer addresses, dirty reads can produce inconsistent snapshots. Using snapshot isolation or reading from a replica with a consistent point-in-time ensures the pipeline sees a stable view of the data.

-- Check current isolation level in PostgreSQL
SHOW transaction_isolation;

-- Set isolation level for a transaction
BEGIN ISOLATION LEVEL SERIALIZABLE;
  SELECT balance FROM accounts WHERE id = 1;
  -- No other transaction can modify this row
  -- until this transaction commits or rolls back
  UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;
D

Durability

Committed means committed. Permanently.

Once a transaction is committed, it survives system failures. Power outage, crash, hardware failure. The data is written to non-volatile storage before the COMMIT returns success. The database uses write-ahead logging (WAL) to guarantee this: the transaction log is flushed to disk before the data pages are updated.

Database example:

PostgreSQL writes transaction changes to the WAL before acknowledging the COMMIT. If the server crashes immediately after COMMIT returns, the WAL contains enough information to replay the transaction during recovery. The data is not lost.

Pipeline example:

For data pipelines, durability means trusting that a successful COMMIT or successful API response means the data is safe. If your pipeline writes to S3 and gets a 200 response, S3 guarantees the object is durable (replicated across multiple facilities). If your pipeline writes to a database and the COMMIT succeeds, that data survives restarts.

-- WAL ensures durability in PostgreSQL
-- After COMMIT returns, data survives crashes

BEGIN;
  INSERT INTO events (event_id, payload)
  VALUES (42, '{"type": "click", "page": "/pricing"}');
COMMIT;
-- At this point, the row is durable
-- Even if the server crashes 1ms later,
-- the row will be there after recovery

Isolation Levels

Isolation is the most subtle ACID property. SQL defines four isolation levels, each allowing different types of read anomalies. Higher isolation means fewer anomalies but more contention between transactions. Interviewers love asking about this trade-off.

LevelDirty ReadNon-RepeatablePhantom
READ UNCOMMITTEDYesYesYes
READ COMMITTEDNoYesYes
REPEATABLE READNoNoYes
SERIALIZABLENoNoNo

Interview tip: Know the default isolation level for PostgreSQL (READ COMMITTED), MySQL (REPEATABLE READ), and SQL Server (READ COMMITTED). Interviewers often ask which anomalies are possible at each level.

ACID vs BASE

BASE stands for Basically Available, Soft state, Eventual consistency. It is the alternative consistency model used by NoSQL and distributed systems that prioritize availability over strict consistency. Understanding both models is essential for system design interviews.

PropertyACIDBASE
Consistency modelStrong consistency. Reads always return the latest committed value.Eventual consistency. Reads may return stale data temporarily.
Availability trade-offMay sacrifice availability for consistency (CP in CAP theorem).Prioritizes availability over immediate consistency (AP in CAP theorem).
Typical systemsPostgreSQL, MySQL, SQL Server, Oracle, Snowflake.DynamoDB, Cassandra, MongoDB (default config), Redis.
Transaction supportFull multi-statement transactions with rollback.Limited or no multi-statement transactions. Operations are often single-row.
Use caseFinancial systems, inventory, analytics warehouses, any system where correctness is non-negotiable.High-throughput writes, geo-distributed systems, caching layers, systems that tolerate brief inconsistency.

4 ACID Interview Questions

These questions test conceptual understanding and practical application. Each includes what the interviewer is evaluating and how to structure a strong answer.

Q1: Explain ACID properties. Why do they matter for data engineering?

What they test:

Foundational knowledge. The interviewer wants a concise definition of each property with a practical example. They also want to hear why it matters beyond OLTP: transactions in ETL, atomic batch loads, and consistency in warehouse writes.

Approach:

Define each letter in one sentence with a concrete example. Then pivot to data engineering: 'Atomicity matters for pipeline loads because a partial write is worse than no write. Consistency matters because warehouses relax constraint enforcement, so pipelines must validate data. Isolation matters when multiple pipelines write to the same table. Durability matters because committed data must survive failures.'

Q2: What is the difference between ACID and BASE? When would you choose each?

What they test:

Trade-off awareness. ACID provides strong consistency at the cost of availability and throughput. BASE provides high availability at the cost of temporary inconsistency. The interviewer wants to see that you understand the CAP theorem context and can recommend the right model for a given use case.

Approach:

Explain that BASE stands for Basically Available, Soft state, Eventual consistency. ACID is for systems where correctness cannot be compromised (financial transactions, inventory counts, analytics warehouses). BASE is for systems where high availability and write throughput matter more than immediate consistency (real-time event streams, caching, user activity logs). Most data engineering stacks use ACID databases for the warehouse and BASE systems for ingestion layers.

Q3: How do you ensure atomicity in a data pipeline that loads multiple tables?

What they test:

Practical engineering. Loading a fact table and its dimension tables must be coordinated. If the dimension load fails after the fact load succeeds, fact rows reference nonexistent dimension keys. The interviewer wants to hear about transaction wrapping, staging tables, and swap patterns.

Approach:

Describe three strategies. First, wrap all loads in a single database transaction (works for small loads). Second, use a staging-then-swap pattern: load into staging tables, validate, then rename staging to production in a single atomic operation. Third, use a metadata flag: write data to the target tables but do not flip the 'latest partition' pointer until all tables are loaded successfully.

Q4: What isolation level would you use for a pipeline that reads from OLTP while the application is writing?

What they test:

Understanding of isolation levels in practice. The pipeline needs a consistent snapshot. READ COMMITTED might produce inconsistent reads if the application commits between the pipeline's queries. REPEATABLE READ or SNAPSHOT isolation gives a point-in-time view.

Approach:

Recommend REPEATABLE READ or snapshot isolation. In PostgreSQL, REPEATABLE READ uses MVCC to provide a consistent snapshot at the start of the transaction. In SQL Server, enable READ_COMMITTED_SNAPSHOT for similar behavior. Alternatively, read from a replica with a known replication lag. The goal is a stable snapshot that does not change while the pipeline reads.

ACID Properties FAQ

What are ACID properties in a database?+
ACID stands for Atomicity, Consistency, Isolation, and Durability. These four properties guarantee that database transactions are processed reliably. Atomicity ensures all-or-nothing execution. Consistency ensures the database moves between valid states. Isolation ensures concurrent transactions do not interfere. Durability ensures committed data survives system failures. Together, they form the foundation of reliable data storage in relational databases like PostgreSQL, MySQL, and SQL Server.
Does Snowflake support ACID transactions?+
Yes. Snowflake supports ACID transactions. Each SQL statement runs within an implicit transaction, and you can use explicit BEGIN/COMMIT for multi-statement transactions. Snowflake uses multi-version concurrency control (MVCC) for isolation, meaning readers never block writers. However, Snowflake does not enforce foreign key constraints, so the Consistency property is partially the pipeline's responsibility. Atomicity, Isolation, and Durability are fully supported by the engine.
What is the difference between ACID and BASE?+
ACID prioritizes correctness and strong consistency. Every read returns the latest committed value. BASE (Basically Available, Soft state, Eventual consistency) prioritizes availability and partition tolerance. Reads may temporarily return stale data. ACID systems include PostgreSQL and Snowflake. BASE systems include DynamoDB and Cassandra. Most data engineering architectures use both: BASE systems for high-throughput ingestion and ACID systems for the analytics warehouse where correctness matters.
Why do data engineers need to understand ACID properties?+
Data engineers build pipelines that read from and write to databases. Understanding ACID properties helps you design pipelines that handle failures correctly (atomicity), validate data integrity (consistency), avoid race conditions with concurrent writes (isolation), and trust that committed data is safe (durability). Pipeline failures are inevitable. ACID understanding is what separates pipelines that recover gracefully from pipelines that produce corrupted data.

Four Letters That Outlasted the Mainframe

Härder's 1983 acronym still answers the questions interviewers ask in 2026. Practice the system design rounds on DataDriven where ACID actually bites: exactly-once pipelines, batch reloads, and concurrent writer scenarios.