In 1983, two German researchers, Theo Härder and Andreas Reuter, published "Principles of Transaction-Oriented Database Recovery" in ACM Computing Surveys. That paper coined the acronym ACID. Härder and Reuter were describing what IBM System R and its contemporaries already did internally; their contribution was giving the guarantees names that engineers could argue about. Forty-three years later, those four letters still frame every discussion about transactional correctness, and interviewers still ask you to define them because the failure modes haven't changed.
Source: DataDriven analysis of 1,042 verified data engineering interview rounds.
Härder and Reuter structured the paper around failure modes, not abstractions. Each of the four properties is the answer to a specific disaster the database engineers of the 1970s kept running into: crashes mid-transaction became atomicity, constraint violations became consistency, interleaved writes became isolation, and power loss became durability. The acronym stuck because it maps one-to-one onto the problems that were actually breaking production.
All or nothing.
A transaction is atomic if it either completes entirely or has no effect at all. There is no partial state. If a bank transfer debits $500 from Account A and credits $500 to Account B, atomicity guarantees that both operations succeed or both are rolled back. You never end up with money deducted from A but not credited to B.
Database example:
In PostgreSQL, wrapping two UPDATE statements in a BEGIN/COMMIT block makes them atomic. If the second UPDATE fails (constraint violation, disk error, connection drop), the first UPDATE is rolled back. The database returns to the state before the transaction started.
Pipeline example:
In a data pipeline, atomicity means a batch load either writes all rows or writes none. If you INSERT 1 million rows and the job fails at row 600,000, an atomic operation rolls back all 600,000 rows. Without atomicity, you have 600,000 partial rows in the table and no way to know which ones made it without reprocessing.
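The batch-load scenario can be simulated in miniature. The sketch below is illustrative, not from the article: it uses Python's sqlite3 as a stand-in for a warehouse connection, the staging table and row counts are invented, and a RuntimeError stands in for a real mid-batch job failure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (id INTEGER PRIMARY KEY, payload TEXT)")

rows = [(i, f"row-{i}") for i in range(1_000)]
try:
    # `with conn:` opens a transaction, commits on success,
    # and rolls back on any exception
    with conn:
        for i, payload in rows:
            if i == 600:  # simulate the job dying mid-batch
                raise RuntimeError("job failed at row 600")
            conn.execute("INSERT INTO staging VALUES (?, ?)", (i, payload))
except RuntimeError:
    pass

# The rollback left zero partial rows behind
count = conn.execute("SELECT COUNT(*) FROM staging").fetchone()[0]
print(count)  # 0
```

The rollback is what makes the load all-or-nothing: a consumer querying the table never sees the 600 rows that were inserted before the failure.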
```sql
BEGIN;
UPDATE accounts SET balance = balance - 500 WHERE id = 1;
UPDATE accounts SET balance = balance + 500 WHERE id = 2;
COMMIT;
-- If either UPDATE fails, both are rolled back
```

Valid state to valid state.
A transaction moves the database from one valid state to another valid state. All constraints (primary keys, foreign keys, CHECK constraints, triggers) must be satisfied before and after the transaction. If a constraint would be violated, the transaction is rejected entirely.
Database example:
If a foreign key constraint requires that every order references a valid customer_id, you cannot insert an order for customer 999 if that customer does not exist. The database rejects the INSERT and the table stays consistent. By default the check happens as each statement executes, even inside a transaction; only constraints declared DEFERRABLE can postpone the check until the transaction commits.
Pipeline example:
Data warehouses often relax constraint enforcement for performance. Snowflake declares foreign keys but does not enforce them. This means your pipeline must enforce consistency through data quality checks: validate that all foreign keys have matching primary keys before marking a load as complete. Consistency shifts from the database to the pipeline code.
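Since the warehouse will not do it for you, the pipeline-side check can be a simple anti-join. A minimal sketch, with invented orders/customers tables and SQLite standing in for the warehouse:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite conveniently mirrors the warehouse situation: foreign keys
# are declared but not enforced unless PRAGMA foreign_keys is on
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES (1), (2);
    INSERT INTO orders VALUES (1001, 1), (1002, 999);  -- 999 is an orphan
""")

# Anti-join: orders whose customer_id has no matching customers.id
orphans = conn.execute("""
    SELECT o.order_id FROM orders o
    LEFT JOIN customers c ON o.customer_id = c.id
    WHERE c.id IS NULL
""").fetchall()

if orphans:
    # Fail the load instead of publishing inconsistent data
    print(f"orphaned orders: {[r[0] for r in orphans]}")
```

A nonempty result means the load should be marked failed, exactly the consistency check the database would otherwise have performed for you.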
```sql
-- Consistency enforced by constraints
ALTER TABLE orders ADD CONSTRAINT fk_customer
    FOREIGN KEY (customer_id) REFERENCES customers(id);

-- This INSERT fails if customer 999 does not exist
INSERT INTO orders (order_id, customer_id, amount)
VALUES (1001, 999, 49.99);
-- ERROR: insert or update on table "orders" violates
-- foreign key constraint "fk_customer"
```

Transactions do not interfere with each other.
Concurrent transactions behave as if they ran sequentially. One transaction cannot see the intermediate state of another transaction. The isolation level determines exactly how strictly this rule is enforced. Higher isolation means fewer anomalies but more locking and lower throughput.
Database example:
Transaction A reads a row, then Transaction B updates the same row and commits. If Transaction A reads the row again, what does it see? Under READ COMMITTED, it sees the new value. Under REPEATABLE READ, it sees the old value. Under SERIALIZABLE, the database behaves as if one transaction ran completely before the other started.
Pipeline example:
Isolation matters when a pipeline reads from a table while another process writes to it. If your pipeline reads customer data across several queries while the application updates customer addresses, the queries can observe different states of the table and assemble an inconsistent snapshot. Using snapshot isolation, or reading from a replica at a consistent point in time, ensures the pipeline sees a stable view of the data.
```sql
-- Check current isolation level in PostgreSQL
SHOW transaction_isolation;

-- Set isolation level for a transaction
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT balance FROM accounts WHERE id = 1;
-- A concurrent transaction that modifies this row will not block;
-- instead, one of the two transactions fails at COMMIT with a
-- serialization error and must retry
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;
```

Committed means committed. Permanently.
Once a transaction is committed, it survives system failures: power outages, crashes, hardware faults. The data is written to non-volatile storage before the COMMIT returns success. Databases guarantee this with write-ahead logging (WAL): the log records describing a change are flushed to disk before the COMMIT is acknowledged, so even if the modified data pages were never written, recovery can replay the log.
Database example:
PostgreSQL writes transaction changes to the WAL before acknowledging the COMMIT. If the server crashes immediately after COMMIT returns, the WAL contains enough information to replay the transaction during recovery. The data is not lost.
Pipeline example:
For data pipelines, durability means trusting that a successful COMMIT or a successful API response means the data is safe. If your pipeline writes to S3 and gets a 200 response, S3 guarantees the object is durable (stored redundantly across multiple Availability Zones). If your pipeline writes to a database and the COMMIT succeeds, that data survives restarts.
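A closed-and-reopened connection is a crude stand-in for a crash, but it illustrates the contract: once COMMIT returns, a completely fresh connection sees the row. The sketch below uses invented names; real crash durability additionally depends on the engine's fsync settings (PRAGMA synchronous in SQLite, synchronous_commit in PostgreSQL).

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "events.db")

conn = sqlite3.connect(path)
conn.execute("CREATE TABLE events (event_id INTEGER PRIMARY KEY, payload TEXT)")
conn.execute("INSERT INTO events VALUES (?, ?)", (42, '{"type": "click"}'))
conn.commit()  # durability point: after this returns, the row is on disk
conn.close()   # simulate the writing process going away

# A fresh connection (a "restarted" process) still sees the committed row
conn2 = sqlite3.connect(path)
row = conn2.execute("SELECT payload FROM events WHERE event_id = 42").fetchone()
print(row[0])  # {"type": "click"}
```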
```sql
-- WAL ensures durability in PostgreSQL
-- After COMMIT returns, data survives crashes
BEGIN;
INSERT INTO events (event_id, payload)
VALUES (42, '{"type": "click", "page": "/pricing"}');
COMMIT;
-- At this point, the row is durable
-- Even if the server crashes 1ms later,
-- the row will be there after recovery
```

Isolation is the most subtle ACID property. SQL defines four isolation levels, each allowing different types of read anomalies. Higher isolation means fewer anomalies but more contention between transactions. Interviewers love asking about this trade-off.
| Level | Dirty Read | Non-Repeatable | Phantom |
|---|---|---|---|
| READ UNCOMMITTED | Yes | Yes | Yes |
| READ COMMITTED | No | Yes | Yes |
| REPEATABLE READ | No | No | Yes |
| SERIALIZABLE | No | No | No |
Interview tip: Know the default isolation level for PostgreSQL (READ COMMITTED), MySQL with InnoDB (REPEATABLE READ), and SQL Server (READ COMMITTED). Interviewers often ask which anomalies are possible at each level. A common follow-up: PostgreSQL accepts READ UNCOMMITTED syntactically but treats it as READ COMMITTED, so dirty reads never occur there.
BASE stands for Basically Available, Soft state, Eventual consistency. It is the alternative consistency model used by NoSQL and distributed systems that prioritize availability over strict consistency. Understanding both models is essential for system design interviews.
| Property | ACID | BASE |
|---|---|---|
| Consistency model | Strong consistency. Reads always return the latest committed value. | Eventual consistency. Reads may return stale data temporarily. |
| Availability trade-off | May sacrifice availability for consistency (CP in CAP theorem). | Prioritizes availability over immediate consistency (AP in CAP theorem). |
| Typical systems | PostgreSQL, MySQL, SQL Server, Oracle, Snowflake. | DynamoDB, Cassandra, MongoDB (default config), Redis. |
| Transaction support | Full multi-statement transactions with rollback. | Limited or no multi-statement transactions. Operations are often single-row. |
| Use case | Financial systems, inventory, analytics warehouses, any system where correctness is non-negotiable. | High-throughput writes, geo-distributed systems, caching layers, systems that tolerate brief inconsistency. |
These questions test conceptual understanding and practical application. Each includes what the interviewer is evaluating and how to structure a strong answer.
Question: Explain each ACID property with a practical example.
What they test:
Foundational knowledge. The interviewer wants a concise definition of each property with a practical example. They also want to hear why it matters beyond OLTP: transactions in ETL, atomic batch loads, and consistency in warehouse writes.
Approach:
Define each letter in one sentence with a concrete example. Then pivot to data engineering: 'Atomicity matters for pipeline loads because a partial write is worse than no write. Consistency matters because warehouses relax constraint enforcement, so pipelines must validate data. Isolation matters when multiple pipelines write to the same table. Durability matters because committed data must survive failures.'
Question: Compare ACID and BASE. When would you choose each?
What they test:
Trade-off awareness. ACID provides strong consistency at the cost of availability and throughput. BASE provides high availability at the cost of temporary inconsistency. The interviewer wants to see that you understand the CAP theorem context and can recommend the right model for a given use case.
Approach:
Explain that BASE stands for Basically Available, Soft state, Eventual consistency. ACID is for systems where correctness cannot be compromised (financial transactions, inventory counts, analytics warehouses). BASE is for systems where high availability and write throughput matter more than immediate consistency (real-time event streams, caching, user activity logs). Most data engineering stacks use ACID databases for the warehouse and BASE systems for ingestion layers.
Question: How do you ensure a fact table and its dimension tables load atomically?
What they test:
Practical engineering. Loading a fact table and its dimension tables must be coordinated. If the dimension load fails after the fact load succeeds, fact rows reference nonexistent dimension keys. The interviewer wants to hear about transaction wrapping, staging tables, and swap patterns.
Approach:
Describe three strategies. First, wrap all loads in a single database transaction (works for small loads). Second, use a staging-then-swap pattern: load into staging tables, validate, then rename staging to production in a single atomic operation. Third, use a metadata flag: write data to the target tables but do not flip the 'latest partition' pointer until all tables are loaded successfully.
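The staging-then-swap strategy can be sketched with SQLite, which supports transactional DDL. Table names are invented, and a real warehouse would use its own atomic swap (Snowflake, for example, has ALTER TABLE ... SWAP WITH):

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode so we
# can manage the swap transaction explicitly
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 10.0);                      -- yesterday's data
    CREATE TABLE orders_staging (order_id INTEGER, amount REAL);
    INSERT INTO orders_staging VALUES (2, 20.0), (3, 30.0);   -- new load
""")

# Validate the staging load before it becomes visible to consumers
staged = conn.execute("SELECT COUNT(*) FROM orders_staging").fetchone()[0]
assert staged > 0, "refuse to swap in an empty load"

# The swap is one transaction: readers see old data or new, never a mix
conn.execute("BEGIN")
conn.execute("ALTER TABLE orders RENAME TO orders_old")
conn.execute("ALTER TABLE orders_staging RENAME TO orders")
conn.execute("DROP TABLE orders_old")
conn.execute("COMMIT")

ids = [r[0] for r in conn.execute(
    "SELECT order_id FROM orders ORDER BY order_id")]
print(ids)  # [2, 3]
```

The validation step runs against the staging table while production still serves the old data, which is what makes the pattern safe for large loads that cannot fit in a single INSERT transaction.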
Question: What isolation level should a pipeline use when reading from a live transactional table?
What they test:
Understanding of isolation levels in practice. The pipeline needs a consistent snapshot. READ COMMITTED might produce inconsistent reads if the application commits between the pipeline's queries. REPEATABLE READ or SNAPSHOT isolation gives a point-in-time view.
Approach:
Recommend REPEATABLE READ or snapshot isolation. In PostgreSQL, REPEATABLE READ uses MVCC to provide a consistent snapshot taken at the transaction's first query. In SQL Server, enable ALLOW_SNAPSHOT_ISOLATION and run the pipeline under SNAPSHOT isolation; READ_COMMITTED_SNAPSHOT alone only gives statement-level snapshots. Alternatively, read from a replica at a known point in time. The goal is a stable snapshot that does not change while the pipeline reads.
Härder and Reuter's 1983 acronym still answers the questions interviewers ask in 2026. Practice the system design rounds on DataDriven where ACID actually bites: exactly-once pipelines, batch reloads, and concurrent writer scenarios.