Introduction to ACID

ACID, which stands for AtomicityConsistencyIsolation, and Durability, represents four fundamental properties of relational databases. An understanding of these principles is crucial for any engineer working with databases, including relational systems like Postgres, MySQL, SQL Server, and Oracle, as well as NoSQL and graph databases.

Understanding the Core Concepts

Before diving into the ACID properties, it is essential to first grasp what a database transaction is. A transaction is a single unit of work that must be completed in its entirety or not at all.

The Four Pillars of ACID

The video discusses each property in detail, explaining their importance. The speaker highlights that he will discuss Atomicity and Isolation before Consistency because he believes this order is crucial for a complete understanding.

  • Atomicity: This is the first property to be discussed. It guarantees that all operations within a transaction are treated as a single, indivisible unit. The transaction either completes successfully or is entirely rolled back, leaving the database unchanged.
  • Isolation: This property addresses concurrent transactions. It ensures that multiple transactions running at the same time do not interfere with each other. Each transaction’s outcome is independent of others, preventing them from seeing intermediate, uncommitted changes made by other transactions.
  • Consistency: The video emphasizes that understanding atomicity and isolation is key to understanding consistency. Consistency ensures that a transaction takes the database from one valid state to another. It prevents data corruption and ensures that the database remains in a consistent state even after multiple concurrent transactions. The speaker repeatedly mentions that a lack of isolation can lead to unexpected and bizarre data, highlighting the importance of this property.
  • Durability: This is a critical property, often sacrificed by some NoSQL databases for performance. Durability ensures that once a transaction has been successfully committed, its changes will persist permanently, even in the event of a system failure like a crash or power loss. The data is saved to a non-volatile storage location.

What is a Transaction?

A transaction is a collection of SQL queries treated as a single unit of work. The nature of structured query language and relational databases—with data distributed across multiple tables—makes it difficult or impossible to accomplish certain logical operations with a single query. Transactions group related queries together, allowing multiple operations to be executed as one atomic unit.

Consider an account deposit operation. Transferring money from one account to another requires multiple steps: first, a SELECT query retrieves the source account’s balance to verify sufficient funds exist. Then an UPDATE query deducts the amount from the source account. Finally, another UPDATE query adds the amount to the destination account. These three queries—read, update, update—form a single logical transaction that cannot be meaningfully split.

Transaction Lifespan

A transaction begins with the BEGIN keyword, signaling to the database that subsequent queries belong to a single transaction. During the transaction’s lifespan, changes exist in an uncommitted state—they haven’t been permanently persisted to the database.

When all queries complete successfully, the COMMIT keyword finalizes the transaction, persisting all changes to disk. The commit operation raises important implementation questions: if a transaction contains 1,000 queries, does the database write each change to disk immediately as queries execute, or does it hold changes in memory and write everything at commit time? Each approach has trade-offs. Writing incrementally makes commits faster but complicates rollback. Writing at commit time makes commits slower but simplifies rollback.

The ROLLBACK keyword abandons a transaction, discarding all changes made during its lifespan. If changes were written to disk during execution, rollback requires undoing that work—a potentially expensive operation. If changes remained in memory, rollback simply flushes the uncommitted data. Different databases make different choices about when to persist changes, optimizing for different workloads.

Unexpected Transaction Endings

Transactions can end unexpectedly due to crashes. If the database crashes mid-transaction after executing 20,000 queries, it must roll back those changes upon restart. This requires the database to recognize incomplete transactions during recovery and undo their effects. Every database implements this differently, optimizing based on its designers’ priorities.

An even more challenging scenario occurs when the database crashes during a commit. If commits are fast, the window for this failure is small. If commits are slow—particularly with large transactions—the probability of crashing during commit increases. This creates uncertainty: did the transaction commit successfully before the crash, or did it fail? The database must resolve this ambiguity during recovery.

PostgreSQL optimizes for fast commits by persisting changes incrementally during transaction execution, resulting in rapid commit operations. SQL Server, by contrast, may have slower commits for large transactions. These design differences reflect different engineering priorities and affect how applications should use each database.

Transaction Types

While transactions are typically associated with data modification, read-only transactions are perfectly valid. Declaring a transaction as read-only allows the database to optimize accordingly, avoiding unnecessary locking or logging overhead.

Read-only transactions provide consistency guarantees. When generating a report, a read-only transaction ensures all reads see a consistent snapshot from the transaction’s start time. Even if concurrent transactions modify data during report generation, the read-only transaction remains isolated, returning results as they existed when the transaction began. This isolation property prevents inconsistent or contradictory data from appearing in the report.

Transaction Example

Consider a table named accounts with two columns: account_id and balance. Account 1 has a balance of 500. To transfer $100 from account 1 to account 2:

BEGIN;
SELECT balance FROM accounts WHERE account_id = 1;
-- Returns 1000
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
-- Balance now 900
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
-- Balance now 600
COMMIT;

The SELECT verifies sufficient funds exist before debiting the account. This check can be implemented in application code (as shown) or as a database constraint preventing negative balances. Once the check passes, the first UPDATE deducts 100 to account 2. The COMMIT persists both updates atomically.

If the balance check reveals insufficient funds, the transaction can be rolled back, leaving both accounts unchanged.

Implicit Transactions

Databases always operate within transactions, whether explicitly declared or not. If a query executes without an explicit BEGIN, the database implicitly starts a transaction, executes the query, and immediately commits. This ensures consistency even for single-statement operations.

Some transactions are user-defined, explicitly controlled with BEGIN, COMMIT, and ROLLBACK. Others are system-defined, created implicitly by the database for individual statements. Understanding this distinction helps explain database behavior and locking patterns.

Summary

A transaction is a collection of queries treated as a single unit of work. Transactions can modify data or be read-only. They always begin with BEGIN (explicitly or implicitly) and end with COMMIT (to persist changes) or ROLLBACK (to discard changes). Unexpected crashes during transactions require careful handling to maintain database consistency. Different databases implement transactions differently, optimizing for different performance characteristics and failure modes.

Atomicity

Atomicity is one of the four ACID properties defining database management systems—not just relational databases, but any database system including NoSQL, graph databases, and time-series databases. Understanding atomicity is fundamental to working with databases.

Atomicity means all queries in a transaction must succeed.

A transaction, as previously discussed, is a collection of queries treated as one unit of work. The term “atomicity” derives from “atom”—historically considered indivisible. A transaction is atomic: it cannot be split into parts where some queries succeed and others fail.

Transaction Failure Scenarios

If one query fails for any reason—a constraint violation (such as a balance going negative), a duplicate primary key, or invalid SQL syntax—the entire transaction must roll back. Even if 100 queries succeeded before the failure, that single failed query invalidates the entire transaction. All successful changes must be undone.

Database crashes introduce a more complex scenario. If the database crashes mid-transaction before a commit occurs, what happens? The transaction didn’t explicitly fail—the next query simply never executed. When the database restarts, it must detect this incomplete transaction and roll back any changes made during it.

This raises critical implementation questions: what does the database do during a transaction? Does it write changes to disk as queries execute? If so, rollback requires undoing persisted changes. Or does it hold changes in memory until commit? Different databases make different choices based on their performance priorities.

Some databases are optimistic: they assume transactions will commit and write changes to disk immediately. When commit actually occurs, minimal work remains—just marking the transaction as committed. This makes commits extremely fast but complicates rollback, which must undo disk-persisted changes.

Other databases keep everything in memory during the transaction. Queries execute quickly because they only touch memory, but commits become slow as all accumulated changes must be flushed to disk. Rollback, however, is trivial—just discard the in-memory state.

There is no universally correct approach. Each design involves trade-offs, and the optimal choice depends on the workload.

Atomicity Failure Example

Consider the account transfer example: account 1 has a balance of 500. To transfer $100 from account 1 to account 2:

BEGIN;
SELECT balance FROM accounts WHERE account_id = 1;
-- Returns 1000, verification passes
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
-- Account 1 now has 900
 
-- Database crashes here
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
-- This query never executes

The database crashes after the first UPDATE but before the second. When the system restarts, if the database is poorly implemented, account 1 might show a balance of 500. $100 has vanished. This is a catastrophic inconsistency—money disappeared due to a technical failure.

After restart, the first account has been debited, but the second account was never credited. This represents an inconsistent view of the data. Lack of atomicity leads directly to inconsistencies.

Database Recovery Responsibility

An atomic transaction rolls back all queries if one or more fail. In the crash scenario, the second query didn’t fail—it never executed. The database didn’t give the transaction a chance to roll back because the crash occurred mid-transaction.

Recovery is the database’s responsibility. Upon restart, the database must detect incomplete transactions—those that began but never committed—and clean them up. It must reverse the first UPDATE that debited account 1, restoring the balance to $1,000 to achieve a consistent state.

From professional experience, rollback operations on long transactions can take over an hour, particularly in SQL Server. These aren’t user-initiated rollbacks—they’re automatic cleanup operations the database performs during restart. Some databases won’t even allow users to connect until rollback completes. They must undo all “garbage”—uncommitted changes—before the database can resume normal operation.

Recent database releases have improved this by allowing rollback to proceed in the background while users access unaffected tables. However, the transaction being rolled back remains locked, consuming CPU and memory resources during cleanup.

This is why long-running transactions are generally discouraged. The longer a transaction runs and the more changes it makes, the longer rollback takes if a crash occurs.

Summary

Atomicity defines a transaction as one indivisible unit of work. In a transaction with 100 queries, all 100 must succeed. If any query fails, all successful queries must be rolled back. This applies not only to explicit query failures but also to database crashes.

Even when a crash interrupts a transaction—not an explicit failure triggered by the user—the database must roll back changes upon restart. Atomicity ensures that partial work never persists, preventing the data inconsistencies that would occur if some queries succeeded while others didn’t. This property is fundamental to maintaining database integrity across all failure scenarios.

Isolation

Isolation is the third property in the ACID system and represents a critical, often underrated concept. Many engineers don’t fully understand the ramifications of transaction isolation, yet it significantly affects database behavior under concurrent workloads.

Databases handle multiple concurrent TCP connections, each executing transactions. When multiple transactions attempt to read and write the same data simultaneously, conflicts arise. The core question isolation addresses is: can my in-flight transaction see changes made by other in-flight transactions?

Isolation deals with the potential for conflicts when multiple transactions run concurrently on the same data.

This question extends beyond concurrent transactions to completed ones. If a transaction is reading data and another transaction commits a change, should the first transaction see that newly committed data? There is no universally correct answer—it depends on application requirements.

Read Phenomena

Concurrent transactions produce various “read phenomena”—generally undesirable side effects arising from insufficient isolation. Understanding these phenomena is essential for debugging subtle concurrency bugs.

  • Dirty Reads: Occur when a transaction reads data that has been written by another transaction but not yet committed. If the other transaction then rolls back, the data read by the first transaction becomes invalid or “dirty.”
  • Non-repeatable Reads: Happen when a transaction reads the same data twice and gets a different value each time because another transaction committed a change in between the reads. This is particularly problematic for long-running transactions that need a consistent view of the data.
  • Phantom Reads: This phenomenon occurs when a transaction reads a set of rows based on a search condition, and then a subsequent query in the same transaction returns a different number of rows because another transaction inserted or deleted rows that fit the search condition. This is different from a non-repeatable read because it involves a change in the number of rows, not just a change in a single row’s value.
  • Lost Updates: Occur when two transactions read and update the same data. The second transaction overwrites the changes of the first transaction, causing the first update to be lost.

Dirty Read

A dirty read occurs when a transaction reads data written by another transaction that hasn’t yet committed. The uncommitted change might later be rolled back, or the database might crash before commit, making the read invalid.

Consider a sales table with two products. Product 1 has sold 10 units at 50), and product 2 has sold 20 units at 80):

A transaction (with two queries) begins to generate a report, reading the initial values.

Transaction 1 begins and queries:

SELECT product_id, quantity * price FROM sales;

Results: Product 1 = 80.

Transaction 2 starts concurrently and executes:

UPDATE sales SET quantity = 15 WHERE product_id = 1;

Transaction 1 then queries:

SELECT SUM(quantity * price) FROM sales;

If dirty reads are allowed, Transaction 1 sees the uncommitted update from Transaction 2. The sum becomes 5 + 130 but a sum of $155, the inconsistency is obvious.

Here’s a graphical visualization:

Worse, Transaction 2 might roll back. The dirty read becomes invalid, yet Transaction 1 already used it. Here’s a graphical visualization:

Non-repeatable Read

A non-repeatable read occurs when a transaction reads the same data twice and gets different results because another transaction modified and committed the data between reads.

Using the same sales table, Transaction 1 begins:

SELECT product_id, quantity * price FROM sales;

Results: Product 1 = 80.

Transaction 2 begins, updates, and commits:

UPDATE sales SET quantity = 15 WHERE product_id = 1;
COMMIT;

Transaction 1 then executes:

SELECT SUM(quantity * price) FROM sales;

This is not a dirty read—the update is committed. However, Transaction 1 sees 130. The value changed within the same transaction, making it non-repeatable.

Here’s a graphical visualization:

Preventing non-repeatable reads requires maintaining row versions. PostgreSQL handles this by creating new row versions on update, never modifying the original. MySQL and Oracle use undo logs storing previous values. When Transaction 1 needs the original value, PostgreSQL reads the old version directly, while MySQL reconstructs it from the undo log—potentially expensive for long-running transactions.

Phantom Read

Phantom reads result from range queries where new rows appear between reads. Unlike non-repeatable reads where existing rows change, phantom reads involve rows that didn’t exist during the first query.

Transaction 1 begins:

SELECT product_id, quantity * price FROM sales;

Results: Product 1 = 80.

Transaction 2 inserts a new row and commits:

INSERT INTO sales VALUES (3, 10, 1);
COMMIT;

Transaction 1 executes:

SELECT SUM(quantity * price) FROM sales;

The result is $140, including the newly inserted product 3. Transaction 1 never read product 3 initially—it’s a phantom that appeared mid-transaction.

Phantom reads differ from non-repeatable reads in implementation. Non-repeatable reads can be prevented by locking rows that were read. But how do you lock rows that don’t exist yet? You can’t lock something you haven’t seen—hence “phantom”.

Here’s a graphical visualization:

Lost Update

A lost update occurs when two transactions read the same value, modify it, and write it back, with one overwriting the other’s changes.

Both Transaction 1 and Transaction 2 begin simultaneously and read quantity = 10 for product 1.

Transaction 1 updates:

UPDATE sales SET quantity = quantity + 10 WHERE product_id = 1;

Transaction 1 calculates 10 + 10 = 20.

Transaction 2 updates:

UPDATE sales SET quantity = quantity + 5 WHERE product_id = 1;

Transaction 2 calculates 10 + 5 = 15.

If Transaction 2 commits first, it sets quantity to 15. When Transaction 1 commits, it overwrites with 20, losing Transaction 2’s update of 5 units. The final value should be 25 (10 + 10 + 5), but it’s 20.

Row-level locking solves this. When Transaction 1 reads and updates the row, it locks it until commit. Transaction 2 must wait for the lock to be released, then reads the updated value (20) and adds 5, resulting in the correct total of 25.

Here’s a graphical visualization:

Isolation Levels

Isolation levels were invented to control which read phenomena are permissible, trading consistency for performance.

Read Uncommitted

Read Uncommitted provides no isolation—transactions see all changes, committed or not. This allows dirty reads. Few databases support this level except SQL Server in specific scenarios. It’s theoretically fast because no isolation mechanisms are needed, but practically, its usefulness is limited. Dirty reads are the worst read phenomenon, making this level rarely appropriate.

Read Committed

Read Committed is the most popular isolation level and the default for many databases. Each query sees only changes committed by other transactions. If another transaction commits while the current transaction is running, subsequent queries see the committed changes.

This prevents dirty reads but allows non-repeatable reads. A long-running transaction will pick up any commits that occur during its execution, which may or may not be desirable depending on the application.

Repeatable Read

Repeatable Read ensures that once a transaction reads a row, that row remains unchanged for the transaction’s lifetime. Reading the same row 1,000 times yields the same value. This prevents non-repeatable reads and lost updates.

However, it doesn’t prevent phantom reads. The database can only version or lock rows it has actually read. New rows inserted by other transactions can still appear in subsequent range queries.

Implementation varies by database. PostgreSQL implements Repeatable Read as snapshot isolation, versioning all data and preventing even phantom reads. Other databases use row-level locking, which prevents modifications to read rows but can’t prevent new inserts.

Snapshot Isolation

Snapshot Isolation gives each transaction a consistent view of the database as it existed at the transaction’s start. Any query sees only changes committed before the transaction began, eliminating all read phenomena including phantoms.

PostgreSQL’s Repeatable Read is actually snapshot isolation—everything is versioned, and the database filters results based on the transaction’s snapshot timestamp. This prevents even inserted rows from appearing if they weren’t committed when the transaction started.

Serializable

Serializable is the strongest isolation level, ensuring transactions execute as if they ran sequentially, one after another. There is no concurrency—transactions are effectively serialized. This eliminates all read phenomena but can severely impact performance.

Most databases implement Serializable using optimistic concurrency control rather than true serialization. Transactions execute concurrently, but if conflicts arise, one is rolled back with a serialization error, and the application must retry.

Isolation Level Comparison

Implementation Approaches

Pessimistic Concurrency Control

Pessimistic Concurrency Control (PCC) assumes conflicts are common, locking data upon read to guarantee consistency for critical, high-contention transactions. When a transaction modifies data, it acquires locks—row-level, page-level, or table-level. Other transactions attempting to access locked data must wait.

Row-level locks are most granular and allow maximum concurrency but are expensive to manage. If 700,000 rows are locked, the database must track all of them in memory. Table-level locks are simpler but block all concurrent access, potentially causing performance problems. Lock escalation can occur when too many row locks are held, automatically converting them to a table lock and blocking other transactions unnecessarily.

Optimistic Concurrency Control

Optimistic Concurrency Control (OCC) assumes conflicts are rare, allowing simultaneous data access without locking, and validates changes only at commit, making it ideal for high-read, low-contention scenarios. When conflicts arise—such as two transactions modifying the same data—the database detects the conflict at commit time and fails one transaction with a serialization error. The application must retry.

This avoids lock management overhead and prevents transactions from waiting on each other. NoSQL databases often prefer optimistic concurrency control because locking is expensive in distributed systems.

Database-Specific Behavior

Each database implements isolation levels differently. PostgreSQL implements Repeatable Read as snapshot isolation, preventing phantom reads. Other databases implementing Repeatable Read with locks still permit phantom reads. Serializable is often implemented with optimistic concurrency control rather than true serialization to maintain reasonable performance.

Understanding these differences is critical when choosing a database or diagnosing concurrency issues.

Summary

Isolation controls whether transactions see changes made by concurrent transactions. Read phenomena—dirty reads, non-repeatable reads, phantom reads, and lost updates—represent undesirable side effects of insufficient isolation. Isolation levels provide varying degrees of protection against these phenomena, with stronger isolation trading performance for consistency. Most databases default to Read Committed, which prevents dirty reads but allows non-repeatable reads and phantoms. Applications requiring stronger guarantees must explicitly configure higher isolation levels and accept the performance cost.

Consistency

Consistency represents one of the ACID properties that has been significantly debated across different database platforms. NoSQL databases, relational databases, and graph databases have made different trade-offs regarding consistency, with some sacrificing it for speed, performance, and scalability.

Consistency operates on two distinct levels: consistency in data and consistency in reads. These represent fundamentally different concerns with different causes and solutions.

Consistency in Data

Consistency in data refers to whether the persisted state matches the defined data model and integrity rules.

This is defined by the user—typically the database administrator or whoever designs the schema. It primarily involves enforcing referential integrity and foreign keys.

Even NoSQL databases, which often lack formal foreign key constraints, have referential integrity concerns. A document referencing another document creates a relationship that must maintain integrity. If the referenced document is deleted while references remain, the data becomes inconsistent.

Inconsistencies in data can arise from several sources:

  • Atomicity directly impacts data consistency. Recall the account transfer example: if a crash occurs after debiting one account but before crediting the other, $100 disappears. This represents corrupt, inconsistent data—the database has persisted an invalid state. Without atomicity, data consistency cannot be guaranteed.
  • Isolation also affects data consistency. When one transaction reads data and another concurrent transaction modifies it, the first transaction may see an inconsistent view depending on the isolation level. The data itself may be correct, but the read produces inconsistent results due to concurrent changes.

Example: Social Media Likes

Consider an Instagram-like data model with two tables. The pictures table contains:

  • id: Picture identifier
  • blob: Binary image data
  • likes: Number of likes

The likes_tracking table records individual likes:

  • user_id: Who liked the picture
  • picture_id: Which picture was liked

Referential integrity requires that the likes count in the pictures table equals the number of rows in likes_tracking for that picture. If picture 1 shows 5 likes but only 2 rows exist in likes_tracking, the data is inconsistent.

Another inconsistency: if likes_tracking contains a row where picture_id = 4, but picture 4 doesn’t exist in the pictures table, that’s an orphaned reference. Perhaps picture 4 was deleted without cascading the deletion to remove associated likes, leaving the data in an invalid state. Here’s a graphical visualization:

These constraints can be enforced by the database through foreign keys and triggers, or at the application level. Regardless of where enforcement occurs, if these invariants are violated, the data is inconsistent.

For social media likes, the consequences may be minor—few users scrutinize whether a picture with 1.8 million likes has exactly that count. But for financial transactions, medical records, or inventory systems, data consistency is critical.

Consistency in Reads

Consistency in reads addresses whether transactions see the most recent committed changes.

After a transaction commits a change, should the next transaction immediately see that change? This affects the system as a whole, particularly in distributed architectures with multiple database instances, shards, or replicas.

Consider a database system with a reverse proxy distributing requests across multiple backend instances. A transaction updates value X and commits. If the next read doesn’t return the updated value X, the system provides an inconsistent read.

This may seem obviously wrong, but it’s common in replicated systems. When writes go to a primary database that asynchronously replicates to read replicas, a read hitting a replica before replication completes returns stale data. The data isn’t corrupt—the old value was valid at some point—but from the user’s perspective, they just wrote X and now they’re reading something else.

Both relational and NoSQL databases suffer from read inconsistency in replicated configurations. This is independent of data consistency—the persisted state may be perfectly valid on each node, but different nodes are at different points in time.

Eventual Consistency

Eventual consistency is a term used to describe systems where reads may return stale data temporarily, but will eventually reflect committed changes once replication completes. This is a deliberate trade-off: accepting temporary inconsistency to gain better performance, availability, and scalability.

The term “eventual consistency” is specifically relevant to read consistency, not data consistency. If referential integrity is broken—if the data model itself is violated—there is no “eventual” healing without explicit repair mechanisms. Corrupt data stays corrupt unless a background job fixes it. Five likes in the pictures table with only two rows in likes_tracking won’t magically become consistent over time.

Eventual consistency only applies to replication lag. If you continue reading from replicas, you’ll eventually get the updated value once replication catches up. The duration of “eventual” depends on replication speed—milliseconds for fast networks and light loads, potentially seconds or more under heavy load or network partitions.

Strong consistency can be enforced through synchronous replication, where writes don’t acknowledge until all replicas confirm receipt. This eliminates read inconsistency but increases write latency. Weak consistency allows longer replication delays, improving performance at the cost of potentially stale reads.

NoSQL and Consistency Trade-offs

NoSQL databases often sacrificed traditional consistency guarantees for horizontal scalability and availability. This reflects the CAP theorem: in the presence of network partitions, a distributed system must choose between consistency and availability.

However, NoSQL databases still have data consistency concerns. Even without enforced foreign keys, applications must maintain their own referential integrity. If a MongoDB document references another document by ID, deleting the referenced document without updating or deleting the referring document creates the same inconsistency as a broken foreign key in a relational database.

The difference is where enforcement occurs: relational databases provide built-in mechanisms, while NoSQL databases leave it to application code. Both can achieve consistency; the burden simply shifts.

Summary

Consistency operates on two levels. Consistency in data ensures that persisted state respects defined integrity rules—foreign keys, constraints, and application-level invariants. This is enforced through atomicity (preventing partial updates) and proper constraint mechanisms. When data consistency is violated, the database contains corrupt information that doesn’t self-heal without intervention.

Consistency in reads ensures that transactions see committed changes promptly. In single-node systems, this is trivial. In replicated systems, replication lag can cause reads to return stale data. Eventual consistency describes systems that tolerate temporary staleness, eventually converging to the correct state as replication completes.

Both relational and NoSQL databases face both types of consistency challenges. The key difference lies in where responsibility for enforcement resides: databases with built-in constraints handle data consistency automatically, while others delegate it to applications. Read consistency depends on replication strategy—synchronous replication provides strong consistency at a performance cost, while asynchronous replication offers eventual consistency with better performance.

Durability

Durability is the property that ensures committed transactions persist permanently, surviving system crashes, power failures, and other catastrophic events.

In simple terms: once a transaction commits, the changes must be stored in non-volatile storage such that they remain available even after the database restarts. If a client commits a transaction and immediately loses power, those changes must still be present when the system recovers. If the database server crashes milliseconds after acknowledging a commit, the data must survive. Durability guarantees that committed work is never lost.

This guarantee might seem obvious—of course databases should save data permanently—but achieving true durability is expensive, and many database systems deliberately compromise it for performance.

The Cost of Durability

Durability requires writing data to persistent storage, and disk I/O is inherently slow compared to memory operations. Many databases, particularly those optimized for speed, write changes to memory initially and periodically flush snapshots to disk in the background. This approach provides dramatically faster write performance at the cost of potential data loss if the system crashes before the next snapshot completes.

Redis, for example, offers configuration options that trade durability for speed. Users can choose strong durability (every write is immediately persisted), eventual durability (writes are persisted asynchronously after a delay), or no durability at all (data exists only in memory). For use cases like caching, session storage, or IoT telemetry where occasional data loss is acceptable, weakening durability can be a reasonable trade-off. However, traditional relational databases emphasize durability as a fundamental requirement—once a transaction commits, the data must survive any subsequent failure.

Durability Techniques

Databases employ several strategies to achieve durability efficiently.

Write-Ahead Log (WAL)

The Write-Ahead Log is the most common durability mechanism. Database tables, indexes, and internal data structures can be enormous—modifying them directly on disk during every transaction would be prohibitively slow. Instead, databases maintain a separate log file that records only the changes made by each transaction.

When a transaction commits, the database writes a compact log entry describing what changed—perhaps just a few bytes indicating “row 1234 in table users, column name changed from ‘Alice’ to ‘Bob’.” This log entry is immediately flushed to disk. The actual data pages containing the table and indexes are updated later, either asynchronously or during a background checkpoint operation.

If the system crashes before those data pages are written, the database can reconstruct the correct state by replaying the WAL during recovery. The log contains a complete history of committed changes, allowing the database to bring tables and indexes back to a consistent state. This approach provides durability without requiring synchronous writes to complex data structures.

The WAL is append-only, making writes sequential rather than random. Sequential disk I/O is significantly faster than random I/O, as it avoids the mechanical seek times of hard drives and leverages more efficient write patterns even on SSDs.

Asynchronous and Synchronous Snapshots

Some databases, including Redis, use snapshots instead of or in addition to WALs. The database periodically writes the entire in-memory dataset to disk as a snapshot. Between snapshots, changes accumulate in memory.

Asynchronous snapshots provide weak durability—if the system crashes between snapshots, all changes since the last snapshot are lost. The snapshot interval determines the maximum data loss window. A snapshot every 60 seconds means up to 60 seconds of recent commits might be lost in a crash.

Synchronous snapshots force a snapshot after every transaction (or batch of transactions), providing strong durability at the cost of performance. This is rare because it negates the performance benefits of keeping data in memory.

Append-Only Files

Append-only files, similar to WALs, record each operation as it occurs. Redis’s AOF (Append-Only File) mode writes every write command to a log file. On recovery, the database replays these commands to reconstruct the dataset.

Append-only files can grow large over time, requiring periodic compaction or rewriting to remove redundant operations. For example, if a key is set to value A, then B, then C, only the final SET key C command needs to be preserved—the earlier operations are superfluous.

The Operating System Cache Problem

A critical challenge in achieving durability is the operating system’s filesystem cache. When an application asks the OS to write data to disk, the OS often doesn’t immediately write to physical storage. Instead, it writes to an in-memory cache and returns success to the application. The OS later flushes this cache to disk, batching multiple writes together to improve performance.

This caching is beneficial for most applications, reducing write latency and I/O operations. However, it undermines database durability. Consider this sequence:

  1. A transaction commits.
  2. The database writes the WAL entry to disk (or so it believes).
  3. The OS acknowledges the write but has only written to its cache, not physical storage.
  4. The database informs the client that the transaction committed successfully.
  5. The system crashes before the OS flushes its cache.
  6. On restart, the WAL entry is gone—it was only in volatile RAM.

The database lied. It told the client the transaction was durable when it wasn’t. The data was lost despite the commit acknowledgment.

The fsync System Call

To guarantee durability, databases use the fsync system call (or equivalents like fdatasync on Linux or FlushFileBuffers on Windows). fsync forces the OS to immediately write cached data to physical storage, bypassing the cache. When the call returns, the data is genuinely on disk, surviving power failures.

However, fsync is expensive. It negates the performance benefits of the OS cache and introduces significant latency. Each fsync call may wait for mechanical disk seeks or SSD write operations to complete. On traditional hard drives, this can take milliseconds—an eternity for high-throughput systems processing thousands of transactions per second.

Database engines carefully optimize when and how they call fsync. Common strategies include:

  • Group Commit: Batch multiple transactions together and fsync once for the entire batch. This amortizes the fsync cost across many transactions.
  • Background Writer Processes: Use dedicated processes to flush dirty pages to disk independently of transaction commits, reducing the amount of data requiring fsync at commit time.
  • WAL Compression: Keep the WAL small so fsync operations complete faster.

The Durability vs. Performance Trade-Off

Durability is fundamentally at odds with performance. Writing to volatile memory is orders of magnitude faster than writing to persistent storage. Databases that prioritize throughput often relax durability guarantees:

  • Redis with AOF Disabled: Runs entirely in memory with no durability. Crashes lose all data. Suitable for caching or temporary data.
  • Redis with AOF “everysec”: Flushes the AOF every second. Up to one second of commits can be lost. Balances performance and safety.
  • Redis with AOF “always”: Calls fsync after every write. Provides strong durability but significantly reduces throughput.
  • MySQL with innodb_flush_log_at_trx_commit=0: Flushes the WAL asynchronously, losing up to one second of commits in a crash.
  • PostgreSQL with synchronous_commit=off: Returns commit acknowledgments before fsync, trading durability for lower latency.

For critical data—financial transactions, medical records, authoritative state—durability cannot be compromised. A bank cannot accept losing deposits because a server crashed. An e-commerce system cannot lose completed orders. In these domains, the performance cost of fsync is unavoidable.

For less critical data—analytics logs, social media posts, telemetry—eventual durability or even no durability may suffice. The appropriate choice depends on the consequences of data loss.

Murphy’s Law and Failure Frequency

Some engineers argue that crashes are rare enough that eventual durability is acceptable. In stable production environments with redundant power supplies, ECC memory, and robust hardware, catastrophic failures are indeed infrequent. However, Murphy’s Law states that anything that can go wrong eventually will.

A datacenter power failure, a kernel panic, a runaway process exhausting memory, a network partition triggering a split-brain scenario—low-probability events accumulate over time and across many systems. At scale, rare failures become routine. A one-in-a-million event occurs monthly in a system processing millions of operations per day.

Durability guarantees protect against these inevitable failures. Even if a crash happens once per year, that single crash cannot erase months of committed transactions. The cost of restoring lost data—customer complaints, lost revenue, reputational damage, legal liability—far exceeds the performance penalty of proper durability mechanisms.

Summary

Durability ensures that committed transactions survive system failures, persisting changes to non-volatile storage. Achieving true durability requires careful coordination between the database and the operating system, bypassing caches with fsync to guarantee data reaches physical storage.

The Write-Ahead Log is the dominant durability technique, recording changes in a compact, sequential format that can be efficiently written to disk and replayed during recovery. The OS cache poses a significant challenge, requiring explicit fsync calls that introduce latency but ensure genuine persistence.

Many databases offer configuration options that trade durability for performance, allowing eventual durability or even no durability for use cases where speed matters more than safety. However, for systems managing critical data, strong durability is non-negotiable—the guarantee that “once committed, never lost” is fundamental to trust in the database.

Understanding durability helps explain why some database operations are slower than others, why group commit optimizations matter, and why infrastructure choices—SSD vs. HDD, local vs. network storage, journaling filesystems—affect database performance. Durability is where the abstractions of in-memory data structures meet the physical reality of persistent storage, and managing this boundary is one of the database engineer’s most important responsibilities.

ACID by Practical Examples

The ACID properties—Atomicity, Consistency, Isolation, and Durability—are best understood through hands-on demonstration. Using PostgreSQL in a Docker container provides a clean, reproducible environment for exploring these concepts.

Environment Setup

Spin up a PostgreSQL container:

docker run --name pg_acid -e POSTGRES_PASSWORD=postgres -d postgres:13

Access the PostgreSQL shell:

docker exec -it pg_acid psql -U postgres

Create test tables:

CREATE TABLE products (
    pid SERIAL PRIMARY KEY,
    name TEXT,
    price FLOAT,
    inventory INTEGER
);
 
CREATE TABLE sales (
    sale_id SERIAL PRIMARY KEY,
    pid INTEGER,
    price FLOAT,
    quantity INTEGER
);

Insert initial data:

INSERT INTO products (name, price, inventory) VALUES ('Phone', 999.99, 100);

Atomicity Demonstration

Atomicity ensures that all queries in a transaction succeed together or fail together. Consider selling 10 phones—this requires two operations: decrementing inventory and recording the sale.

Begin a transaction and update inventory:

BEGIN;
UPDATE products SET inventory = inventory - 10 WHERE pid = 1;

Query the products table:

SELECT * FROM products;

The inventory shows 90 units. However, if the database crashes before the sale is recorded, the inventory decrease persists without a corresponding sale entry—losing 10 units in thin air.

Simulate a crash by exiting PostgreSQL:

\q

Restart and check the products table:

docker exec -it pg_acid psql -U postgres
SELECT * FROM products;

The inventory shows 100, not 90. When the transaction crashed before commit, PostgreSQL rolled back the update. This is atomicity in action—the partial transaction was undone automatically.

Now complete the transaction correctly:

BEGIN;
UPDATE products SET inventory = inventory - 10 WHERE pid = 1;
INSERT INTO sales (pid, price, quantity) VALUES (1, 999.99, 10);

If you query the two tables products and sales in this on-going transaction, you’ll see that both UPDATE and INSERT took effect.

Note that we haven’t committed the previous transaction yet. Indeed open a new terminal and, after accessing the PosgreSQL shell again, run:

SELECT * from products;
SELECT * from sales;

Results: 100 and no rows. And that’s part of the isolation because the previous operation were inserted in the transaction, but they’re not committed or rollbacked. After committing the transaction in Terminal 1, we can run again these SELECT queries in the Terminal 2 and we finally see the correct results of 90 and 10.

Consistency Demonstration

Consistency means the database maintains defined integrity rules. In this example, total inventory should equal initial stock minus sales.

SELECT * FROM products;  -- 90 inventory remaining
SELECT * FROM sales;     -- 10 units sold

If atomicity failed, inventory would show 90 with no sale record, violating the invariant that sold units must have corresponding sale entries. Atomicity enables consistency by preventing partial updates: if you don’t have atomicity, you don’t have consistency.

Isolation Demonstration

Isolation controls whether concurrent transactions see each other’s changes. Generate a sales report while another transaction makes a sale concurrently.

In the first terminal, begin a transaction and query sales:

BEGIN transaction;
SELECT pid, COUNT(pid) FROM sales GROUP BY pid;

Results show, for example, 3 sales of product 1 and 3 sales of product 2.

In a second terminal (open another Docker exec session), insert a new sale and commit:

BEGIN transaction;
INSERT INTO sales (pid, price, quantity) VALUES (1, 999.99, 10);
UPDATE products SET inventory = inventory - 10 WHERE pid = 1;
COMMIT;

Back in the first terminal, query sales details:

SELECT pid, price, quantity FROM sales;

The new sale appears. The initial count showed 3 sales, but the detailed query shows 4. The report is inconsistent—the summary doesn’t match the details because another transaction committed changes mid-report.

Repeatable Read Isolation

Roll back and restart with stronger isolation:

ROLLBACK;
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT pid, COUNT(pid) FROM sales GROUP BY pid;

Results show 4 sales of product 1.

In the second terminal, insert another sale and commit:

BEGIN transaction;
INSERT INTO sales (pid, price, quantity) VALUES (1, 999.99, 10);
COMMIT;

Back in the first terminal:

SELECT pid, price, quantity FROM sales;

The new sale doesn’t appear. Repeatable Read provides a consistent snapshot. No matter how many times queries execute within the transaction, they see the same data—the state when the transaction began.

Query from the second terminal shows 5 sales, but the first terminal’s transaction still sees 4. After committing the first transaction and querying again, it finally sees 5. Repeatable Read prevents the transaction from seeing concurrent changes, ensuring consistent reports.

This isolation level is more expensive than Read Committed. PostgreSQL implements it using Multi-Version Concurrency Control (MVCC), maintaining row versions. MySQL and Oracle use undo logs, storing previous values that transactions can reference. Each approach has performance implications, especially for long-running transactions.

Durability Demonstration

Durability guarantees that committed changes survive crashes. Insert a new product and kill the container immediately after commit:

BEGIN;
INSERT INTO products (name, price, inventory) VALUES ('TV', 3000, 10);
COMMIT;

In another terminal, immediately stop the container:

docker stop pg_acid

The commit succeeded—PostgreSQL acknowledged it. Restart the container and check:

docker start pg_acid
docker exec -it pg_acid psql -U postgres
SELECT * FROM products;

The TV product exists. Despite killing the container milliseconds after commit, the change persisted. PostgreSQL wrote the data to disk before acknowledging the commit, guaranteeing durability.

Some databases write to memory first for performance, flushing to disk asynchronously. If the host crashes before the flush, committed data is lost. PostgreSQL’s durability guarantee means “commit” truly means “safely on disk.”

Databases like Redis offer configurable durability. In pure in-memory mode, crashes lose all data. With append-only file persistence, durability is strong but performance suffers. The choice depends on whether the data is critical (financial transactions) or ephemeral (cache entries).