CAS Deep Dive — What Actually Happens at the CPU Level | Department of random technical stuffs

The Problem

So I was reading about concurrent programming and stumbled into this thing called CAS — Compare-And-Swap. And once you understand it, you start seeing it everywhere in the Java ecosystem. Let me walk through it the way it clicked for me.

The classic race condition looks like this:

Thread A reads x = 5
Thread B reads x = 5
Thread A writes x = 6
Thread B writes x = 6   ← A's update is lost

Both threads read the same value, both compute a new value, both write back. One update silently disappears. This is the read-modify-write problem and it’s the root of a huge class of concurrency bugs.

What CAS Actually Does

CAS stands for Compare-And-Swap. The idea is simple: “only update this value if it’s still what I expect it to be.”

Thread A: CAS(x, expected=5, new=6) → CPU locks the cache line → succeeds → x=6
Thread B: CAS(x, expected=5, new=6) → expected=5 but x=6 now   → FAILS
Thread B retries:      CAS(x, expected=6, new=7) → succeeds

The CPU instruction (LOCK CMPXCHG on x86) is indivisible — no other core can touch that memory location while it executes. That’s the entire guarantee. One atomic check-and-write, no locks needed.

The Retry Loop Pattern (Optimistic Locking)

CAS failures are not errors — they’re expected. The standard pattern is spin-retry:

AtomicLong counter = new AtomicLong(0);

void increment() {
    while (true) {
        long current = counter.get();       // read
        long next = current + 1;            // compute
        if (counter.compareAndSet(current, next)) {
            return;                         // won the race → done
        }
        // lost the race → retry with fresh value
    }
}

This is called optimistic locking — assume no contention, retry if wrong. Under low contention this is dramatically faster than synchronized. Under very high contention it can spin, which is why you pick the right tool for the right situation.

The ABA Problem — The Trap

CAS has one subtle bug that I found really interesting:

Thread A reads x = "A"
Thread B changes x: "A" → "B" → "A"   (back to A)
Thread A does CAS(expected="A", new="C") → SUCCEEDS
                                            ↑ but the value changed twice!
                                            Thread A has no idea

The value looks the same but it’s not the same state. The fix is to stamp the value with a version number:

// Stamp the value with a version number
AtomicStampedReference<String> ref = new AtomicStampedReference<>("A", 0);

// CAS now checks BOTH value AND stamp
ref.compareAndSet("A", "C", stamp=0, newStamp=1);
// Thread B's double-flip would have incremented stamp to 2
// So Thread A's CAS fails — correct behaviour

Most real-world code ignores ABA because they use primitives (long, int) where it doesn’t matter. It only bites you with object references.

Where CAS Lives in Java

java.util.concurrent — built on top of it

The whole package is essentially thin wrappers over Unsafe.compareAndSwapX():

AtomicInteger i = new AtomicInteger(0);
i.incrementAndGet();       // CAS loop internally
i.getAndAdd(5);            // CAS loop
i.updateAndGet(x -> x*2); // CAS loop

ConcurrentLinkedQueue — lock-free queue

The Michael-Scott queue algorithm, pure CAS. Enqueue claims a slot with CAS on the tail pointer. Dequeue advances head with CAS. No locks anywhere.

LongAdder — smarter than AtomicLong under contention

// AtomicLong under heavy contention → threads spinning on same memory location
// LongAdder → each thread updates its OWN cell, sum() adds them all up

LongAdder adder = new LongAdder();
adder.increment();  // updates thread-local cell, no contention
adder.sum();        // merges all cells

Prometheus’s Java metrics client uses this internally. That’s how it handles millions of metric increments without falling over.

Open Source Projects Built on CAS

Netty

The entire I/O pipeline is lock-free via CAS. Channel state transitions use AtomicIntegerFieldUpdater — a clever trick that does CAS on a plain int field without boxing it into AtomicInteger, saving allocation overhead on the hot path:

private static final AtomicIntegerFieldUpdater<AbstractChannel> REGISTERED_UPDATER =
    AtomicIntegerFieldUpdater.newUpdater(AbstractChannel.class, "registered");
// State: OPEN → REGISTERED → ACTIVE → INACTIVE → CLOSED

Disruptor (LMAX)

The most extreme CAS usage in the Java ecosystem. Powers financial trading systems doing 25 million messages/second:

Ring buffer with fixed size (power of 2)
Producer: CAS on sequence number to claim a slot
Consumer: spin-wait on sequence — no locks, no queues

The numbers are wild: ~25M ops/sec vs ~5M for ArrayBlockingQueue. Latency ~52ns vs ~500ns for synchronized queues. Log4j2’s async loggers run on this.

RxJava / Project Reactor

Operator fusion and subscription state managed entirely via CAS. There’s a pattern called the “wip counter” that I found fascinating:

static final AtomicIntegerFieldUpdater<FluxFlatMap> WIP =
    AtomicIntegerFieldUpdater.newUpdater(FluxFlatMap.class, "wip");

void drain() {
    if (WIP.getAndIncrement(this) == 0) {
        // I'm the only one draining — proceed
    }
    // otherwise someone else is already draining — exit
}

Only one thread drains the operator at a time, enforced entirely by CAS. No lock, no synchronized.

HikariCP

The fastest JDBC pool uses CAS for connection bag management. Connection state transitions (NOT_IN_USE → IN_USE → REMOVED) happen via CAS on the hot path. No synchronized blocks. That’s a big part of why it benchmarks faster than every other pool.

Caffeine

Uses a ring buffer per thread with CAS to track read/write operations for its eviction policy. Cache hits never contend with each other.

Pattern Summary

Pattern	What CAS Does	Example
Optimistic counter	Increment without lock	`AtomicLong.incrementAndGet()`
State machine	Transition state atomically	Netty channel states
Claim a slot	One winner among many threads	Disruptor sequence
Drain guard	Only one thread drains at a time	Reactor wip counter
Lock-free queue	Enqueue/dequeue without mutex	`ConcurrentLinkedQueue`

The Mental Model

CAS is optimistic locking — assume the world hasn’t changed, act, verify. If wrong, retry or skip.

synchronized is pessimistic locking — assume contention, lock first, act, release.

CAS wins when contention is low to medium, operations are short (no I/O inside the retry loop), and you can afford to retry or drop.

synchronized wins when contention is very high (CAS would spin endlessly) or the protected section is complex — multiple steps that must all succeed together.

Once you internalize this, reading through Netty, Reactor, or HikariCP source code starts making a lot more sense. You see the AtomicIntegerFieldUpdater and immediately know: lock-free state machine. You see getAndIncrement() == 0 and immediately know: drain guard. It’s a small vocabulary that unlocks a lot of JVM internals.

The Problem#

What CAS Actually Does#

The Retry Loop Pattern (Optimistic Locking)#

The ABA Problem — The Trap#

Where CAS Lives in Java#

java.util.concurrent — built on top of it#

ConcurrentLinkedQueue — lock-free queue#

LongAdder — smarter than AtomicLong under contention#

Open Source Projects Built on CAS#

Netty#

Disruptor (LMAX)#

RxJava / Project Reactor#

HikariCP#

Caffeine#

Pattern Summary#

The Mental Model#