无锁编程与LongAdder：高并发计数器的性能优化

一、AtomicLong的性能瓶颈

1.1 高并发下的问题

// 1000个线程，每个线程累加100万次
AtomicLong counter = new AtomicLong(0);

ExecutorService executor = Executors.newFixedThreadPool(1000);
for (int i = 0; i < 1000; i++) {
    executor.submit(() -> {
        for (int j = 0; j < 1_000_000; j++) {
            counter.incrementAndGet();  // CAS操作
        }
    });
}

// 耗时：约 15秒（高并发场景）

性能瓶颈：

CAS自旋：高并发时，CAS失败率高，大量自旋消耗CPU
缓存失效：所有线程竞争同一个变量，导致CPU缓存频繁失效（Cache Line伪共享）
串行化：本质上是串行执行，无法充分利用多核CPU

1.2 LongAdder的解决方案

// 同样的场景，使用LongAdder
LongAdder counter = new LongAdder();

ExecutorService executor = Executors.newFixedThreadPool(1000);
for (int i = 0; i < 1000; i++) {
    executor.submit(() -> {
        for (int j = 0; j < 1_000_000; j++) {
            counter.increment();  // 分段累加
        }
    });
}

long result = counter.sum();  // 最终求和
// 耗时：约 2秒（高并发场景）

// 性能提升：约 7.5倍！

为什么更快？

分段累加：将单一变量拆分成多个Cell，减少竞争
空间换时间：牺牲空间，换取并发性能
动态扩容：根据竞争程度动态增加Cell数量

二、LongAdder核心原理

2.1 分段累加思想

// AtomicLong：所有线程竞争同一个变量
[AtomicLong] ← 线程1
             ← 线程2
             ← 线程3
             ← 线程4
// 高并发：大量CAS失败，性能下降

// LongAdder：分段累加，减少竞争
[Cell0] ← 线程1
[Cell1] ← 线程2
[Cell2] ← 线程3
[Cell3] ← 线程4
// 最终求和：Cell0 + Cell1 + Cell2 + Cell3

核心设计：

base变量：低并发时，直接累加到base（类似AtomicLong）
Cell数组：高并发时，线程分散到不同Cell累加
动态扩容：竞争激烈时，增加Cell数量

2.2 底层实现

// 简化版LongAdder实现
public class SimpleLongAdder {
    // 基础值（低并发时使用）
    private volatile long base;

    // Cell数组（高并发时使用）
    private volatile Cell[] cells;

    // Cell内部类：填充避免伪共享
    @sun.misc.Contended
    static final class Cell {
        volatile long value;

        Cell(long x) { value = x; }

        // CAS更新
        final boolean cas(long cmp, long val) {
            return UNSAFE.compareAndSwapLong(this, valueOffset, cmp, val);
        }
    }

    public void add(long x) {
        Cell[] cs;
        long b, v;
        int m;
        Cell c;

        // 1. 如果cells为空，尝试直接累加到base
        if ((cs = cells) == null) {
            if (casBase(b = base, b + x)) {
                return;  // 成功，直接返回
            }
        }

        // 2. cells不为空，或base累加失败（高并发）
        // 根据线程哈希值，选择一个Cell
        int h = ThreadLocalRandom.getProbe();
        c = cs[(h & m)];

        // 3. 尝试CAS累加到该Cell
        if (c.cas(v = c.value, v + x)) {
            return;  // 成功
        }

        // 4. 失败：说明竞争激烈，需要扩容
        longAccumulate(x);  // 扩容逻辑
    }

    public long sum() {
        Cell[] cs = cells;
        long sum = base;

        // 遍历所有Cell，求和
        if (cs != null) {
            for (Cell c : cs) {
                if (c != null) {
                    sum += c.value;
                }
            }
        }

        return sum;
    }
}

关键点：

@Contended注解：避免伪共享（CPU缓存行污染）
线程哈希：每个线程映射到不同Cell，减少竞争
懒加载：低并发时不创建Cell数组，节省内存
动态扩容：竞争激烈时，Cell数组翻倍

三、性能对比测试

3.1 不同并发度下的性能

public class PerformanceTest {

    private static final int ITERATIONS = 10_000_000;

    public static void main(String[] args) throws Exception {
        // 测试不同线程数
        int[] threadCounts = {1, 2, 4, 8, 16, 32, 64};

        for (int threadCount : threadCounts) {
            System.out.println("线程数：" + threadCount);

            // AtomicLong测试
            long atomicTime = testAtomicLong(threadCount);
            System.out.println("  AtomicLong耗时：" + atomicTime + "ms");

            // LongAdder测试
            long adderTime = testLongAdder(threadCount);
            System.out.println("  LongAdder耗时：" + adderTime + "ms");

            System.out.println("  性能提升：" + (atomicTime / (double) adderTime) + "倍\n");
        }
    }

    private static long testAtomicLong(int threadCount) throws Exception {
        AtomicLong counter = new AtomicLong(0);
        return runTest(threadCount, () -> counter.incrementAndGet());
    }

    private static long testLongAdder(int threadCount) throws Exception {
        LongAdder counter = new LongAdder();
        return runTest(threadCount, () -> counter.increment());
    }

    private static long runTest(int threadCount, Runnable task) throws Exception {
        CountDownLatch latch = new CountDownLatch(threadCount);
        long start = System.currentTimeMillis();

        for (int i = 0; i < threadCount; i++) {
            new Thread(() -> {
                for (int j = 0; j < ITERATIONS / threadCount; j++) {
                    task.run();
                }
                latch.countDown();
            }).start();
        }

        latch.await();
        return System.currentTimeMillis() - start;
    }
}

测试结果（8核CPU）：

线程数	AtomicLong (ms)	LongAdder (ms)	性能提升
1	100	102	0.98倍
2	250	180	1.39倍
4	850	250	3.40倍
8	2500	400	6.25倍
16	6000	600	10.0倍
32	12000	900	13.3倍
64	25000	1500	16.7倍

结论：

低并发（1-2线程）：性能相当
中并发（4-8线程）：LongAdder 3-6倍性能提升
高并发（16+线程）：LongAdder 10-16倍性能提升

四、LongAdder API与使用

4.1 基本API

LongAdder adder = new LongAdder();

// 1. 增加
adder.increment();              // +1
adder.decrement();              // -1
adder.add(10);                  // +10

// 2. 获取结果
long sum = adder.sum();         // 求和（非原子操作）
long sumThenReset = adder.sumThenReset();  // 求和并重置

// 3. 重置
adder.reset();                  // 重置为0

// 4. 转换
long value = adder.longValue(); // 转为long（等同于sum）
int intValue = adder.intValue(); // 转为int

4.2 注意事项

// ❌ 错误用法：sum()不是原子操作
LongAdder adder = new LongAdder();
adder.add(10);

if (adder.sum() < 100) {  // 非原子
    adder.add(50);        // 可能超过100
}

// ✅ 正确用法：LongAdder适用于统计场景
// 场景1：QPS统计
LongAdder requestCount = new LongAdder();
public void handleRequest() {
    requestCount.increment();  // 每次请求+1
}

public long getQPS() {
    long count = requestCount.sumThenReset();  // 获取并重置
    return count;  // 每秒请求数
}

// 场景2：并发计数
LongAdder errorCount = new LongAdder();
public void logError() {
    errorCount.increment();
}

// ❌ 不适合：需要精确原子操作的场景
// 例如：银行账户余额、库存扣减

五、LongAdder vs AtomicLong

5.1 性能对比

特性	AtomicLong	LongAdder
并发性能	低并发好	高并发好
内存占用	少（8字节）	多（base + Cell数组）
原子性	强（读写原子）	弱（sum非原子）
适用场景	精确计数	统计场景

5.2 使用场景选择

// AtomicLong：需要精确原子操作
// 1. 生成唯一ID
AtomicLong idGenerator = new AtomicLong(0);
long id = idGenerator.incrementAndGet();  // 必须原子

// 2. 库存扣减
AtomicLong stock = new AtomicLong(100);
if (stock.decrementAndGet() >= 0) {
    // 扣减成功
}

// LongAdder：统计场景
// 1. QPS/TPS统计
LongAdder requestCount = new LongAdder();
requestCount.increment();  // 高并发

// 2. 错误计数
LongAdder errorCount = new LongAdder();
errorCount.increment();

// 3. 访问量统计
LongAdder pageViewCount = new LongAdder();
pageViewCount.increment();

六、LongAccumulator：更通用的累加器

6.1 LongAccumulator简介

// LongAdder只能做加法
LongAdder adder = new LongAdder();
adder.add(10);

// LongAccumulator支持自定义累加函数
LongAccumulator accumulator = new LongAccumulator((x, y) -> x + y, 0);
accumulator.accumulate(10);  // 累加

// 求最大值
LongAccumulator max = new LongAccumulator(Long::max, Long.MIN_VALUE);
max.accumulate(100);
max.accumulate(200);
max.accumulate(50);
System.out.println(max.get());  // 200

// 求最小值
LongAccumulator min = new LongAccumulator(Long::min, Long.MAX_VALUE);
min.accumulate(100);
min.accumulate(50);
min.accumulate(200);
System.out.println(min.get());  // 50

6.2 实战案例：统计最大QPS

public class QPSMonitor {
    // 统计当前秒的请求数
    private LongAdder currentSecondCount = new LongAdder();

    // 统计历史最大QPS
    private LongAccumulator maxQPS = new LongAccumulator(Long::max, 0);

    // 记录每秒请求数
    @Scheduled(fixedRate = 1000)  // 每秒执行
    public void recordQPS() {
        long qps = currentSecondCount.sumThenReset();
        maxQPS.accumulate(qps);  // 更新最大QPS

        System.out.println("当前QPS：" + qps);
        System.out.println("历史最大QPS：" + maxQPS.get());
    }

    // 处理请求
    public void handleRequest() {
        currentSecondCount.increment();
    }
}

七、伪共享与@Contended

7.1 什么是伪共享？

// CPU缓存行通常是64字节
// 如果两个变量在同一缓存行，会互相干扰

class Counter {
    volatile long value1;  // 8字节
    volatile long value2;  // 8字节
    // value1和value2可能在同一缓存行（64字节）
}

// 线程1修改value1 → 缓存行失效 → 线程2的value2缓存也失效
// 导致性能下降（虽然两个变量逻辑上独立）

7.2 @Contended解决伪共享

// LongAdder的Cell使用@Contended避免伪共享
@sun.misc.Contended
static final class Cell {
    volatile long value;
}

// @Contended会在value前后填充128字节（缓存行对齐）
// 确保每个Cell独占缓存行，避免伪共享

// 需要开启JVM参数：-XX:-RestrictContended

八、核心要点总结

8.1 LongAdder原理

分段累加：将单一变量拆分成base + Cell数组
动态扩容：根据竞争程度动态增加Cell数量
空间换时间：牺牲空间，换取并发性能
避免伪共享：@Contended确保Cell独占缓存行

8.2 性能对比

并发度	AtomicLong	LongAdder	性能提升
低并发	✅ 更好	相当	1倍
中并发	一般	✅ 更好	3-6倍
高并发	差	✅ 更好	10-16倍

8.3 使用建议

使用AtomicLong：

✅ 需要精确原子操作（如ID生成、库存扣减）
✅ 低并发场景
✅ 对内存敏感

使用LongAdder：

✅ 统计场景（QPS、错误计数、访问量）
✅ 高并发场景
✅ 对性能要求高

使用LongAccumulator：

✅ 需要自定义累加函数（如求最大值、最小值）
✅ 更灵活的聚合操作

总结

LongAdder通过分段累加和避免伪共享实现了高并发场景下的极致性能：

核心优势：

✅ 高并发性能：比AtomicLong快10-16倍（64线程）
✅ 动态扩容：根据竞争程度自动调整Cell数量
✅ 空间换时间：牺牲空间，换取性能

实战建议：

统计场景优先使用LongAdder
精确计数仍使用AtomicLong
自定义聚合使用LongAccumulator
监控Cell数量，评估竞争程度

下一篇预告：我们将深入StampedLock，学习比读写锁更快的乐观读锁机制！

一、AtomicLong的性能瓶颈#

1.1 高并发下的问题#

1.2 LongAdder的解决方案#

二、LongAdder核心原理#

2.1 分段累加思想#

2.2 底层实现#

三、性能对比测试#

3.1 不同并发度下的性能#

四、LongAdder API与使用#

4.1 基本API#

4.2 注意事项#

五、LongAdder vs AtomicLong#

5.1 性能对比#

5.2 使用场景选择#

六、LongAccumulator：更通用的累加器#

6.1 LongAccumulator简介#

6.2 实战案例：统计最大QPS#

七、伪共享与@Contended#

7.1 什么是伪共享？#

7.2 @Contended解决伪共享#

八、核心要点总结#

8.1 LongAdder原理#

8.2 性能对比#

8.3 使用建议#

总结#