[C++] Can num++ be atomic for 'int num'?


Answers

...now let's enable optimizations:

f():
        rep ret

OK, let's give it a chance:

void f(int& num)
{
  num = 0;
  num++;
  --num;
  num += 6;
  num -=5;
  --num;
}

Result:

f(int&):
        mov     DWORD PTR [rdi], 0
        ret

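Another observing thread (even ignoring cache synchronization delays) has no opportunity to observe the individual changes.

Compare that to: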

#include <atomic>

void f(std::atomic<int>& num)
{
  num = 0;
  num++;
  --num;
  num += 6;
  num -=5;
  --num;
}

Which results in:

f(std::atomic<int>&):
        mov     DWORD PTR [rdi], 0
        mfence
        lock add        DWORD PTR [rdi], 1
        lock sub        DWORD PTR [rdi], 1
        lock add        DWORD PTR [rdi], 6
        lock sub        DWORD PTR [rdi], 5
        lock sub        DWORD PTR [rdi], 1
        ret

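Now, each modification is:

  1. observable in another thread, and
  2. respectful of similar modifications happening in other threads.

Atomicity is not just at the instruction level; it involves the whole pipeline from the processor, through the caches, to memory and back.

Further info

Regarding the effect of optimizations on updates of std::atomics.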

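The C++ standard has the 'as if' rule, under which the compiler is permitted to reorder code, and even rewrite it, provided that the outcome has exactly the same observable effects (including side effects) as if it had simply executed your code.

The as-if rule is conservative, particularly where atomics are involved.

Consider: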

void incdec(int& num) {
    ++num;
    --num;
}

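Because there are no mutexes, atomics or any other constructs that influence inter-thread sequencing, I would argue that the compiler is free to rewrite this function as a NOP, e.g.: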

void incdec(int&) {
    // nada
}

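This is because in the C++ memory model, there is no possibility of another thread observing the result of the increment. It would of course be different if num were volatile (which might influence hardware behavior). But in this case, this function will be the only function modifying this memory (otherwise the program is ill-formed).

However, this is a different ball game: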

void incdec(std::atomic<int>& num) {
    ++num;
    --num;
}

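num is an atomic. Changes to it must be observable to other threads that are watching. Changes those threads themselves make (such as setting the value to 100 in between the increment and the decrement) will have very far-reaching effects on the eventual value of num.

Here is a demo: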

#include <atomic>
#include <iostream>
#include <thread>

int main()
{
    for (int iter = 0 ; iter < 20 ; ++iter)
    {
        std::atomic<int> num = { 0 };
        std::thread t1([&] {
            for (int i = 0 ; i < 10000000 ; ++i)
            {
                ++num;
                --num;
            }
        });
        std::thread t2([&] {
            for (int i = 0 ; i < 10000000 ; ++i)
            {
                num = 100;
            }
        });

        t2.join();
        t1.join();
        std::cout << num << std::endl;
    }
}

Sample output:

99
99
99
99
99
100
99
99
100
100
100
100
99
99
100
99
99
100
100
99
Question

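Generally, for int num, num++ (or ++num), as a read-modify-write operation, is not atomic. But I often see compilers, for example GCC, generate the following code for it:

Since line 5, which corresponds to num++, is one instruction, can we conclude that num++ is atomic in this case?

And if so, does it mean that a num++ generated this way can be used in concurrent (multi-threaded) scenarios without any danger of data races (i.e. we don't need to make it, for example, std::atomic<int> and impose the associated costs, since it's atomic anyway)?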

UPDATE

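Note that this question is not whether increment is atomic (it's not, and that was and is the opening line of the question). It's whether it can be in particular scenarios, i.e. whether the one-instruction nature can in certain cases be exploited to avoid the overhead of the lock prefix. And, as the accepted answer mentions in the section about uniprocessor machines, and as this answer, the conversation in its comments, and others explain, it can (although not with C or C++).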




When your compiler uses only a single instruction for the increment and your machine is single-threaded, your code is safe. ^^




Back in the day when x86 computers had one CPU, the use of a single instruction ensured that interrupts would not split the read/modify/write, and if the memory was not also used as a DMA buffer, it was in fact atomic (and C++ did not mention threads in the standard, so this wasn't addressed).

When it was rare to have a dual-processor machine (e.g. Pentium Pro) on a customer desktop, I effectively used this to avoid the LOCK prefix on a single-core machine and improve performance.

Today, it would only help against multiple threads that were all set to the same CPU affinity, so the threads you are worried about would only come into play via time slice expiring and running the other thread on the same CPU (core). That is not realistic.

With modern x86/x64 processors, the single instruction is broken up into several micro-ops, and furthermore the memory reads and writes are buffered. So different threads running on different CPUs will not only see this as non-atomic but may see inconsistent results concerning what it reads from memory and what it assumes other threads have read to that point in time: you need to add memory fences to restore sane behavior.




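Even if your compiler always emitted this as an atomic operation, accessing num from any other thread concurrently would constitute a data race according to the C++11 and C++14 standards, and the program would have undefined behavior.

But it's worse than that. First, as has been mentioned, the instruction generated by the compiler when incrementing a variable may depend on the optimization level. Secondly, if num is not atomic, the compiler may reorder other memory accesses around ++num, e.g.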

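#include <memory>
#include <thread>
#include <vector>

int main()
{
  std::unique_ptr<std::vector<int>> vec;
  int ready = 0;   // plain int: the concurrent accesses below are a data race (UB)
  std::thread t{[&]
    {
       while (!ready);
       // use "vec" here
    }};
  vec.reset(new std::vector<int>());
  ++ready;
  t.join();
}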

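Even if we assume optimistically that ++ready is "atomic", and that the compiler generates the checking loop as needed (as I said, it's UB and therefore the compiler is free to remove it, replace it with an infinite loop, etc.), the compiler might still move the pointer assignment, or even worse the initialization of the vector, to a point after the increment operation, causing chaos in the new thread. In practice, I would not be surprised at all if an optimizing compiler removed the ready variable and the checking loop completely, as this does not affect observable behavior under the language rules (as opposed to your private hopes).

In fact, at last year's Meeting C++ conference, I heard from two compiler developers that they very gladly implement optimizations that make naively written multi-threaded programs misbehave, as long as the language rules allow it, if even a minor performance improvement is seen in correctly written programs.

Lastly, even if you didn't care about portability, and your compiler was magically nice, the CPU you are using is very likely of a superscalar CISC type and will break down instructions into micro-ops, reorder and/or speculatively execute them, to an extent limited only by synchronizing primitives such as (on Intel) the LOCK prefix or memory fences, in order to maximize operations per second.

To make a long story short, the natural responsibilities of thread-safe programming are:

  1. Your duty is to write code that has well-defined behavior under the language rules (and in particular the language standard memory model).
  2. Your compiler's duty is to generate machine code that has the same well-defined (observable) behavior under the target architecture's memory model.
  3. Your CPU's duty is to execute this code so that the observed behavior is compatible with its own architecture's memory model.

If you want to do it your own way, it might just work in some cases, but understand that the warranty is void, and you will be solely responsible for any unwanted outcomes. :-)

PS: Correctly written example: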

#include <atomic>
#include <memory>
#include <thread>
#include <vector>

int main()
{
  std::unique_ptr<std::vector<int>> vec;
  std::atomic<int> ready{0}; // NOTE the use of the std::atomic template
  std::thread t{[&]
    {
       while (!ready);
       // use "vec" here
    }};
  vec.reset(new std::vector<int>());
  ++ready;
  t.join();
}

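This is safe because:

  1. The checks of ready cannot be optimized away according to the language rules.
  2. The ++ready happens-before the check that sees ready as non-zero, and other operations cannot be reordered around these operations. This is because ++ready and the check are sequentially consistent, which is another term described in the C++ memory model and which forbids this specific reordering. Therefore the compiler must not reorder the instructions, and must also tell the CPU that it must not, for example, postpone the write to vec until after the increment of ready.

Sequential consistency is the strongest guarantee regarding atomics in the language standard. Lesser (and theoretically cheaper) guarantees are available, e.g. via other methods of std::atomic<T>, but these are definitely for experts only, and may not be optimized much by the compiler developers, because they are rarely used.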
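As a rough illustration of one of those weaker guarantees, here is a minimal sketch of a shared counter incremented with std::memory_order_relaxed. The helper name count_relaxed and its parameters are made up for this example. Relaxed ordering keeps each increment an atomic read-modify-write but promises nothing about the ordering of surrounding memory accesses, so it only suits cases like this one, where nothing but the final total matters.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: `threads` workers each add 1 to a shared counter
// `per_thread` times, using relaxed ordering.
int count_relaxed(int threads, int per_thread)
{
    std::atomic<int> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                // Still an atomic read-modify-write, but with no ordering
                // guarantees for the memory accesses around it.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    // join() synchronizes with the completion of each worker, so every
    // increment is visible to the load below.
    for (auto& w : workers) {
        w.join();
    }
    return counter.load(std::memory_order_relaxed);
}
```

Calling count_relaxed(4, 100000) should return exactly 400000; with a plain int counter the same loop would be a data race.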



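The add instruction is not atomic. It references memory, and two processor cores may have different local caches of that memory.

IIRC the atomic variant of the add instruction is called lock xadd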
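From C++ you would not write that instruction yourself; the portable counterpart is std::atomic<int>::fetch_add, which atomically adds to the value and returns what it held before. On x86, compilers typically lower it to lock xadd when the returned value is used and to a plain lock add when it is discarded, though the exact instruction is up to the compiler. A minimal sketch, with a made-up helper name:

```cpp
#include <atomic>

// Hypothetical helper: atomically adds `delta` to `value` and returns the
// value it held immediately before the addition.
int add_and_get_previous(std::atomic<int>& value, int delta)
{
    // The default ordering for fetch_add is std::memory_order_seq_cst.
    return value.fetch_add(delta);
}
```

For example, starting from std::atomic<int> v{5}, add_and_get_previous(v, 3) returns 5 and leaves v holding 8.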




Yes, but...

Atomic is not what you meant to say. You're probably asking the wrong thing.

The increment is certainly atomic. Unless the storage is misaligned (and since you left alignment to the compiler, it is not), it is necessarily aligned within a single cache line. Short of special non-caching streaming instructions, each and every write goes through the cache. Complete cache lines are being atomically read and written, never anything different.
Smaller-than-cacheline data is, of course, also written atomically (since the surrounding cache line is).

Is it thread-safe?

This is a different question, and there are at least two good reasons to answer with a definite "No!"

First, there is the possibility that another core might have a copy of that cache line in L1 (L2 and upwards is usually shared, but L1 is normally per-core!), and concurrently modifies that value. Of course that happens atomically, too, but now you have two "correct" (correctly, atomically, modified) values -- which one is the truly correct one now?
The CPU will sort it out somehow, of course. But the result may not be what you expect.

Second, there is memory ordering, or, worded differently, happens-before guarantees. The most important thing about atomic instructions is not so much that they are atomic. It's ordering.

You have the possibility of enforcing a guarantee that everything that happens memory-wise is realized in some guaranteed, well-defined order where you have a "happened before" guarantee. This ordering may be as "relaxed" (read as: none at all) or as strict as you need.

For example, you can set a pointer to some block of data (say, the results of some calculation) and then atomically release the "data is ready" flag. Now, whoever acquires this flag will be led into thinking that the pointer is valid. And indeed, it will always be a valid pointer, never anything different. That's because the write to the pointer happened-before the atomic operation.
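The pointer-plus-flag pattern described above can be sketched as follows. This is only an illustration: the function name publish_and_consume and the value 42 are invented for the example. The producer writes the data and the pointer with ordinary stores, then sets the flag with a release store; the consumer spins on an acquire load and only then dereferences the pointer.

```cpp
#include <atomic>
#include <thread>

// Hypothetical example: a producer publishes a result through a plain pointer
// guarded by an atomic "data is ready" flag.
int publish_and_consume()
{
    int result = 0;                   // the "block of data"
    int* data = nullptr;              // plain, non-atomic pointer
    std::atomic<bool> ready{false};   // the "data is ready" flag
    int seen = -1;

    std::thread consumer([&] {
        // Acquire: once this load observes true, everything written before
        // the matching release store is visible to this thread.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = *data;                 // safe: the pointer write happened-before
    });

    result = 42;                      // ordinary writes...
    data = &result;
    ready.store(true, std::memory_order_release);   // ...published by the release

    consumer.join();
    return seen;
}
```

If ready were a plain bool instead, both the flag accesses and the read through data would be data races, and nothing would stop the compiler or CPU from making the flag visible before the pointer.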