# java - 为什么处理排序数组比处理未排序数组更快？

java c++ performance optimization branch-prediction

``````#include <algorithm>
#include <ctime>
#include <iostream>

int main()
{
// Generate data
const unsigned arraySize = 32768;
int data[arraySize];

for (unsigned c = 0; c < arraySize; ++c)
data[c] = std::rand() % 256;

// !!! With this, the next loop runs faster
std::sort(data, data + arraySize);

// Test
clock_t start = clock();
long long sum = 0;

for (unsigned i = 0; i < 100000; ++i)
{
// Primary loop
for (unsigned c = 0; c < arraySize; ++c)
{
if (data[c] >= 128)
sum += data[c];
}
}

double elapsedTime = static_cast<double>(clock() - start) / CLOCKS_PER_SEC;

std::cout << elapsedTime << std::endl;
std::cout << "sum = " << sum << std::endl;
}
``````
• 没有`std::sort(data, data + arraySize);` ，代码运行时间为11.54秒。
• 使用排序数据，代码运行1.93秒。

``````import java.util.Arrays;
import java.util.Random;

public class Main
{
public static void main(String[] args)
{
// Generate data
int arraySize = 32768;
int data[] = new int[arraySize];

Random rnd = new Random(0);
for (int c = 0; c < arraySize; ++c)
data[c] = rnd.nextInt() % 256;

// !!! With this, the next loop runs faster
Arrays.sort(data);

// Test
long start = System.nanoTime();
long sum = 0;

for (int i = 0; i < 100000; ++i)
{
// Primary loop
for (int c = 0; c < arraySize; ++c)
{
if (data[c] >= 128)
sum += data[c];
}
}

System.out.println((System.nanoTime() - start) / 1000000000.0);
System.out.println("sum = " + sum);
}
}
``````

• 到底是怎么回事？
• 为什么处理排序数组比处理未排序数组更快？
• 代码总结了一些独立的术语，顺序无关紧要。

``````for (unsigned i = 0; i < 100000; ++i)
{
for (unsigned j = 0; j < arraySize; ++j)
{
if (data[j] >= 128)
sum += data[j];
}
}
``````

``````for (unsigned j = 0; j < arraySize; ++j)
{
for (unsigned i = 0; i < 100000; ++i)
{
if (data[j] >= 128)
sum += data[j];
}
}
``````

``````for (unsigned j = 0; j < arraySize; ++j)
{
if (data[j] >= 128)
{
for (unsigned i = 0; i < 100000; ++i)
{
sum += data[j];
}
}
}
``````

``````for (unsigned j = 0; j < arraySize; ++j)
{
if (data[j] >= 128)
{
sum += data[j] * 100000;
}
}
``````

1. 它是一个小桌子，可能会被缓存在处理器中
2. 您正在以非常紧凑的循环运行和/或处理器可以预加载数据

Pfew，那到底是什么意思呢？

``````for (int i=0; i<array.Length; ++i)
// Use array[i]
``````

``````// Generate data
int arraySize = 32768;
int[] data = new int[arraySize];

Random rnd = new Random(0);
for (int c = 0; c < arraySize; ++c)
data[c] = rnd.Next(256);

//To keep the spirit of the code in-tact I'll make a separate lookup table
// (I assume we cannot modify 'data' or the number of loops)
int[] lookup = new int[256];

for (int c = 0; c < 256; ++c)
lookup[c] = (c >= 128) ? c : 0;

// Test
DateTime startTime = System.DateTime.Now;
long sum = 0;

for (int i = 0; i < 100000; ++i)
{
// Primary loop
for (int j = 0; j < arraySize; ++j)
{
// Here you basically want to use simple operations - so no
// random branches, but things like &, |, *, -, +, etc. are fine.
sum += lookup[data[j]];
}
}

DateTime endTime = System.DateTime.Now;
Console.WriteLine(endTime - startTime);
Console.WriteLine("sum = " + sum);

``````

``````// Test
clock_t start = clock();
long long a[] = {0, 0};
long long sum;

for (unsigned i = 0; i < 100000; ++i)
{
// Primary loop
for (unsigned c = 0; c < arraySize; ++c)
{
int j = (data[c] >> 7);
a[j] += data[c];
}
}

double elapsedTime = static_cast<double>(clock() - start) / CLOCKS_PER_SEC;
sum = a[1];
``````

``````// declare and then fill in the lookup table
int lut[256];
for (unsigned c = 0; c < 256; ++c)
lut[c] = (c >= 128) ? c : 0;

// use the lookup table after it is built
for (unsigned i = 0; i < 100000; ++i)
{
// Primary loop
for (unsigned c = 0; c < arraySize; ++c)
{
sum += lut[data[c]];
}
}
``````

``````if (x < node->value)
node = node->pLeft;
else
node = node->pRight;
``````

``````i = (x < node->value);
``````

1. IF - 从内存中获取指令
2. ID - 解码指令
3. EX - 执行指令
4. WB - 写回CPU寄存器

4级管道一般用于2条指令。

``````                        A) if (data[c] >= 128)
/\
/  \
/    \
true /      \ false
/        \
/          \
/            \
/              \
B) sum += data[c];          C) for loop or print().
``````

1. 所有元素都小于128
2. 所有元素都大于128
3. 一些起始新元素小于128，后来大于128

``````if (likely( everything_is_ok ))
{
/* Do something */
}
``````

``````if (unlikely(very_improbable_condition))
{
/* Do something */
}
``````

1.静态

2.动态

``````if (expression)
{
// Run 1
} else {
// Run 2
}
``````

### 可视化：

`````` O      Route 1  /-------------------------------
/|\             /
|  ---------##/
/ \            \
\
Route 2  \--------------------------------
``````

• 分支预测器是古老的性能改进技术之一，它仍然与现代建筑相关。虽然简单的预测技术提供快速查找和功率效率，但是它们具有高的误预测率。

• 另一方面，复杂的分支预测 - 无论是基于神经的还是两级分支预测的变体 - 提供更好的预测准确性，但它们消耗更多的功率和复杂性以指数方式增加。

• 除此之外，在复杂的预测技术中，预测分支所花费的时间本身非常高，从2到5个周期 - 这与实际分支的执行时间相当。

• 分支预测本质上是一种优化（最小化）问题，其中重点在于以最少的资源实现最低可能的未命中率，低功耗和低复杂度。

`````` // sort backwards (higher values first)
std::sort(data, data + arraySize, std::greater<int>());

for (unsigned c = 0; c < arraySize; ++c) {
if (data[c] < 128) {
break;
}
sum += data[c];
}
``````

### Tags

java   c++   performance   optimization   branch-prediction