# java - 為什麼處理排序數組比處理未排序數組更快？

Question

``````#include <algorithm>
#include <ctime>
#include <iostream>

int main()
{
// Generate data
const unsigned arraySize = 32768;
int data[arraySize];

for (unsigned c = 0; c < arraySize; ++c)
data[c] = std::rand() % 256;

// !!! With this, the next loop runs faster
std::sort(data, data + arraySize);

// Test
clock_t start = clock();
long long sum = 0;

for (unsigned i = 0; i < 100000; ++i)
{
// Primary loop
for (unsigned c = 0; c < arraySize; ++c)
{
if (data[c] >= 128)
sum += data[c];
}
}

double elapsedTime = static_cast<double>(clock() - start) / CLOCKS_PER_SEC;

std::cout << elapsedTime << std::endl;
std::cout << "sum = " << sum << std::endl;
}
``````
• 沒有`std::sort(data, data + arraySize);` ，代碼運行時間為11.54秒。
• 使用排序數據，代碼運行1.93秒。

``````import java.util.Arrays;
import java.util.Random;

public class Main
{
public static void main(String[] args)
{
// Generate data
int arraySize = 32768;
int data[] = new int[arraySize];

Random rnd = new Random(0);
for (int c = 0; c < arraySize; ++c)
data[c] = rnd.nextInt() % 256;

// !!! With this, the next loop runs faster
Arrays.sort(data);

// Test
long start = System.nanoTime();
long sum = 0;

for (int i = 0; i < 100000; ++i)
{
// Primary loop
for (int c = 0; c < arraySize; ++c)
{
if (data[c] >= 128)
sum += data[c];
}
}

System.out.println((System.nanoTime() - start) / 1000000000.0);
System.out.println("sum = " + sum);
}
}
``````

• 到底是怎麼回事？
• 為什麼處理排序數組比處理未排序數組更快？
• 代碼總結了一些獨立的術語，順序無關緊要。

``````for (unsigned i = 0; i < 100000; ++i)
{
for (unsigned j = 0; j < arraySize; ++j)
{
if (data[j] >= 128)
sum += data[j];
}
}
``````

``````for (unsigned j = 0; j < arraySize; ++j)
{
for (unsigned i = 0; i < 100000; ++i)
{
if (data[j] >= 128)
sum += data[j];
}
}
``````

``````for (unsigned j = 0; j < arraySize; ++j)
{
if (data[j] >= 128)
{
for (unsigned i = 0; i < 100000; ++i)
{
sum += data[j];
}
}
}
``````

``````for (unsigned j = 0; j < arraySize; ++j)
{
if (data[j] >= 128)
{
sum += data[j] * 100000;
}
}
``````

1. 它是一個小桌子，可能會被緩存在處理器中
2. 您正在以非常緊湊的循環運行和/或處理器可以預加載數據

Pfew，那到底是什麼意思呢？

``````for (int i=0; i<array.Length; ++i)
// Use array[i]
``````

``````// Generate data
int arraySize = 32768;
int[] data = new int[arraySize];

Random rnd = new Random(0);
for (int c = 0; c < arraySize; ++c)
data[c] = rnd.Next(256);

//To keep the spirit of the code in-tact I'll make a separate lookup table
// (I assume we cannot modify 'data' or the number of loops)
int[] lookup = new int[256];

for (int c = 0; c < 256; ++c)
lookup[c] = (c >= 128) ? c : 0;

// Test
DateTime startTime = System.DateTime.Now;
long sum = 0;

for (int i = 0; i < 100000; ++i)
{
// Primary loop
for (int j = 0; j < arraySize; ++j)
{
// Here you basically want to use simple operations - so no
// random branches, but things like &, |, *, -, +, etc. are fine.
sum += lookup[data[j]];
}
}

DateTime endTime = System.DateTime.Now;
Console.WriteLine(endTime - startTime);
Console.WriteLine("sum = " + sum);

``````

``````// Test
clock_t start = clock();
long long a[] = {0, 0};
long long sum;

for (unsigned i = 0; i < 100000; ++i)
{
// Primary loop
for (unsigned c = 0; c < arraySize; ++c)
{
int j = (data[c] >> 7);
a[j] += data[c];
}
}

double elapsedTime = static_cast<double>(clock() - start) / CLOCKS_PER_SEC;
sum = a[1];
``````

``````// declare and then fill in the lookup table
int lut[256];
for (unsigned c = 0; c < 256; ++c)
lut[c] = (c >= 128) ? c : 0;

// use the lookup table after it is built
for (unsigned i = 0; i < 100000; ++i)
{
// Primary loop
for (unsigned c = 0; c < arraySize; ++c)
{
sum += lut[data[c]];
}
}
``````

``````if (x < node->value)
node = node->pLeft;
else
node = node->pRight;
``````

``````i = (x < node->value);
``````

1. IF - 從內存中獲取指令
2. ID - 解碼指令
3. EX - 執行指令
4. WB - 寫回CPU寄存器

4級管道一般用於2條指令。

``````                        A) if (data[c] >= 128)
/\
/  \
/    \
true /      \ false
/        \
/          \
/            \
/              \
B) sum += data[c];          C) for loop or print().
``````

1. 所有元素都小於128
2. 所有元素都大於128
3. 一些起始新元素小於128，後來大於128

``````if (likely( everything_is_ok ))
{
/* Do something */
}
``````

``````if (unlikely(very_improbable_condition))
{
/* Do something */
}
``````

1.靜態

2.動態

``````if (expression)
{
// Run 1
} else {
// Run 2
}
``````

### 可視化：

`````` O      Route 1  /-------------------------------
/|\             /
|  ---------##/
/ \            \
\
Route 2  \--------------------------------
``````

• 分支預測器是古老的性能改進技術之一，它仍然與現代建築相關。雖然簡單的預測技術提供快速查找和功率效率，但是它們具有高的誤預測率。

• 另一方面，複雜的分支預測 - 無論是基於神經的還是兩級分支預測的變體 - 提供更好的預測準確性，但它們消耗更多的功率和復雜性以指數方式增加。

• 除此之外，在復雜的預測技術中，預測分支所花費的時間本身非常高，從2到5個週期 - 這與實際分支的執行時間相當。

• 分支預測本質上是一種優化（最小化）問題，其中重點在於以最少的資源實現最低可能的未命中率，低功耗和低複雜度。

`````` // sort backwards (higher values first)
std::sort(data, data + arraySize, std::greater<int>());

for (unsigned c = 0; c < arraySize; ++c) {
if (data[c] < 128) {
break;
}
sum += data[c];
}
``````

### Tags

java   c++   performance   optimization   branch-prediction