# java - 為什麼處理排序數組比處理未排序數組更快?

## c++ performance (14)

``````#include <algorithm>
#include <ctime>
#include <iostream>

int main()
{
// Generate data
const unsigned arraySize = 32768;
int data[arraySize];

for (unsigned c = 0; c < arraySize; ++c)
data[c] = std::rand() % 256;

// !!! With this, the next loop runs faster
std::sort(data, data + arraySize);

// Test
clock_t start = clock();
long long sum = 0;

for (unsigned i = 0; i < 100000; ++i)
{
// Primary loop
for (unsigned c = 0; c < arraySize; ++c)
{
if (data[c] >= 128)
sum += data[c];
}
}

double elapsedTime = static_cast<double>(clock() - start) / CLOCKS_PER_SEC;

std::cout << elapsedTime << std::endl;
std::cout << "sum = " << sum << std::endl;
}
``````
• 沒有`std::sort(data, data + arraySize);` ，代碼運行時間為11.54秒。
• 使用排序數據，代碼運行1.93秒。

``````import java.util.Arrays;
import java.util.Random;

public class Main
{
public static void main(String[] args)
{
// Generate data
int arraySize = 32768;
int data[] = new int[arraySize];

Random rnd = new Random(0);
for (int c = 0; c < arraySize; ++c)
data[c] = rnd.nextInt() % 256;

// !!! With this, the next loop runs faster
Arrays.sort(data);

// Test
long start = System.nanoTime();
long sum = 0;

for (int i = 0; i < 100000; ++i)
{
// Primary loop
for (int c = 0; c < arraySize; ++c)
{
if (data[c] >= 128)
sum += data[c];
}
}

System.out.println((System.nanoTime() - start) / 1000000000.0);
System.out.println("sum = " + sum);
}
}
``````

• 到底是怎麼回事？
• 為什麼處理排序數組比處理未排序數組更快？
• 代碼總結了一些獨立的術語，順序無關緊要。

``````if (expression)
{
// Run 1
} else {
// Run 2
}
``````

### 可視化：

`````` O      Route 1  /-------------------------------
/|\             /
|  ---------##/
/ \            \
\
Route 2  \--------------------------------
``````

``````if (likely( everything_is_ok ))
{
/* Do something */
}
``````

``````if (unlikely(very_improbable_condition))
{
/* Do something */
}
``````

``````int i= 0, j, k= arraySize;
while (i < k)
{
j= (i + k) >> 1;
if (data[j] >= 128)
k= j;
else
i= j;
}
sum= 0;
for (; i < arraySize; i++)
sum+= data[i];
``````

``````int i, k, j= (i + k) >> 1;
for (i= 0, k= arraySize; i < k; (data[j] >= 128 ? k : i)= j)
j= (i + k) >> 1;
for (sum= 0; i < arraySize; i++)
sum+= data[i];
``````

1. 英特爾 - 避免分支機構錯誤預測的成本
2. 英特爾 - 分支和循環重組以防止錯誤預測
3. 科學論文 - 分支預測計算機結構
4. 書籍：JL Hennessy，DA Patterson：計算機體系結構：定量方法
5. 科學出版物中的文章：TY Yeh，YN Patt在分支預測中做了很多這些。

``````data[c] = std::rand() % 256;
``````

1. 它是一個小桌子，可能會被緩存在處理器中
2. 您正在以非常緊湊的循環運行和/或處理器可以預加載數據

Pfew，那到底是什麼意思呢？

``````for (int i=0; i<array.Length; ++i)
// Use array[i]
``````

``````// Generate data
int arraySize = 32768;
int[] data = new int[arraySize];

Random rnd = new Random(0);
for (int c = 0; c < arraySize; ++c)
data[c] = rnd.Next(256);

//To keep the spirit of the code in-tact I'll make a separate lookup table
// (I assume we cannot modify 'data' or the number of loops)
int[] lookup = new int[256];

for (int c = 0; c < 256; ++c)
lookup[c] = (c >= 128) ? c : 0;

// Test
DateTime startTime = System.DateTime.Now;
long sum = 0;

for (int i = 0; i < 100000; ++i)
{
// Primary loop
for (int j = 0; j < arraySize; ++j)
{
// Here you basically want to use simple operations - so no
// random branches, but things like &, |, *, -, +, etc. are fine.
sum += lookup[data[j]];
}
}

DateTime endTime = System.DateTime.Now;
Console.WriteLine(endTime - startTime);
Console.WriteLine("sum = " + sum);

``````

1. IF - 從內存中獲取指令
2. ID - 解碼指令
3. EX - 執行指令
4. WB - 寫回CPU寄存器

4級管道一般用於2條指令。

``````                        A) if (data[c] >= 128)
/\
/  \
/    \
true /      \ false
/        \
/          \
/            \
/              \
B) sum += data[c];          C) for loop or print().
``````

1. 所有元素都小於128
2. 所有元素都大於128
3. 一些起始新元素小於128，後來大於128

``````____________________________________________________________________________________
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
``````

``````______   ________
|     |__|
``````

``````___________________________________________ Straight road
``````

``````if (data[c] >= 128)
sum += data[c];
``````

`if`-statement 的表現取決於其條件是否具有可預測的模式。如果條件始終為真或始終為假，則處理器中的分支預測邏輯將獲取模式。另一方面，如果模式不可預測，那麼 - `if`陳述將更加昂貴。

``````for (int i = 0; i < max; i++)
if (condition)
sum++;
``````

``````Condition            Pattern                 Time (ms)

(i & 0×80000000) == 0    T repeated          322

(i & 0xffffffff) == 0    F repeated          276

(i & 1) == 0            TF alternating    760

(i & 3) == 0            TFFFTFFF…          513

(i & 2) == 0            TTFFTTFF…          1675

(i & 4) == 0            TTTTFFFFTTTTFFFF… 1275

(i & 8) == 0            8T 8F 8T 8F …     752

(i & 16) == 0            16T 16F 16T 16F … 490
``````

1.靜態

2.動態

`````` // sort backwards (higher values first)
std::sort(data, data + arraySize, std::greater<int>());

for (unsigned c = 0; c < arraySize; ++c) {
if (data[c] < 128) {
break;
}
sum += data[c];
}
``````

``````MOV R0, #0     // R0 = sum = 0
MOV R1, #0     // R1 = c = 0
ADR R2, data   // R2 = addr of data array (put this instruction outside outer loop)
.inner_loop    // Inner loop branch label
LDRB R3, [R2, R1]     // R3 = data[c]
CMP R3, #128          // compare R3 to 128
ADDGE R0, R0, R3      // if R3 >= 128, then sum += data[c] -- no branch needed!
ADD R1, R1, #1        // c++
CMP R1, #arraySize    // compare c to arraySize
BLT inner_loop        // Branch to inner_loop if c < arraySize
``````

1. 沒有分支預測器。

1. 使用分支預測器，不要進行條件跳轉。讓我們假設預測沒有採取條件跳躍。

1. 使用分支預測器並進行條件跳轉。讓我們假設預測沒有採取條件跳躍。