[C++] Fastest way to determine if an integer is between two integers (inclusive) with known sets of values


It depends on how many times you want to perform the test over the same data.

If you are performing the test a single time, there probably isn't a meaningful way to speed up the algorithm.

If you are doing this for a very finite set of values, then you could create a lookup table. Performing the indexing might be more expensive, but if you can fit the entire table in cache, then you can remove all branching from the code, which should speed things up.

For your data the lookup table would be 128^3 = 2,097,152. If you can control one of the three variables so you consider all instances where start = N at one time, then the size of the working set drops down to 128^2 = 16432 bytes, which should fit well in most modern caches.

You would still have to benchmark the actual code to see if a branchless lookup table is sufficiently faster than the obvious comparisons.


Is there a faster way than x >= start && x <= end in C or C++ to test if an integer is between two integers?

UPDATE: My specific platform is iOS. This is part of a box blur function that restricts pixels to a circle in a given square.

UPDATE: After trying the accepted answer, I got an order of magnitude speedup on the one line of code over doing it the normal x >= start && x <= end way.

UPDATE: Here is the after and before code with assembler from XCode:


// diff = (end - start) + 1
#define POINT_IN_RANGE_AND_INCREMENT(p, range) ((p++ - range.start) < range.diff)

 ldr    r0, [sp, #176] @ 4-byte Reload
 ldr    r1, [sp, #164] @ 4-byte Reload
 ldr    r0, [r0]
 ldr    r1, [r1]
 sub.w  r0, r9, r0
 cmp    r0, r1
 blo    LBB44_30


#define POINT_IN_RANGE_AND_INCREMENT(p, range) (p <= range.end && p++ >= range.start)

 ldr    r1, [sp, #172] @ 4-byte Reload
 ldr    r1, [r1]
 cmp    r0, r1
 bls    LBB44_32
 mov    r6, r0
 b      LBB44_33
 ldr    r1, [sp, #188] @ 4-byte Reload
 adds   r6, r0, #1
 ldr    r1, [r1]
 cmp    r0, r1
 bhs    LBB44_36

Pretty amazing how reducing or eliminating branching can provide such a dramatic speed up.

This answer is to report on a testing done with the accepted answer. I performed a closed range test on a large vector of sorted random integer and to my surprise the basic method of ( low <= num && num <= high) is in fact faster than the accepted answer above! Test was done on HP Pavilion g6 (AMD A6-3400APU with 6GB ram. Here's the core code used for testing:

int num = rand();  // num to compare in consecutive ranges.
chrono::time_point<chrono::system_clock> start, end;
auto start = chrono::system_clock::now();

int inBetween1{ 0 };
for (int i = 1; i < MaxNum; ++i)
    if (randVec[i - 1] <= num && num <= randVec[i])
auto end = chrono::system_clock::now();
chrono::duration<double> elapsed_s1 = end - start;

compared with the following which is the accepted answer above:

int inBetween2{ 0 };
for (int i = 1; i < MaxNum; ++i)
    if (static_cast<unsigned>(num - randVec[i - 1]) <= (randVec[i] - randVec[i - 1]))

Pay attention that randVec is a sorted vector. For any size of MaxNum the first method beats the second one on my machine!