c++ - boost :: hash_combine中的幻数





algorithm magic-numbers (3)


幻数应该是32个随机比特,每个比特可能是0或1,并且这些比特之间没有简单的相关性。 找到一串这样的比特的常用方法是使用无理数的二进制扩展; 在这种情况下,这个数字是黄金比例的倒数:

phi = (1 + sqrt(5)) / 2
2^32 / phi = 0x9e3779b9

所以包含这个数字“随机”改变种子的每一位; 正如你所说,这意味着连续的值将会相距甚远。 包括旧种子的移位版本可以确保即使hash_value()的值范围相当小,差异也很快会在所有比特中传播。

boost::hash_combine模板函数接受一个散列(称为seed )和一个对象v的引用。 根据docs ,它将seedv的散列结合在一起

seed ^= hash_value(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);

我可以看到这是确定性的。 我明白为什么使用XOR。

我敢打赌,这种增加有助于将相似的值映射到很远的地方,所以探测哈希表不会中断,但是有人能解释一下这个魔术常量是什么吗?




看看1997年鲍勃詹金斯DDJ文章 。 魔术常数(“黄金比例”)解释如下:

黄金比例确实是一个任意值。 其目的是避免将全零映射到全零。




You can motivate the solution by thinking about it in terms of symmetries (groups, in math language). No matter the order of the set of numbers, the answer should be the same. If you're going to use k functions to help determine the missing elements, you should be thinking about what functions have that property: symmetric. The function s_1(x) = x_1 + x_2 + ... + x_n is an example of a symmetric function, but there are others of higher degree. In particular, consider the elementary symmetric functions . The elementary symmetric function of degree 2 is s_2(x) = x_1 x_2 + x_1 x_3 + ... + x_1 x_n + x_2 x_3 + ... + x_(n-1) x_n , the sum of all products of two elements. Similarly for the elementary symmetric functions of degree 3 and higher. They are obviously symmetric. Furthermore, it turns out they are the building blocks for all symmetric functions.

You can build the elementary symmetric functions as you go by noting that s_2(x,x_(n+1)) = s_2(x) + s_1(x)(x_(n+1)) . Further thought should convince you that s_3(x,x_(n+1)) = s_3(x) + s_2(x)(x_(n+1)) and so on, so they can be computed in one pass.

How do we tell which items were missing from the array? Think about the polynomial (z-x_1)(z-x_2)...(z-x_n) . It evaluates to 0 if you put in any of the numbers x_i . Expanding the polynomial, you get z^n-s_1(x)z^(n-1)+ ... + (-1)^n s_n . The elementary symmetric functions appear here too, which is really no surprise, since the polynomial should stay the same if we apply any permutation to the roots.

So we can build the polynomial and try to factor it to figure out which numbers are not in the set, as others have mentioned.

Finally, if we are concerned about overflowing memory with large numbers (the nth symmetric polynomial will be of the order 100! ), we can do these calculations mod p where p is a prime bigger than 100. In that case we evaluate the polynomial mod p and find that it again evaluates to 0 when the input is a number in the set, and it evaluates to a non-zero value when the input is a number not in the set. However, as others have pointed out, to get the values out of the polynomial in time that depends on k , not N , we have to factor the polynomial mod p .







c++ algorithm boost hash magic-numbers