.net - gethashcode实现 - 什么是重写的System.Object.GetHashCode的最佳算法?




c# gethashcode实现 (11)

在.NET中, System.Object.GetHashCode方法在.NET基类库中的很多地方都有使用。 特别是在快速查找集合中的项目或确定平等时。 是否有一个关于如何为我的自定义类实现GetHashCode覆盖的标准算法/最佳实践,所以我不会降低性能?


ReSharper users can generate GetHashCode, Equals, and others with ReSharper -> Edit -> Generate Code -> Equality Members .

// ReSharper's GetHashCode looks like this
public override int GetHashCode() {
    unchecked {
        int hashCode = Id;
        hashCode = (hashCode * 397) ^ IntMember;
        hashCode = (hashCode * 397) ^ OtherIntMember;
        hashCode = (hashCode * 397) ^ (RefMember != null ? RefMember.GetHashCode() : 0);
        // ...
        return hashCode;
    }
}

As of https://github.com/dotnet/coreclr/pull/14863 , there is a new way to generate hash codes that is super simple! Just write

public override int GetHashCode()
    => HashCode.Combine(field1, field2, field3);

This will generate a quality hash code without you having to worry about the implementation details.


I ran into an issue with floats and decimals using the implementation selected as the answer above.

This test fails (floats; hash is the same even though I switched 2 values to be negative):

        var obj1 = new { A = 100m, B = 100m, C = 100m, D = 100m};
        var obj2 = new { A = 100m, B = 100m, C = -100m, D = -100m};
        var hash1 = ComputeHash(obj1.A, obj1.B, obj1.C, obj1.D);
        var hash2 = ComputeHash(obj2.A, obj2.B, obj2.C, obj2.D);
        Assert.IsFalse(hash1 == hash2, string.Format("Hashcode values should be different   hash1:{0}  hash2:{1}",hash1,hash2));

But this test passes (with ints):

        var obj1 = new { A = 100m, B = 100m, C = 100, D = 100};
        var obj2 = new { A = 100m, B = 100m, C = -100, D = -100};
        var hash1 = ComputeHash(obj1.A, obj1.B, obj1.C, obj1.D);
        var hash2 = ComputeHash(obj2.A, obj2.B, obj2.C, obj2.D);
        Assert.IsFalse(hash1 == hash2, string.Format("Hashcode values should be different   hash1:{0}  hash2:{1}",hash1,hash2));

I changed my implementation to not use GetHashCode for the primitive types and it seems to work better

    private static int InternalComputeHash(params object[] obj)
    {
        unchecked
        {
            var result = (int)SEED_VALUE_PRIME;
            for (uint i = 0; i < obj.Length; i++)
            {
                var currval = result;
                var nextval = DetermineNextValue(obj[i]);
                result = (result * MULTIPLIER_VALUE_PRIME) + nextval;

            }
            return result;
        }
    }



    private static int DetermineNextValue(object value)
    {
        unchecked
        {

                int hashCode;
                if (value is short
                    || value is int
                    || value is byte
                    || value is sbyte
                    || value is uint
                    || value is ushort
                    || value is ulong
                    || value is long
                    || value is float
                    || value is double
                    || value is decimal)
                {
                    return Convert.ToInt32(value);
                }
                else
                {
                    return value != null ? value.GetHashCode() : 0;
                }
        }
    }

Microsoft lead for several way of hashing...

//for classes that contain a single int value
return this.value;

//for classes that contain multiple int value
return x ^ y;

//for classes that contain single number bigger than int    
return ((int)value ^ (int)(value >> 32)); 

//for classes that contain class instance fields which inherit from object
return obj1.GetHashCode();

//for classes that contain multiple class instance fields which inherit from object
return obj1.GetHashCode() ^ obj2.GetHashCode() ^ obj3.GetHashCode(); 

I can guess that for multiple big int you can use this:

int a=((int)value1 ^ (int)(value1 >> 32));
int b=((int)value2 ^ (int)(value2 >> 32));
int c=((int)value3 ^ (int)(value3 >> 32));
return a ^ b ^ c;

And same for multi-type: all converted first to int using GetHashCode() then the int values will be xor'ed and the result is your hash.

For those who use hash as ID (I mean an unique value), hash is naturally limited to a number of digits, I think it was 5 bytes for hashing algorithm, at least MD5.

You may turn multiple values to a hashed value and some of them be same, so don't use it as an identifier. (maybe some day I am going to use your component)


Pretty much similar to nightcoder's solution except it's easier to raise primes if you want to.

PS: This is one of those times where you puke a little in your mouth, knowing that this could be refactored into one method with 9 default's but it would be slower, so you just close your eyes and try to forget about it.

/// <summary>
/// Try not to look at the source code. It works. Just rely on it.
/// </summary>
public static class HashHelper
{
    private const int PrimeOne = 17;
    private const int PrimeTwo = 23;

    public static int GetHashCode<T1, T2, T3, T4, T5, T6, T7, T8, T9, T10>(T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();
            hash = hash * PrimeTwo + arg5.GetHashCode();
            hash = hash * PrimeTwo + arg6.GetHashCode();
            hash = hash * PrimeTwo + arg7.GetHashCode();
            hash = hash * PrimeTwo + arg8.GetHashCode();
            hash = hash * PrimeTwo + arg9.GetHashCode();
            hash = hash * PrimeTwo + arg10.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3, T4, T5, T6, T7, T8, T9>(T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();
            hash = hash * PrimeTwo + arg5.GetHashCode();
            hash = hash * PrimeTwo + arg6.GetHashCode();
            hash = hash * PrimeTwo + arg7.GetHashCode();
            hash = hash * PrimeTwo + arg8.GetHashCode();
            hash = hash * PrimeTwo + arg9.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3, T4, T5, T6, T7, T8>(T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();
            hash = hash * PrimeTwo + arg5.GetHashCode();
            hash = hash * PrimeTwo + arg6.GetHashCode();
            hash = hash * PrimeTwo + arg7.GetHashCode();
            hash = hash * PrimeTwo + arg8.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3, T4, T5, T6, T7>(T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();
            hash = hash * PrimeTwo + arg5.GetHashCode();
            hash = hash * PrimeTwo + arg6.GetHashCode();
            hash = hash * PrimeTwo + arg7.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3, T4, T5, T6>(T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();
            hash = hash * PrimeTwo + arg5.GetHashCode();
            hash = hash * PrimeTwo + arg6.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3, T4, T5>(T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();
            hash = hash * PrimeTwo + arg5.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3, T4>(T1 arg1, T2 arg2, T3 arg3, T4 arg4)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3>(T1 arg1, T2 arg2, T3 arg3)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2>(T1 arg1, T2 arg2)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();

            return hash;
        }
    }
}

在Equals()比较多个字段的大多数情况下,如果GetHash()在一个字段或多个字段上散列并不重要。 你只需要确保计算散列非常便宜(请不要分配 ),并且快速( 不需要繁重的计算 ,当然也没有数据库连接)并且提供了良好的分布。

繁重的工作应该是Equals()方法的一部分; 哈希应该是一个非常便宜的操作,以便尽可能少地调用Equals()。

最后一个提示: 不要依赖GetHashCode()在多个应用程序运行中保持稳定 。 许多.Net类型不保证它们的哈希代码在重新启动后保持不变,因此您应该只在内存数据结构中使用GetHashCode()的值。


我在帮助程序库中有一个哈希类,用于此目的。

/// <summary> 
/// This is a simple hashing function from Robert Sedgwicks Hashing in C book.
/// Also, some simple optimizations to the algorithm in order to speed up
/// its hashing process have been added. from: www.partow.net
/// </summary>
/// <param name="input">array of objects, parameters combination that you need
/// to get a unique hash code for them</param>
/// <returns>Hash code</returns>
public static int RSHash(params object[] input)
{
    const int b = 378551;
    int a = 63689;
    int hash = 0;

    // If it overflows then just wrap around
    unchecked
    {
        for (int i = 0; i < input.Length; i++)
        {
            if (input[i] != null)
            {
                hash = hash * a + input[i].GetHashCode();
                a = a * b;
            }
        }
    }

    return hash;
}

然后,只需将它用作:

public override int GetHashCode()
{
    return Hashing.RSHash(_field1, _field2, _field3);
}

我没有评估它的表现,因此欢迎任何反馈。


我通常使用Josh Bloch的神话般的 Effective Java中的实现 。 它速度很快,创建了一个不太可能造成冲突的相当不错的散列。 选择两个不同的素数,例如17和23,然后执行:

public override int GetHashCode()
{
    unchecked // Overflow is fine, just wrap
    {
        int hash = 17;
        // Suitable nullity checks etc, of course :)
        hash = hash * 23 + field1.GetHashCode();
        hash = hash * 23 + field2.GetHashCode();
        hash = hash * 23 + field3.GetHashCode();
        return hash;
    }
}

正如评论中指出的那样,您可能会发现最好选择一个大质数来代替。 显然486187739是好的...尽管大多数我看到的小数字的例子倾向于使用素数,但至少有类似的算法使用非素数。 例如,在后面的不太完全的FNV例子中,我使用了显然工作得很好的数字 - 但初始值并不是主要数据。 (虽然乘法常数主要的,但我不知道这有多重要。)

由于两个主要原因,这比通常的XOR更好。 假设我们有一个包含两个int字段的类型:

XorHash(x, x) == XorHash(y, y) == 0 for all x, y
XorHash(x, y) == XorHash(y, x) for all x, y

顺便说一下,早期的算法是C#编译器当前用于匿名类型的算法。

本页面提供了很多选项。 我认为在大多数情况下,上述内容“足够好”,而且记住并且正确无误是非常容易的。 FNV替代方法同样简单,但使用不同的常量和XOR而不是ADD作为组合操作。 它看起来像下面的代码,但正常的FNV算法对单个字节进行操作,所以这需要修改每个字节执行一次迭代,而不是每个32位散列值。 FNV也被设计用于可变长度的数据,而我们在这里使用的方式始终是相同数量的字段值。 对这个答案的评论表明,这里的代码实际上并没有像上面的添加方法那样工作(在样本案例中被测试过)。

// Note: Not quite FNV!
public override int GetHashCode()
{
    unchecked // Overflow is fine, just wrap
    {
        int hash = (int) 2166136261;
        // Suitable nullity checks etc, of course :)
        hash = (hash * 16777619) ^ field1.GetHashCode();
        hash = (hash * 16777619) ^ field2.GetHashCode();
        hash = (hash * 16777619) ^ field3.GetHashCode();
        return hash;
    }
}

请注意,有一点需要注意的是,理想情况下,您应该在将其添加到依赖哈希代码的集合后,防止对等同敏感(并因此对哈希码敏感)状态发生更改。

根据documentation

您可以为不可变的引用类型重写GetHashCode。 通常,对于可变引用类型,只有在以下情况下才应该覆盖GetHashCode:

  • 您可以从不可变的字段计算哈希码; 要么
  • 您可以确保在对象包含在依赖其哈希码的集合中时,可变对象的哈希码不会更改。

这个不错:

/// <summary>
/// Helper class for generating hash codes suitable 
/// for use in hashing algorithms and data structures like a hash table. 
/// </summary>
public static class HashCodeHelper
{
    private static int GetHashCodeInternal(int key1, int key2)
    {
        unchecked
        {
           var num = 0x7e53a269;
           num = (-1521134295 * num) + key1;
           num += (num << 10);
           num ^= (num >> 6);

           num = ((-1521134295 * num) + key2);
           num += (num << 10);
           num ^= (num >> 6);

           return num;
        }
    }

    /// <summary>
    /// Returns a hash code for the specified objects
    /// </summary>
    /// <param name="arr">An array of objects used for generating the 
    /// hash code.</param>
    /// <returns>
    /// A hash code, suitable for use in hashing algorithms and data 
    /// structures like a hash table. 
    /// </returns>
    public static int GetHashCode(params object[] arr)
    {
        int hash = 0;
        foreach (var item in arr)
            hash = GetHashCodeInternal(hash, item.GetHashCode());
        return hash;
    }

    /// <summary>
    /// Returns a hash code for the specified objects
    /// </summary>
    /// <param name="obj1">The first object.</param>
    /// <param name="obj2">The second object.</param>
    /// <param name="obj3">The third object.</param>
    /// <param name="obj4">The fourth object.</param>
    /// <returns>
    /// A hash code, suitable for use in hashing algorithms and
    /// data structures like a hash table.
    /// </returns>
    public static int GetHashCode<T1, T2, T3, T4>(T1 obj1, T2 obj2, T3 obj3,
        T4 obj4)
    {
        return GetHashCode(obj1, GetHashCode(obj2, obj3, obj4));
    }

    /// <summary>
    /// Returns a hash code for the specified objects
    /// </summary>
    /// <param name="obj1">The first object.</param>
    /// <param name="obj2">The second object.</param>
    /// <param name="obj3">The third object.</param>
    /// <returns>
    /// A hash code, suitable for use in hashing algorithms and data 
    /// structures like a hash table. 
    /// </returns>
    public static int GetHashCode<T1, T2, T3>(T1 obj1, T2 obj2, T3 obj3)
    {
        return GetHashCode(obj1, GetHashCode(obj2, obj3));
    }

    /// <summary>
    /// Returns a hash code for the specified objects
    /// </summary>
    /// <param name="obj1">The first object.</param>
    /// <param name="obj2">The second object.</param>
    /// <returns>
    /// A hash code, suitable for use in hashing algorithms and data 
    /// structures like a hash table. 
    /// </returns>
    public static int GetHashCode<T1, T2>(T1 obj1, T2 obj2)
    {
        return GetHashCodeInternal(obj1.GetHashCode(), obj2.GetHashCode());
    }
}

以下是如何使用它:

private struct Key
{
    private Type _type;
    private string _field;

    public Type Type { get { return _type; } }
    public string Field { get { return _field; } }

    public Key(Type type, string field)
    {
        _type = type;
        _field = field;
    }

    public override int GetHashCode()
    {
        return HashCodeHelper.GetHashCode(_field, _type);
    }

    public override bool Equals(object obj)
    {
        if (!(obj is Key))
            return false;
        var tf = (Key)obj;
        return tf._field.Equals(_field) && tf._type.Equals(_type);
    }
}

这是我的hashcode助手。
它的优点是它使用泛型类型参数,因此不会导致装箱:

public static class HashHelper
{
    public static int GetHashCode<T1, T2>(T1 arg1, T2 arg2)
    {
         unchecked
         {
             return 31 * arg1.GetHashCode() + arg2.GetHashCode();
         }
    }

    public static int GetHashCode<T1, T2, T3>(T1 arg1, T2 arg2, T3 arg3)
    {
        unchecked
        {
            int hash = arg1.GetHashCode();
            hash = 31 * hash + arg2.GetHashCode();
            return 31 * hash + arg3.GetHashCode();
        }
    }

    public static int GetHashCode<T1, T2, T3, T4>(T1 arg1, T2 arg2, T3 arg3, 
        T4 arg4)
    {
        unchecked
        {
            int hash = arg1.GetHashCode();
            hash = 31 * hash + arg2.GetHashCode();
            hash = 31 * hash + arg3.GetHashCode();
            return 31 * hash + arg4.GetHashCode();
        }
    }

    public static int GetHashCode<T>(T[] list)
    {
        unchecked
        {
            int hash = 0;
            foreach (var item in list)
            {
                hash = 31 * hash + item.GetHashCode();
            }
            return hash;
        }
    }

    public static int GetHashCode<T>(IEnumerable<T> list)
    {
        unchecked
        {
            int hash = 0;
            foreach (var item in list)
            {
                hash = 31 * hash + item.GetHashCode();
            }
            return hash;
        }
    }

    /// <summary>
    /// Gets a hashcode for a collection for that the order of items 
    /// does not matter.
    /// So {1, 2, 3} and {3, 2, 1} will get same hash code.
    /// </summary>
    public static int GetHashCodeForOrderNoMatterCollection<T>(
        IEnumerable<T> list)
    {
        unchecked
        {
            int hash = 0;
            int count = 0;
            foreach (var item in list)
            {
                hash += item.GetHashCode();
                count++;
            }
            return 31 * hash + count.GetHashCode();
        }
    }

    /// <summary>
    /// Alternative way to get a hashcode is to use a fluent 
    /// interface like this:<br />
    /// return 0.CombineHashCode(field1).CombineHashCode(field2).
    ///     CombineHashCode(field3);
    /// </summary>
    public static int CombineHashCode<T>(this int hashCode, T arg)
    {
        unchecked
        {
            return 31 * hashCode + arg.GetHashCode();   
        }
    }

它也有提供流畅接口的扩展方法,所以你可以像这样使用它:

public override int GetHashCode()
{
    return HashHelper.GetHashCode(Manufacturer, PartN, Quantity);
}

或者像这样:

public override int GetHashCode()
{
    return 0.CombineHashCode(Manufacturer)
        .CombineHashCode(PartN)
        .CombineHashCode(Quantity);
}

这是我简单化的方法。 我正在使用经典的生成器模式。 它是类型安全的(没有装箱/拆箱),也与.NET 2.0兼容(没有扩展方法等)。

它是这样使用的:

public override int GetHashCode()
{
    HashBuilder b = new HashBuilder();
    b.AddItems(this.member1, this.member2, this.member3);
    return b.Result;
} 

这里是acutal builder class:

internal class HashBuilder
{
    private const int Prime1 = 17;
    private const int Prime2 = 23;
    private int result = Prime1;

    public HashBuilder()
    {
    }

    public HashBuilder(int startHash)
    {
        this.result = startHash;
    }

    public int Result
    {
        get
        {
            return this.result;
        }
    }

    public void AddItem<T>(T item)
    {
        unchecked
        {
            this.result = this.result * Prime2 + item.GetHashCode();
        }
    }

    public void AddItems<T1, T2>(T1 item1, T2 item2)
    {
        this.AddItem(item1);
        this.AddItem(item2);
    }

    public void AddItems<T1, T2, T3>(T1 item1, T2 item2, T3 item3)
    {
        this.AddItem(item1);
        this.AddItem(item2);
        this.AddItem(item3);
    }

    public void AddItems<T1, T2, T3, T4>(T1 item1, T2 item2, T3 item3, 
        T4 item4)
    {
        this.AddItem(item1);
        this.AddItem(item2);
        this.AddItem(item3);
        this.AddItem(item4);
    }

    public void AddItems<T1, T2, T3, T4, T5>(T1 item1, T2 item2, T3 item3, 
        T4 item4, T5 item5)
    {
        this.AddItem(item1);
        this.AddItem(item2);
        this.AddItem(item3);
        this.AddItem(item4);
        this.AddItem(item5);
    }        

    public void AddItems<T>(params T[] items)
    {
        foreach (T item in items)
        {
            this.AddItem(item);
        }
    }
}






gethashcode