.net - 什麼是重寫的System.Object.GetHashCode的最佳算法?




algorithm (11)

ReSharper users can generate GetHashCode, Equals, and others with ReSharper -> Edit -> Generate Code -> Equality Members .

// ReSharper's GetHashCode looks like this
public override int GetHashCode() {
    unchecked {
        int hashCode = Id;
        hashCode = (hashCode * 397) ^ IntMember;
        hashCode = (hashCode * 397) ^ OtherIntMember;
        hashCode = (hashCode * 397) ^ (RefMember != null ? RefMember.GetHashCode() : 0);
        // ...
        return hashCode;
    }
}

在.NET中, System.Object.GetHashCode方法在.NET基類庫中的很多地方都有使用。 特別是在快速查找集合中的項目或確定平等時。 是否有一個關於如何為我的自定義類實現GetHashCode覆蓋的標準算法/最佳實踐,所以我不會降低性能?


As of https://github.com/dotnet/coreclr/pull/14863 , there is a new way to generate hash codes that is super simple! Just write

public override int GetHashCode()
    => HashCode.Combine(field1, field2, field3);

This will generate a quality hash code without you having to worry about the implementation details.


I ran into an issue with floats and decimals using the implementation selected as the answer above.

This test fails (floats; hash is the same even though I switched 2 values to be negative):

        var obj1 = new { A = 100m, B = 100m, C = 100m, D = 100m};
        var obj2 = new { A = 100m, B = 100m, C = -100m, D = -100m};
        var hash1 = ComputeHash(obj1.A, obj1.B, obj1.C, obj1.D);
        var hash2 = ComputeHash(obj2.A, obj2.B, obj2.C, obj2.D);
        Assert.IsFalse(hash1 == hash2, string.Format("Hashcode values should be different   hash1:{0}  hash2:{1}",hash1,hash2));

But this test passes (with ints):

        var obj1 = new { A = 100m, B = 100m, C = 100, D = 100};
        var obj2 = new { A = 100m, B = 100m, C = -100, D = -100};
        var hash1 = ComputeHash(obj1.A, obj1.B, obj1.C, obj1.D);
        var hash2 = ComputeHash(obj2.A, obj2.B, obj2.C, obj2.D);
        Assert.IsFalse(hash1 == hash2, string.Format("Hashcode values should be different   hash1:{0}  hash2:{1}",hash1,hash2));

I changed my implementation to not use GetHashCode for the primitive types and it seems to work better

    private static int InternalComputeHash(params object[] obj)
    {
        unchecked
        {
            var result = (int)SEED_VALUE_PRIME;
            for (uint i = 0; i < obj.Length; i++)
            {
                var currval = result;
                var nextval = DetermineNextValue(obj[i]);
                result = (result * MULTIPLIER_VALUE_PRIME) + nextval;

            }
            return result;
        }
    }



    private static int DetermineNextValue(object value)
    {
        unchecked
        {

                int hashCode;
                if (value is short
                    || value is int
                    || value is byte
                    || value is sbyte
                    || value is uint
                    || value is ushort
                    || value is ulong
                    || value is long
                    || value is float
                    || value is double
                    || value is decimal)
                {
                    return Convert.ToInt32(value);
                }
                else
                {
                    return value != null ? value.GetHashCode() : 0;
                }
        }
    }

Microsoft lead for several way of hashing...

//for classes that contain a single int value
return this.value;

//for classes that contain multiple int value
return x ^ y;

//for classes that contain single number bigger than int    
return ((int)value ^ (int)(value >> 32)); 

//for classes that contain class instance fields which inherit from object
return obj1.GetHashCode();

//for classes that contain multiple class instance fields which inherit from object
return obj1.GetHashCode() ^ obj2.GetHashCode() ^ obj3.GetHashCode(); 

I can guess that for multiple big int you can use this:

int a=((int)value1 ^ (int)(value1 >> 32));
int b=((int)value2 ^ (int)(value2 >> 32));
int c=((int)value3 ^ (int)(value3 >> 32));
return a ^ b ^ c;

And same for multi-type: all converted first to int using GetHashCode() then the int values will be xor'ed and the result is your hash.

For those who use hash as ID (I mean an unique value), hash is naturally limited to a number of digits, I think it was 5 bytes for hashing algorithm, at least MD5.

You may turn multiple values to a hashed value and some of them be same, so don't use it as an identifier. (maybe some day I am going to use your component)


Pretty much similar to nightcoder's solution except it's easier to raise primes if you want to.

PS: This is one of those times where you puke a little in your mouth, knowing that this could be refactored into one method with 9 default's but it would be slower, so you just close your eyes and try to forget about it.

/// <summary>
/// Try not to look at the source code. It works. Just rely on it.
/// </summary>
public static class HashHelper
{
    private const int PrimeOne = 17;
    private const int PrimeTwo = 23;

    public static int GetHashCode<T1, T2, T3, T4, T5, T6, T7, T8, T9, T10>(T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();
            hash = hash * PrimeTwo + arg5.GetHashCode();
            hash = hash * PrimeTwo + arg6.GetHashCode();
            hash = hash * PrimeTwo + arg7.GetHashCode();
            hash = hash * PrimeTwo + arg8.GetHashCode();
            hash = hash * PrimeTwo + arg9.GetHashCode();
            hash = hash * PrimeTwo + arg10.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3, T4, T5, T6, T7, T8, T9>(T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();
            hash = hash * PrimeTwo + arg5.GetHashCode();
            hash = hash * PrimeTwo + arg6.GetHashCode();
            hash = hash * PrimeTwo + arg7.GetHashCode();
            hash = hash * PrimeTwo + arg8.GetHashCode();
            hash = hash * PrimeTwo + arg9.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3, T4, T5, T6, T7, T8>(T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();
            hash = hash * PrimeTwo + arg5.GetHashCode();
            hash = hash * PrimeTwo + arg6.GetHashCode();
            hash = hash * PrimeTwo + arg7.GetHashCode();
            hash = hash * PrimeTwo + arg8.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3, T4, T5, T6, T7>(T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();
            hash = hash * PrimeTwo + arg5.GetHashCode();
            hash = hash * PrimeTwo + arg6.GetHashCode();
            hash = hash * PrimeTwo + arg7.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3, T4, T5, T6>(T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();
            hash = hash * PrimeTwo + arg5.GetHashCode();
            hash = hash * PrimeTwo + arg6.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3, T4, T5>(T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();
            hash = hash * PrimeTwo + arg5.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3, T4>(T1 arg1, T2 arg2, T3 arg3, T4 arg4)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3>(T1 arg1, T2 arg2, T3 arg3)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2>(T1 arg1, T2 arg2)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();

            return hash;
        }
    }
}

在Equals()比較多個字段的大多數情況下,如果GetHash()在一個字段或多個字段上散列並不重要。 你只需要確保計算散列非常便宜(請不要分配 ),並且快速( 不需要繁重的計算 ,當然也不需要數據庫連接)並且提供了良好的分佈。

繁重的工作應該是Equals()方法的一部分; 哈希應該是一個非常便宜的操作,以便盡可能少地調用Equals()。

最後一個提示: 不要依賴GetHashCode()在多個應用程序運行中保持穩定 。 許多.Net類型不保證它們的哈希代碼在重新啟動後保持不變,因此您應該只在內存數據結構中使用GetHashCode()的值。


我在幫助程序庫中有一個哈希類,用於此目的。

/// <summary> 
/// This is a simple hashing function from Robert Sedgwicks Hashing in C book.
/// Also, some simple optimizations to the algorithm in order to speed up
/// its hashing process have been added. from: www.partow.net
/// </summary>
/// <param name="input">array of objects, parameters combination that you need
/// to get a unique hash code for them</param>
/// <returns>Hash code</returns>
public static int RSHash(params object[] input)
{
    const int b = 378551;
    int a = 63689;
    int hash = 0;

    // If it overflows then just wrap around
    unchecked
    {
        for (int i = 0; i < input.Length; i++)
        {
            if (input[i] != null)
            {
                hash = hash * a + input[i].GetHashCode();
                a = a * b;
            }
        }
    }

    return hash;
}

然後,只需將它用作:

public override int GetHashCode()
{
    return Hashing.RSHash(_field1, _field2, _field3);
}

我沒有評估它的表現,因此歡迎任何反饋。


我通常使用Josh Bloch的神話般的 Effective Java中的實現 。 它速度很快,創建了一個不太可能造成衝突的相當不錯的散列。 選擇兩個不同的素數,例如17和23,然後執行:

public override int GetHashCode()
{
    unchecked // Overflow is fine, just wrap
    {
        int hash = 17;
        // Suitable nullity checks etc, of course :)
        hash = hash * 23 + field1.GetHashCode();
        hash = hash * 23 + field2.GetHashCode();
        hash = hash * 23 + field3.GetHashCode();
        return hash;
    }
}

正如評論中指出的那樣,您可能會發現最好選擇一個較大的素數來代替。 顯然486187739是好的...儘管大多數我看到的小數字的例子傾向於使用素數,但至少有類似的算法使用非素數。 例如,在後面的不太完整的FNV例子中,我使用了顯然工作得很好的數字 - 但初始值不是主要數據。 (雖然乘法常數主要的,但我不知道這有多重要。)

由於兩個主要原因,這比通常的XOR更好。 假設我們有一個包含兩個int字段的類型:

XorHash(x, x) == XorHash(y, y) == 0 for all x, y
XorHash(x, y) == XorHash(y, x) for all x, y

順便說一下,早期的算法是C#編譯器當前用於匿名類型的算法。

本頁面提供了很多選項。 我認為在大多數情況下,上述內容“足夠好”,而且記住和正確的記錄非常容易。 FNV替代方法同樣簡單,但使用不同的常量和XOR而不是ADD作為組合操作。 它看起來像下面的代碼,但正常的FNV算法對單個字節進行操作,所以這需要修改每個字節執行一次迭代,而不是每個32位散列值。 FNV也被設計用於可變長度的數據,而我們在這裡使用的方式始終是相同數量的字段值。 對這個答案的評論表明,這裡的代碼實際上並沒有像上面的添加方法那樣工作(在樣本案例中被測試過)。

// Note: Not quite FNV!
public override int GetHashCode()
{
    unchecked // Overflow is fine, just wrap
    {
        int hash = (int) 2166136261;
        // Suitable nullity checks etc, of course :)
        hash = (hash * 16777619) ^ field1.GetHashCode();
        hash = (hash * 16777619) ^ field2.GetHashCode();
        hash = (hash * 16777619) ^ field3.GetHashCode();
        return hash;
    }
}

請注意,有一點需要注意的是,理想情況下,您應該在將其添加到依賴哈希代碼的集合後,防止對等同敏感(並因此對哈希碼敏感)狀態發生更改。

根據documentation

您可以為不可變引用類型重寫GetHashCode。 通常,對於可變引用類型,只有在以下情況下才應該覆蓋GetHashCode:

  • 您可以從不可變的字段計算哈希碼; 要么
  • 您可以確保在對象包含在依賴其哈希碼的集合中時,可變對象的哈希碼不會更改。

這個不錯:

/// <summary>
/// Helper class for generating hash codes suitable 
/// for use in hashing algorithms and data structures like a hash table. 
/// </summary>
public static class HashCodeHelper
{
    private static int GetHashCodeInternal(int key1, int key2)
    {
        unchecked
        {
           var num = 0x7e53a269;
           num = (-1521134295 * num) + key1;
           num += (num << 10);
           num ^= (num >> 6);

           num = ((-1521134295 * num) + key2);
           num += (num << 10);
           num ^= (num >> 6);

           return num;
        }
    }

    /// <summary>
    /// Returns a hash code for the specified objects
    /// </summary>
    /// <param name="arr">An array of objects used for generating the 
    /// hash code.</param>
    /// <returns>
    /// A hash code, suitable for use in hashing algorithms and data 
    /// structures like a hash table. 
    /// </returns>
    public static int GetHashCode(params object[] arr)
    {
        int hash = 0;
        foreach (var item in arr)
            hash = GetHashCodeInternal(hash, item.GetHashCode());
        return hash;
    }

    /// <summary>
    /// Returns a hash code for the specified objects
    /// </summary>
    /// <param name="obj1">The first object.</param>
    /// <param name="obj2">The second object.</param>
    /// <param name="obj3">The third object.</param>
    /// <param name="obj4">The fourth object.</param>
    /// <returns>
    /// A hash code, suitable for use in hashing algorithms and
    /// data structures like a hash table.
    /// </returns>
    public static int GetHashCode<T1, T2, T3, T4>(T1 obj1, T2 obj2, T3 obj3,
        T4 obj4)
    {
        return GetHashCode(obj1, GetHashCode(obj2, obj3, obj4));
    }

    /// <summary>
    /// Returns a hash code for the specified objects
    /// </summary>
    /// <param name="obj1">The first object.</param>
    /// <param name="obj2">The second object.</param>
    /// <param name="obj3">The third object.</param>
    /// <returns>
    /// A hash code, suitable for use in hashing algorithms and data 
    /// structures like a hash table. 
    /// </returns>
    public static int GetHashCode<T1, T2, T3>(T1 obj1, T2 obj2, T3 obj3)
    {
        return GetHashCode(obj1, GetHashCode(obj2, obj3));
    }

    /// <summary>
    /// Returns a hash code for the specified objects
    /// </summary>
    /// <param name="obj1">The first object.</param>
    /// <param name="obj2">The second object.</param>
    /// <returns>
    /// A hash code, suitable for use in hashing algorithms and data 
    /// structures like a hash table. 
    /// </returns>
    public static int GetHashCode<T1, T2>(T1 obj1, T2 obj2)
    {
        return GetHashCodeInternal(obj1.GetHashCode(), obj2.GetHashCode());
    }
}

以下是如何使用它:

private struct Key
{
    private Type _type;
    private string _field;

    public Type Type { get { return _type; } }
    public string Field { get { return _field; } }

    public Key(Type type, string field)
    {
        _type = type;
        _field = field;
    }

    public override int GetHashCode()
    {
        return HashCodeHelper.GetHashCode(_field, _type);
    }

    public override bool Equals(object obj)
    {
        if (!(obj is Key))
            return false;
        var tf = (Key)obj;
        return tf._field.Equals(_field) && tf._type.Equals(_type);
    }
}

這是我的hashcode助手。
它的優點是它使用泛型類型參數,因此不會導致裝箱:

public static class HashHelper
{
    public static int GetHashCode<T1, T2>(T1 arg1, T2 arg2)
    {
         unchecked
         {
             return 31 * arg1.GetHashCode() + arg2.GetHashCode();
         }
    }

    public static int GetHashCode<T1, T2, T3>(T1 arg1, T2 arg2, T3 arg3)
    {
        unchecked
        {
            int hash = arg1.GetHashCode();
            hash = 31 * hash + arg2.GetHashCode();
            return 31 * hash + arg3.GetHashCode();
        }
    }

    public static int GetHashCode<T1, T2, T3, T4>(T1 arg1, T2 arg2, T3 arg3, 
        T4 arg4)
    {
        unchecked
        {
            int hash = arg1.GetHashCode();
            hash = 31 * hash + arg2.GetHashCode();
            hash = 31 * hash + arg3.GetHashCode();
            return 31 * hash + arg4.GetHashCode();
        }
    }

    public static int GetHashCode<T>(T[] list)
    {
        unchecked
        {
            int hash = 0;
            foreach (var item in list)
            {
                hash = 31 * hash + item.GetHashCode();
            }
            return hash;
        }
    }

    public static int GetHashCode<T>(IEnumerable<T> list)
    {
        unchecked
        {
            int hash = 0;
            foreach (var item in list)
            {
                hash = 31 * hash + item.GetHashCode();
            }
            return hash;
        }
    }

    /// <summary>
    /// Gets a hashcode for a collection for that the order of items 
    /// does not matter.
    /// So {1, 2, 3} and {3, 2, 1} will get same hash code.
    /// </summary>
    public static int GetHashCodeForOrderNoMatterCollection<T>(
        IEnumerable<T> list)
    {
        unchecked
        {
            int hash = 0;
            int count = 0;
            foreach (var item in list)
            {
                hash += item.GetHashCode();
                count++;
            }
            return 31 * hash + count.GetHashCode();
        }
    }

    /// <summary>
    /// Alternative way to get a hashcode is to use a fluent 
    /// interface like this:<br />
    /// return 0.CombineHashCode(field1).CombineHashCode(field2).
    ///     CombineHashCode(field3);
    /// </summary>
    public static int CombineHashCode<T>(this int hashCode, T arg)
    {
        unchecked
        {
            return 31 * hashCode + arg.GetHashCode();   
        }
    }

它也有提供流暢接口的擴展方法,所以你可以像這樣使用它:

public override int GetHashCode()
{
    return HashHelper.GetHashCode(Manufacturer, PartN, Quantity);
}

或者像這樣:

public override int GetHashCode()
{
    return 0.CombineHashCode(Manufacturer)
        .CombineHashCode(PartN)
        .CombineHashCode(Quantity);
}

這是我簡單化的方法。 我正在使用經典的生成器模式。 它是類型安全的(沒有裝箱/拆箱),也與.NET 2.0兼容(沒有擴展方法等)。

它是這樣使用的:

public override int GetHashCode()
{
    HashBuilder b = new HashBuilder();
    b.AddItems(this.member1, this.member2, this.member3);
    return b.Result;
} 

這裡是acutal builder class:

internal class HashBuilder
{
    private const int Prime1 = 17;
    private const int Prime2 = 23;
    private int result = Prime1;

    public HashBuilder()
    {
    }

    public HashBuilder(int startHash)
    {
        this.result = startHash;
    }

    public int Result
    {
        get
        {
            return this.result;
        }
    }

    public void AddItem<T>(T item)
    {
        unchecked
        {
            this.result = this.result * Prime2 + item.GetHashCode();
        }
    }

    public void AddItems<T1, T2>(T1 item1, T2 item2)
    {
        this.AddItem(item1);
        this.AddItem(item2);
    }

    public void AddItems<T1, T2, T3>(T1 item1, T2 item2, T3 item3)
    {
        this.AddItem(item1);
        this.AddItem(item2);
        this.AddItem(item3);
    }

    public void AddItems<T1, T2, T3, T4>(T1 item1, T2 item2, T3 item3, 
        T4 item4)
    {
        this.AddItem(item1);
        this.AddItem(item2);
        this.AddItem(item3);
        this.AddItem(item4);
    }

    public void AddItems<T1, T2, T3, T4, T5>(T1 item1, T2 item2, T3 item3, 
        T4 item4, T5 item5)
    {
        this.AddItem(item1);
        this.AddItem(item2);
        this.AddItem(item3);
        this.AddItem(item4);
        this.AddItem(item5);
    }        

    public void AddItems<T>(params T[] items)
    {
        foreach (T item in items)
        {
            this.AddItem(item);
        }
    }
}






gethashcode