[.net] Was ist der beste Algorithmus für ein überschriebenes System.Object.GetHashCode?


7 Answers

Microsoft bietet bereits einen guten generischen HashCode-Generator: Kopieren Sie einfach Ihre Eigenschaften / Feldwerte in einen anonymen Typ und hashen Sie es:

new { PropA, PropB, PropC, PropD }.GetHashCode();

Dies funktioniert für eine beliebige Anzahl von Eigenschaften. Es verwendet keine Boxen oder zusätzliche Ressourcen. Es verwendet nur den Algorithmus, der bereits im Framework für anonyme Typen implementiert wurde.

Question

In .NET System.Object.GetHashCode Methode wird an vielen Stellen in den .NET-Basisklassenbibliotheken verwendet. Vor allem, wenn Sie Gegenstände in einer Sammlung schnell finden oder Gleichheit feststellen wollen. Gibt es einen Standardalgorithmus / Best Practice zum Implementieren der GetHashCode Überschreibung für meine benutzerdefinierten Klassen, damit ich die Leistung nicht beeinträchtige?




Ich habe eine Hashing-Klasse in der Helper-Bibliothek, die ich für diesen Zweck verwende.

/// <summary> 
/// This is a simple hashing function from Robert Sedgwicks Hashing in C book.
/// Also, some simple optimizations to the algorithm in order to speed up
/// its hashing process have been added. from: www.partow.net
/// </summary>
/// <param name="input">array of objects, parameters combination that you need
/// to get a unique hash code for them</param>
/// <returns>Hash code</returns>
public static int RSHash(params object[] input)
{
    const int b = 378551;
    int a = 63689;
    int hash = 0;

    // If it overflows then just wrap around
    unchecked
    {
        for (int i = 0; i < input.Length; i++)
        {
            if (input[i] != null)
            {
                hash = hash * a + input[i].GetHashCode();
                a = a * b;
            }
        }
    }

    return hash;
}

Dann können Sie es einfach als:

public override int GetHashCode()
{
    return Hashing.RSHash(_field1, _field2, _field3);
}

Ich habe seine Leistung nicht bewertet, daher ist jede Rückmeldung willkommen.




Das ist ein guter:

/// <summary>
/// Helper class for generating hash codes suitable 
/// for use in hashing algorithms and data structures like a hash table. 
/// </summary>
public static class HashCodeHelper
{
    private static int GetHashCodeInternal(int key1, int key2)
    {
        unchecked
        {
           var num = 0x7e53a269;
           num = (-1521134295 * num) + key1;
           num += (num << 10);
           num ^= (num >> 6);

           num = ((-1521134295 * num) + key2);
           num += (num << 10);
           num ^= (num >> 6);

           return num;
        }
    }

    /// <summary>
    /// Returns a hash code for the specified objects
    /// </summary>
    /// <param name="arr">An array of objects used for generating the 
    /// hash code.</param>
    /// <returns>
    /// A hash code, suitable for use in hashing algorithms and data 
    /// structures like a hash table. 
    /// </returns>
    public static int GetHashCode(params object[] arr)
    {
        int hash = 0;
        foreach (var item in arr)
            hash = GetHashCodeInternal(hash, item.GetHashCode());
        return hash;
    }

    /// <summary>
    /// Returns a hash code for the specified objects
    /// </summary>
    /// <param name="obj1">The first object.</param>
    /// <param name="obj2">The second object.</param>
    /// <param name="obj3">The third object.</param>
    /// <param name="obj4">The fourth object.</param>
    /// <returns>
    /// A hash code, suitable for use in hashing algorithms and
    /// data structures like a hash table.
    /// </returns>
    public static int GetHashCode<T1, T2, T3, T4>(T1 obj1, T2 obj2, T3 obj3,
        T4 obj4)
    {
        return GetHashCode(obj1, GetHashCode(obj2, obj3, obj4));
    }

    /// <summary>
    /// Returns a hash code for the specified objects
    /// </summary>
    /// <param name="obj1">The first object.</param>
    /// <param name="obj2">The second object.</param>
    /// <param name="obj3">The third object.</param>
    /// <returns>
    /// A hash code, suitable for use in hashing algorithms and data 
    /// structures like a hash table. 
    /// </returns>
    public static int GetHashCode<T1, T2, T3>(T1 obj1, T2 obj2, T3 obj3)
    {
        return GetHashCode(obj1, GetHashCode(obj2, obj3));
    }

    /// <summary>
    /// Returns a hash code for the specified objects
    /// </summary>
    /// <param name="obj1">The first object.</param>
    /// <param name="obj2">The second object.</param>
    /// <returns>
    /// A hash code, suitable for use in hashing algorithms and data 
    /// structures like a hash table. 
    /// </returns>
    public static int GetHashCode<T1, T2>(T1 obj1, T2 obj2)
    {
        return GetHashCodeInternal(obj1.GetHashCode(), obj2.GetHashCode());
    }
}

Und hier ist, wie man es benutzt:

private struct Key
{
    private Type _type;
    private string _field;

    public Type Type { get { return _type; } }
    public string Field { get { return _field; } }

    public Key(Type type, string field)
    {
        _type = type;
        _field = field;
    }

    public override int GetHashCode()
    {
        return HashCodeHelper.GetHashCode(_field, _type);
    }

    public override bool Equals(object obj)
    {
        if (!(obj is Key))
            return false;
        var tf = (Key)obj;
        return tf._field.Equals(_field) && tf._type.Equals(_type);
    }
}



Here is another fluent implementation of the algorithm posted above by Jon Skeet , but which includes no allocations or boxing operations:

public static class Hash
{
    public const int Base = 17;

    public static int HashObject(this int hash, object obj)
    {
        unchecked { return hash * 23 + (obj == null ? 0 : obj.GetHashCode()); }
    }

    public static int HashValue<T>(this int hash, T value)
        where T : struct
    {
        unchecked { return hash * 23 + value.GetHashCode(); }
    }
}

Verwendung:

public class MyType<T>
{
    public string Name { get; set; }

    public string Description { get; set; }

    public int Value { get; set; }

    public IEnumerable<T> Children { get; set; }

    public override int GetHashCode()
    {
        return Hash.Base
            .HashObject(this.Name)
            .HashObject(this.Description)
            .HashValue(this.Value)
            .HashObject(this.Children);
    }
}

The compiler will ensure HashValue is not called with a class due to the generic type constraint. But there is no compiler support for HashObject since adding a generic argument also adds a boxing operation.




Microsoft lead for several way of hashing...

//for classes that contain a single int value
return this.value;

//for classes that contain multiple int value
return x ^ y;

//for classes that contain single number bigger than int    
return ((int)value ^ (int)(value >> 32)); 

//for classes that contain class instance fields which inherit from object
return obj1.GetHashCode();

//for classes that contain multiple class instance fields which inherit from object
return obj1.GetHashCode() ^ obj2.GetHashCode() ^ obj3.GetHashCode(); 

I can guess that for multiple big int you can use this:

int a=((int)value1 ^ (int)(value1 >> 32));
int b=((int)value2 ^ (int)(value2 >> 32));
int c=((int)value3 ^ (int)(value3 >> 32));
return a ^ b ^ c;

And same for multi-type: all converted first to int using GetHashCode() then the int values will be xor'ed and the result is your hash.

For those who use hash as ID (I mean an unique value), hash is naturally limited to a number of digits, I think it was 5 bytes for hashing algorithm, at least MD5.

You may turn multiple values to a hashed value and some of them be same, so don't use it as an identifier. (maybe some day I am going to use your component)




As of https://github.com/dotnet/coreclr/pull/14863 , there is a new way to generate hash codes that is super simple! Just write

public override int GetHashCode()
    => HashCode.Combine(field1, field2, field3);

This will generate a quality hash code without you having to worry about the implementation details.




In den meisten Fällen, in denen Equals () mehrere Felder vergleicht, spielt es keine Rolle, ob GetHash () Hashes für ein Feld oder für viele verwendet. Sie müssen nur sicherstellen, dass die Hash-Berechnung wirklich billig ( keine Zuweisungen , bitte) und schnell ( keine schweren Berechnungen und sicherlich keine Datenbankverbindungen) ist und eine gute Verteilung bietet.

Das schwere Heben sollte Teil der Equals () Methode sein; Der Hash sollte eine sehr billige Operation sein, um es zu ermöglichen, Equals () auf so wenig Gegenstände wie möglich aufzurufen.

Und noch ein letzter Tipp: Verlassen Sie sich nicht darauf, dass GetHashCode () über mehrere Anwendungsläufe hinweg stabil ist . Viele .Net-Typen garantieren nicht, dass ihre Hash-Codes nach einem Neustart gleich bleiben. Daher sollten Sie nur den Wert von GetHashCode () für Datenstrukturen im Speicher verwenden.




ReSharper users can generate GetHashCode, Equals, and others with ReSharper -> Edit -> Generate Code -> Equality Members .

// ReSharper's GetHashCode looks like this
public override int GetHashCode() {
    unchecked {
        int hashCode = Id;
        hashCode = (hashCode * 397) ^ IntMember;
        hashCode = (hashCode * 397) ^ OtherIntMember;
        hashCode = (hashCode * 397) ^ (RefMember != null ? RefMember.GetHashCode() : 0);
        // ...
        return hashCode;
    }
}



Related