Question regarding GetHashCode() and Equality

Tags: C#, Effective C#

I received a great question from an Effective C# reader:

 

In Item 7 in Effective C# (2nd edition), you state:
"If two objects are equal (as defined by operator==), they must generate the same hash value."
Is this because operator== is used to check for exact equality after an item has been located in a certain hash table bucket? If so, I am confused because all of the blog posts/articles I've read which address this topic claim that exact equality is checked with Object.Equals().

Your question gets very close to the answer. Object.Equals() and operator==() are both used to test whether two items are equal. I often wonder about the wisdom of having so many ways to check whether two objects are equal, but we are where we are.

When you want to find an object in a collection that uses the hash code, the algorithm first generates the hash code for the item you seek. That determines which bucket to search for the item. Once the bucket is determined, the algorithm calls Equals(), not operator==(), to find an item that matches. (If your type overrides Object.Equals(), it will use the overridden method.)
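The two-step lookup can be sketched with a toy container. TinyHashSet is a hypothetical teaching type invented for this illustration, not the framework's Dictionary or HashSet; it assumes non-null items and a fixed number of buckets.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical teaching type; the real collections are more elaborate.
public sealed class TinyHashSet<T>
{
    private readonly List<T>[] buckets;

    public TinyHashSet(int bucketCount)
    {
        buckets = new List<T>[bucketCount];
        for (int i = 0; i < bucketCount; i++)
            buckets[i] = new List<T>();
    }

    // Step 1: the hash code selects the bucket (assumes item is not null).
    private List<T> BucketFor(T item)
    {
        int hash = item.GetHashCode() & 0x7FFFFFFF; // clear the sign bit
        return buckets[hash % buckets.Length];
    }

    public void Add(T item)
    {
        if (!Contains(item))
            BucketFor(item).Add(item);
    }

    // Step 2: Equals() finds the match inside that one bucket.
    public bool Contains(T item)
    {
        foreach (T candidate in BucketFor(item))
        {
            if (candidate.Equals(item))
                return true;
        }
        return false;
    }
}

public static class LookupDemo
{
    public static void Main()
    {
        var set = new TinyHashSet<string>(4);
        set.Add("alpha");
        Console.WriteLine(set.Contains("alpha")); // True
        Console.WriteLine(set.Contains("beta"));  // False
    }
}
```

Note that operator== never appears in the lookup: only GetHashCode() and Equals() are called.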

If two items that are equal (as defined by Equals()) generate different hash codes, the hash-based container will be searching in the wrong bucket. You’ll never find that item in the container.
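Here is a small sketch of that failure. BrokenPoint is a hypothetical type whose GetHashCode() is deliberately wrong (it returns a per-instance counter) so the result is repeatable; the more common real-world mistake is overriding Equals() and simply forgetting to override GetHashCode().

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type that violates the rule: equal objects, different hash codes.
public sealed class BrokenPoint
{
    private static int nextId;
    private readonly int id = ++nextId;

    public int X { get; }
    public int Y { get; }

    public BrokenPoint(int x, int y) { X = x; Y = y; }

    public override bool Equals(object obj)
    {
        var other = obj as BrokenPoint;
        return other != null && X == other.X && Y == other.Y;
    }

    // Deliberately wrong: every instance gets its own hash code.
    public override int GetHashCode() => id;
}

public static class BrokenDemo
{
    public static void Main()
    {
        var stored = new BrokenPoint(1, 2);
        var probe = new BrokenPoint(1, 2);
        var set = new HashSet<BrokenPoint> { stored };

        Console.WriteLine(stored.Equals(probe)); // True
        Console.WriteLine(set.Contains(probe));  // False
    }
}
```

The container holds an item that Equals() says matches, yet the lookup cannot find it because the differing hash codes send the search elsewhere.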

The hash code and the buckets are used to improve the performance of the search. Buckets may contain more than one item, but all items in a bucket will have generated a hash code that resolves to the same bucket. The hash-based containers allocate buckets based on the number of items in the collection. More items means more buckets. Fewer items means fewer buckets. (See http://referencesource.microsoft.com/#mscorlib/system/collections/generic/dictionary.cs for the reference implementation.)
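To make "resolves to the same bucket" concrete, here is a minimal sketch. It assumes the simplest possible mapping, the non-negative hash code modulo the bucket count; BucketIndex is a hypothetical helper, and a real container's mapping may differ in detail.

```csharp
using System;

public static class BucketDemo
{
    // Assumed mapping for illustration: non-negative hash code modulo bucket count.
    public static int BucketIndex(int hashCode, int bucketCount)
        => (hashCode & 0x7FFFFFFF) % bucketCount;

    public static void Main()
    {
        // int.GetHashCode() returns the value itself, so the numbers are easy to follow.
        int a = 3, b = 10;

        // With 7 buckets, two different hash codes share a bucket.
        Console.WriteLine(BucketIndex(a.GetHashCode(), 7));  // 3
        Console.WriteLine(BucketIndex(b.GetHashCode(), 7));  // 3

        // With 17 buckets, the same hash codes land in different buckets.
        Console.WriteLine(BucketIndex(a.GetHashCode(), 17)); // 3
        Console.WriteLine(BucketIndex(b.GetHashCode(), 17)); // 10
    }
}
```

Items with the same hash code always share a bucket, but sharing a bucket does not imply the same hash code.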

So, if you are extremely careful, you can get everything to work as long as GetHashCode() and Equals() agree with each other. However, in practice, C# developers sometimes write “item.Equals(otherItem)” and sometimes write “item == otherItem”. It’s best to ensure that the semantics of operator ==() match either Object.ReferenceEquals() or Object.Equals(), or both. Object equality is an important concept. You, as the designer of a class, should make sure that your type obeys the correct semantics for all the methods that check the equality of two different objects.
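One way to keep all of these in agreement is to route every check through a single comparison. The Point class below is a hypothetical example written for this post, and the hash formula is just one simple choice among many.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical value-like class: ==, Equals(), and GetHashCode() all agree.
public sealed class Point : IEquatable<Point>
{
    public int X { get; }
    public int Y { get; }

    public Point(int x, int y) { X = x; Y = y; }

    // The single place where equality is actually decided.
    public bool Equals(Point other)
        => !ReferenceEquals(other, null) && X == other.X && Y == other.Y;

    public override bool Equals(object obj) => Equals(obj as Point);

    // Built from the same fields that Equals() compares.
    public override int GetHashCode() => unchecked((X * 397) ^ Y);

    public static bool operator ==(Point left, Point right)
        => ReferenceEquals(left, null) ? ReferenceEquals(right, null) : left.Equals(right);

    public static bool operator !=(Point left, Point right) => !(left == right);
}

public static class ConsistentDemo
{
    public static void Main()
    {
        var p = new Point(1, 2);
        var q = new Point(1, 2);

        Console.WriteLine(p == q);                              // True
        Console.WriteLine(p.Equals(q));                         // True
        Console.WriteLine(p.GetHashCode() == q.GetHashCode());  // True
        Console.WriteLine(new HashSet<Point> { p }.Contains(q)); // True
    }
}
```

Because operator== and both Equals() overloads share one implementation, and GetHashCode() uses the same fields, a caller gets the same answer no matter which form they reach for.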

I hope that helps.

4 Comments

  • Simon Belanger said

    "Buckets may contain more than one item, but all items in a bucket will have generated the same hash code."

    If that were true, there would be as many buckets as there are possible hash code values. Do you mean "all items with the same hash code are guaranteed to be in the same bucket"?

  • Bernie Schoch said

    To clarify, just because two objects have the same hash value doesn't mean they are equal, but the reverse must be true in that two equal objects must have the same hash value.

  • Bill Wagner said

    @Bernie

    Yes, that's correct. I do explain that point in detail in Item 7 of Effective C# (2nd edition).
