C-Sharp | Java | Python | Swift | GO | WPF | Ruby | Scala | F# | JavaScript | SQL | PHP | Angular | HTML
We implement this interface in the C# language. We also measure performance. We compare GetHashCode methods on custom string keys. We use it with the Dictionary type.
Note: You can implement the IEqualityComparer interface for the Dictionary collection.
Note 2: The built-in C# hash code on string types is not ideal. Most GetHashCode implementations will be specific to your data.
Benchmark: tested on 499 string keys Default string IEqualityComparer: 4043 ms Custom IEqualityComparer: 2736 ms
Example. First we see the custom IEqualityComparer I defined. It implements the IEqualityComparer interface, and its name is StringIndexKeyComparer. Your comparer can have any name, but it must use the interface inheritance syntax as shown.
Class that implements IEqualityComparer: C# public class StringIndexKeyComparer : IEqualityComparer<string> { /// <summary> /// Has a good distribution. /// </summary> const int _multiplier = 89; /// <summary> /// Whether the two strings are equal /// </summary> public bool Equals(string x, string y) { return x == y; } /// <summary> /// Return the hash code for this string. /// </summary> public int GetHashCode(string obj) { // Stores the result. int result = 0; // Don't compute hash code on null object. if (obj == null) { return 0; } // Get length. int length = obj.Length; // Return default code for zero-length strings [valid, nothing to hash with]. if (length > 0) { // Compute hash for strings with length greater than 1 char let1 = obj[0]; // First char of string we use char let2 = obj[length - 1]; // Final char // Compute hash code from two characters int part1 = let1 + length; result = (_multiplier * part1) + let2 + length; } return result; } }
This class implements IEqualityComparer. To implement that interface, it must define Equals and GetHashCode. The class uses the default Equals method, but uses a new GetHashCode implementation.
IEqualityComparer Interface: MSDN
Tip: The GetHashCode method above is optimal on the string keys I had. You will need to write your own GetHashCode method.
Tip 2: You should compute the result from the most significant parts of your keys, which provide the best distribution of the return value.
The GetHashCode shown here computes its result from the key[1] character, and from the final character in key. It applies a multiplier to the first character, and also adds the length of the key.
Keys. You can declare a generic Dictionary with any type keys and values. But the hashing method used for those types is always the default, unless you pass in an IEqualityComparer instance in the Dictionary constructor.
Note: For my project, I have a set of about 650 string keys. The string keys are in a certain format, with a pattern to their characters.
String keys 22D-Array-IEnumerable 22D-Array-Use 27-Zip 27-Zip-DEFLATE-Ratio 27-Zip-Examples 2About 2Action-Dictionary 2Adobe-CS3-Rounded 2Adobe-Fireworks-CS3-Resampling
The string keys I have all have the same first character, which is a digit. Because this first character is monotonous, it isn't useful for hashing because its distribution is poor.
In the above strings, I decided that the best distribution of hash codes would use the second character, the length, and another character. To reduce collisions, one character would need to be multiplied by a constant.
Example 2. To use the custom IEqualityComparer with the Dictionary, you must pass an instance of the IEqualityComparer to the Dictionary constructor. It is a good idea to use a static instance of your IEqualityComparer. This is not shown.
Next: The code uses the same custom IEqualityComparer shown earlier in this document. It is passed to the Dictionary in the constructor.
C# program that uses custom class using System; using System.Collections.Generic; /// <summary> /// [see above] /// </summary> public class StringIndexKeyComparer : IEqualityComparer<string> { // [see above] } class Program { static void Main() { var custom = new Dictionary<string, bool>(new StringIndexKeyComparer()); custom.Add("22D-Array-IEnumerable", true); custom.Add("22D-Array-Use", true); custom.Add("27-Zip", true); custom.Add("27-Zip-DEFLATE-Ratio", true); custom.Add("27-Zip-Examples", true); custom.Add("2About", true); custom.Add("2Action-Dictionary", true); custom.Add("2Adobe-CS3-Rounded", true); custom.Add("2Adobe-Fireworks-CS3-Resampling", true); } }
Discussion. Hashing is a complicated topic but an excellent hash code can really improve your Dictionary's performance. There are excellent books on this topic, such as Robert Sedgewick's Algorithms in C++, which I used.
To better distribute the hash code computations in your program, using a multiplier is useful. A hashtable has an internal array of buckets, and when you use a multiplier, you can spread out the indexes to those buckets.
And: This reduces the number of hash collisions and improves lookup performance.
The constant 89 is one I found to provide a good hashing distribution in the data set I am using. There is a custom constructor I defined on the StringIndexKeyComparer. It is not important to the code style here.
Tip: Any GetHashCode method will have to be custom to your keys. Therefore, just delete the contents of GetHashCode and start over.
But: It is not worth implementing on most projects, unless there is a bottleneck on your Dictionary, which rarely occurs.
Summary. We saw some example code, which won't be useful for you, but the explanations may help you understand how to use IEqualityComparer in your project. The most important part of IEqualityComparer here is its implementation of GetHashCode.