C-Sharp | Java | Python | Swift | GO | WPF | Ruby | Scala | F# | JavaScript | SQL | PHP | Angular | HTML
Class: This class implements IEqualityComparer. To implement that interface, it must define Equals and GetHashCode.
IEqualityComparer Interface: Microsoft DocsGetHashCode: This method is effective on the string keys in the program. You will need to write your own GetHashCode method.
Details: GetHashCode computes its result from 2 chars. It applies a multiplier to the first char, and adds the length of the key.
MultiplyTip: You should compute the result from the most significant parts of your keys, which provide the best distribution of the return value.
Class that implements IEqualityComparer: C#
public class StringIndexKeyComparer : IEqualityComparer<string>
{
/// <summary>
/// Has a good distribution.
/// </summary>
const int _multiplier = 89;
/// <summary>
/// Whether the two strings are equal
/// </summary>
public bool Equals(string x, string y)
{
return x == y;
}
/// <summary>
/// Return the hash code for this string.
/// </summary>
public int GetHashCode(string obj)
{
// Stores the result.
int result = 0;
// Don't compute hash code on null object.
if (obj == null)
{
return 0;
}
// Get length.
int length = obj.Length;
// Return default code for zero-length strings [valid, nothing to hash with].
if (length > 0)
{
// Compute hash for strings with length greater than 1
char let1 = obj[0]; // First char of string we use
char let2 = obj[length - 1]; // Final char
// Compute hash code from two characters
int part1 = let1 + length;
result = (_multiplier * part1) + let2 + length;
}
return result;
}
}
Note: For my project, I have a set of about 650 string keys. The string keys are in a certain format, with a pattern to their characters.
And: I decided that the best distribution of hash codes would use the second character, the length, and another character.
String keys:
22D-Array-IEnumerable
22D-Array-Use
27-Zip
27-Zip-DEFLATE-Ratio
27-Zip-Examples
2About
2Action-Dictionary
2Adobe-CS3-Rounded
2Adobe-Fireworks-CS3-Resampling
Version 1: The code uses the same custom IEqualityComparer shown earlier in this document. The ContainsKey method is called many times.
ConstructorVersion 2: We use a Dictionary without a custom IEqualityComparer. So each string uses the default GetHashCode method.
Result: In 2020, using the custom IEqualityComparer is faster. We avoid some expensive hashing—but this code depends on the key format.
C# program that uses custom IEqualityComparer
using System;
using System.Diagnostics;
using System.Collections.Generic;
public class StringIndexKeyComparer : IEqualityComparer<string>
{
const int _multiplier = 89;
public bool Equals(string x, string y)
{
return x == y;
}
public int GetHashCode(string obj)
{
int result = 0;
if (obj == null)
{
return 0;
}
int length = obj.Length;
if (length > 0)
{
char let1 = obj[0];
char let2 = obj[length - 1];
int part1 = let1 + length;
result = (_multiplier * part1) + let2 + length;
}
return result;
}
}
class Program
{
const int _max = 100000000;
static void Main()
{
var test = new Dictionary<string, bool>(new StringIndexKeyComparer());
var test2 = new Dictionary<string, bool>();
// Elements.
string[] elements = { "22D-Array-IEnumerable",
"22D-Array-Use", "27-Zip", "27-Zip-DEFLATE-Ratio",
"27-Zip-Examples", "2About", "2Action-Dictionary",
"2Adobe-CS3-Rounded", "2Adobe-Fireworks-CS3-Resampling" };
// Add elements.
foreach (string element in elements)
{
test.Add(element, true);
test2.Add(element, true);
}
var s1 = Stopwatch.StartNew();
// Version 1: test custom IEqualityComparer.
for (int i = 0; i < _max; i++)
{
if (!test.ContainsKey("2About"))
{
return;
}
}
s1.Stop();
var s2 = Stopwatch.StartNew();
// Version 2: use default dictionary.
for (int i = 0; i < _max; i++)
{
if (!test2.ContainsKey("2About"))
{
return;
}
}
s2.Stop();
Console.WriteLine(((double)(s1.Elapsed.TotalMilliseconds * 1000000) /
_max).ToString("0.00 ns"));
Console.WriteLine(((double)(s2.Elapsed.TotalMilliseconds * 1000000) /
_max).ToString("0.00 ns"));
}
}
Output
15.54 ns StringIndexKeyComparer
23.34 ns Default
Next: We implement a custom, fuzzy lookup that ignores characters. This is the custom IEqualityComparer.
Tip: It computes hashes (in GetHashCode) and compares keys (in Equals) ignoring all hyphen characters.
So: The string "cat-1" is considered equal to the string "cat1." The hyphen (dash) character is simply ignored.
Example custom IEqualityComparer: C#
using System.Collections.Generic;
class CustomComparer : IEqualityComparer<string>
{
public bool Equals(string x, string y)
{
int xPos = 0;
int yPos = 0;
while (true)
{
// ... Fail if past end.
if (xPos >= x.Length)
{
return false;
}
if (yPos >= y.Length)
{
return false;
}
// ... Skip past hyphens.
if (x[xPos] == '-')
{
xPos++;
continue;
}
if (y[yPos] == '-')
{
yPos++;
continue;
}
// ... Fail if different.
if (x[xPos] != y[yPos])
{
return false;
}
// ... If we have traversed both strings with no error, we match.
if (xPos == x.Length - 1 &&
yPos == y.Length - 1)
{
return true;
}
// ... Increment both places.
xPos++;
yPos++;
}
}
public int GetHashCode(string obj)
{
int code = 0;
// ... Add together all chars.
for (int i = 0; i < obj.Length; i++)
{
if (obj[i] != '-')
{
code += obj[i];
}
}
return code;
}
}
Here: All three lookups return the correct value. Some unusual cases, not shown, may not work correctly (such as trailing hyphens).
C# program that uses Dictionary, CustomComparer
using System;
using System.Collections.Generic;
class Program
{
static void Main()
{
// ... Add data to dictionary.
var dictionary = new Dictionary<string, int>(new CustomComparer());
dictionary["cat-1"] = 1;
dictionary["cat-2"] = 2;
dictionary["dog-bark"] = 10;
dictionary["dog-woof"] = 20;
// ... Lookup values, ignoring hyphens.
Console.WriteLine(dictionary["cat1"]);
Console.WriteLine(dictionary["cat-1"]);
Console.WriteLine(dictionary["dog--bark"]);
}
}
Output
1
1
10
Tip: To better distribute the hash code computations in your program, using a multiplier is useful.
Tip 2: A hashtable has an internal array of buckets, and when you use a multiplier, you can spread out the indexes to those buckets.
And: This reduces the number of hash collisions and improves lookup performance.
Tip: Any GetHashCode method will have to be custom to your keys. Therefore, just delete the contents of GetHashCode and start over.
But: It is not worth implementing on most projects, unless there is a bottleneck on your Dictionary, which rarely occurs.
Test: Increment a static int field each time Equals is called. With a better hash code, Equals is called fewer times.
Tip: With a more selective hash, the internal Dictionary methods need to search fewer values with Equals, so it is called less.