TheDeveloperBlog.com


C# Custom IEqualityComparer: Ignore Chars

IEqualityComparer, 2. A Dictionary by default uses exact lookups. It computes a hash code based on a key and if another value matches it exactly, the two keys match. But this can be changed. We implement a custom, fuzzy lookup that ignores characters.


Example. This is the custom IEqualityComparer. It is not perfect, but I use it in an actual program. It computes hashes (in GetHashCode) and compares keys (in Equals) ignoring all hyphen characters.

So: The string "cat-1" is considered equal to the string "cat1." The hyphen (dash) character is simply ignored.

Based on:

.NET 4.5

Example custom IEqualityComparer: C#

using System.Collections.Generic;

class CustomComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y)
    {
	int xPos = 0;
	int yPos = 0;
	while (true)
	{
	    // ... Fail if past end.
	    if (xPos >= x.Length)
	    {
		return false;
	    }
	    if (yPos >= y.Length)
	    {
		return false;
	    }
	    // ... Skip past hyphens.
	    if (x[xPos] == '-')
	    {
		xPos++;
		continue;
	    }
	    if (y[yPos] == '-')
	    {
		yPos++;
		continue;
	    }
	    // ... Fail if different.
	    if (x[xPos] != y[yPos])
	    {
		return false;
	    }
	    // ... If we have traversed both strings with no error, we match.
	    if (xPos == x.Length - 1 &&
		yPos == y.Length - 1)
	    {
		return true;
	    }
	    // ... Increment both places.
	    xPos++;
	    yPos++;
	}
    }

    public int GetHashCode(string obj)
    {
	int code = 0;
	// ... Add together all chars.
	for (int i = 0; i < obj.Length; i++)
	{
	    if (obj[i] != '-')
	    {
		code += obj[i];
	    }
	}
	return code;
    }
}

The code has some problems. Its implementation of GetHashCode is slow and would become slower on many keys. The Equals method may have trouble with trailing hyphens. But the code shows the concept.

And: It avoids string allocations. I have found this is often a key to good performance.


Example 2. Next, we use the CustomComparer we created. This Main method shows how hyphens are ignored in the keys of a Dictionary that uses this comparer. We pass a new instance of CustomComparer to the Dictionary constructor.

MainVar

Here: All three lookups return the correct value. Some unusual cases, not shown, may not work correctly (such as trailing hyphens).

C# program that uses Dictionary, CustomComparer

using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
	// ... Add data to dictionary.
	var dictionary = new Dictionary<string, int>(new CustomComparer());
	dictionary["cat-1"] = 1;
	dictionary["cat-2"] = 2;
	dictionary["dog-bark"] = 10;
	dictionary["dog-woof"] = 20;

	// ... Lookup values, ignoring hyphens.
	Console.WriteLine(dictionary["cat1"]);
	Console.WriteLine(dictionary["cat-1"]);
	Console.WriteLine(dictionary["dog--bark"]);
    }
}

Output

1
1
10


Discussion. This code has many possible uses. It can potentially eliminate many Substring, Replace or ToLower calls. You can simply use a hashing function GetHashCode to perform all these operations, with no new strings.

SubstringReplaceToLower

Tip: You could even implement a simple language using this hashing mechanism. It could process strings without modifying them.

In my experiments, avoiding string allocations (caused by Substring or similar methods) is usually a performance win. And this style of IEqualityComparer tends to benefit performance.

Benchmark

GetHashCode. The GetHashCode method can be optimized. Try multiplying its result by an important character in the string. Also, there is an easy way to test how selective your hash numbers are.

GetHashCode

Test: Increment a static int field each time Equals is called. With a better hash code, Equals is called fewer times.

Tip: With a more selective hash, the internal Dictionary methods need to search fewer values with Equals, so it is called less.


Summary. With a custom IEqualityComparer, we look up strings based on the results of a method. So we can apply fuzzy matching, or any other transformation before actually comparing keys. We need not modify those keys before adding or testing them.

IEqualityComparer