TheDeveloperBlog.com

Home | Contact Us

CSharp | Java | Python | Swift | GO | WPF | Ruby | Scala | F# | JavaScript

C# Mask Optimization

This C# algorithm uses a bit mask to optimize checking for duplicate lowercase letters.

Mask optimization. An int has 32 bits.

Each bit can be set separately. Because of this, it can be used as a simple array of boolean values. In certain string processing tasks, this mask optimization yields good results.

Note: With this optimization, we can avoid handling duplicate letters by keeping track of ones already encountered.

Example. The goal of this program is to detect duplicate letters in a string of lowercase chars. To do this, it uses a mask integer. The lowercase letters are converted to integers 0 to 25 by subtracting 97 from their ASCII representation.

Char

Then: As each letter is encountered, its bit is tested. If it was already encountered, its bit will be set to 1.

Otherwise: We record the letter in the mask. The next time it is encountered we will recognize it.

C# program that checks duplicate letters

using System;

class Program
{
    static void Main()
    {
	Console.WriteLine("deves");
	ReportDuplicateCharacters("deves");

	Console.WriteLine("massive");
	ReportDuplicateCharacters("massive");

	Console.WriteLine("mississippi");
	ReportDuplicateCharacters("mississippi");
    }

    static void ReportDuplicateCharacters(string value)
    {
	int mask = 0;
	for (int i = 0; i < value.Length; i++)
	{
	    int index = value[i] - 97;
	    if ((mask & (1 << index)) != 0)
	    {
		Console.WriteLine("Duplicate: {0}", value[i]);
	    }
	    else
	    {
		mask |= (1 << index);
	    }
	}
	// To zero a bit: mask &= ~(1 << index);
    }
}

Output

deves
massive
Duplicate: s
mississippi
Duplicate: s
Duplicate: i
Duplicate: s
Duplicate: s
Duplicate: i
Duplicate: p
Duplicate: i

Discussion. My algorithm only requires one pass through the string. In testing I found that an array of bools is about as fast. It is possible to re-use the memory of an array of bools and get similar performance to this bitmask approach.

But: The mask approach here is more memory-efficient should you need to store this data somehow.

Summary. This style of bit-testing algorithm has a limited range of use in programming. Sometimes it is effective. Usually it is not worthwhile. I have rewritten this particular algorithm repeatedly—which means it's probably worth documenting.