C-Sharp | Java | Python | Swift | GO | WPF | Ruby | Scala | F# | JavaScript | SQL | PHP | Angular | HTML
Each bit can be set separately. Because of this, it can be used as a simple array of boolean values. In certain string processing tasks, this mask optimization yields good results.
Note: With this optimization, we can avoid handling duplicate letters by keeping track of ones already encountered.
Example. The goal of this program is to detect duplicate letters in a string of lowercase chars. To do this, it uses a mask integer. The lowercase letters are converted to integers 0 to 25 by subtracting 97 from their ASCII representation.
Then: As each letter is encountered, its bit is tested. If it was already encountered, its bit will be set to 1.
Otherwise: We record the letter in the mask. The next time it is encountered we will recognize it.
C# program that checks duplicate letters
using System;
class Program
{
static void Main()
{
Console.WriteLine("deves");
ReportDuplicateCharacters("deves");
Console.WriteLine("massive");
ReportDuplicateCharacters("massive");
Console.WriteLine("mississippi");
ReportDuplicateCharacters("mississippi");
}
static void ReportDuplicateCharacters(string value)
{
int mask = 0;
for (int i = 0; i < value.Length; i++)
{
int index = value[i] - 97;
if ((mask & (1 << index)) != 0)
{
Console.WriteLine("Duplicate: {0}", value[i]);
}
else
{
mask |= (1 << index);
}
}
// To zero a bit: mask &= ~(1 << index);
}
}
Output
deves
massive
Duplicate: s
mississippi
Duplicate: s
Duplicate: i
Duplicate: s
Duplicate: s
Duplicate: i
Duplicate: p
Duplicate: i



Discussion. My algorithm only requires one pass through the string. In testing I found that an array of bools is about as fast. It is possible to re-use the memory of an array of bools and get similar performance to this bitmask approach.
But: The mask approach here is more memory-efficient should you need to store this data somehow.
Summary. This style of bit-testing algorithm has a limited range of use in programming. Sometimes it is effective. Usually it is not worthwhile. I have rewritten this particular algorithm repeatedly—which means it's probably worth documenting.