TheDeveloperBlog.com

Home | Contact Us

C-Sharp | Java | Python | Swift | GO | WPF | Ruby | Scala | F# | JavaScript | SQL | PHP | Angular | HTML

C# Boyer-Moore String Search Example

This C# algorithm article implements a version of the Boyer-Moore search. It provides benchmark data.

Boyer-Moore is a string searching algorithm.

Compared to IndexOf, this is a faster way to search for substrings in a string. It avoids checking most positions in the source string. The implementation here uses constant characters.

Example. First, this program demonstrates how to use the Boyer-Moore style algorithm. It uses a lookup table to determine the next possible location of a string to find. The implementation here is hard-coded.

Info: This enhances the method's performance, but detracts from its usefulness in programs.

The program has methods that search for the characters "ABCD" uppercase in that order in the source string and returns true or false. Contains1 is a method that implements parts of the Boyer-Moore string searching algorithm.

C# program that uses simple Boyer-Moore search

using System;

class Program
{
    static void Main()
    {
	const string input1 = "There were ABCD Perls";
	const string input2 = "ABCD string";
	const string input3 = "No ABC dee string";
	const string input4 = "Test BAC";

	Console.WriteLine(Contains1(input1));
	Console.WriteLine(Contains1(input2));
	Console.WriteLine(Contains1(input3));
	Console.WriteLine(Contains1(input4));

	Console.WriteLine(Contains2(input1));
	Console.WriteLine(Contains2(input2));
	Console.WriteLine(Contains2(input3));
	Console.WriteLine(Contains2(input4));
    }

    static bool Contains1(string value)
    {
	// Searches for 4-letter constant string using Boyer-Moore style algorithm.
	// ... Uses switch as lookup table.
	int i = 3; // First index to check.
	int length = value.Length;
	while (i < length)
	{
	    switch (value[i])
	    {
		case 'D':
		    // Last character in pattern found.
		    // ... Check for definite match.
		    if (value[i - 1] == 'C' &&
			value[i - 2] == 'B' &&
			value[i - 3] == 'A')
		    {
			return true;
		    }
		    // Must be at least 4 characters away.
		    i += 4;
		    continue;
		case 'C':
		    // Must be at least 1 character away.
		    i += 1;
		    continue;
		case 'B':
		    // Must be at least 2 characters away.
		    i += 2;
		    continue;
		case 'A':
		    // Must be at least 3 characters away.
		    i += 3;
		    continue;
		default:
		    // Must be at least 4 characters away.
		    i += 4;
		    continue;
	    }
	}
	// Nothing found.
	return false;
    }

    static bool Contains2(string value)
    {
	// Searches for 4-letter constant string with IndexOf.
	return value.IndexOf("ABCD", StringComparison.Ordinal) != -1;
    }
}

Output

True
True
False
False
True
True
False
False

In this example, the input string is passed to the method, and the pattern string, which here contains four constant uppercase letters, is searched. The final character in the pattern (D) is searched for at each place.

Switch

And: If D is found, we use an if-statement to check the previous three characters for C, B and A.

If

If the character is any of ABCD, we advance the maximum number of places where the string might start (one to four). If the character is not one of ABCD, then we know that ABCD cannot be found unless we advance four places.

Tip: This special logic often reduces the number of character checks by 75% in this program.

Discussion. Unfortunately, this constant character checking code is only useful in limited situations. If you have a program that must scan for a certain important word or name in many strings, then you could use this code to do that.

So: For example, you could use this code to check for values in HTTP headers, such as to detect browser user agents.

However, the inflexibility of this code can be seen as an advantage. In the .NET Framework, loading a constant onto the evaluation stack is much faster than loading a variable reference or field.

Therefore: A more advanced method that is configurable would likely be slower because it would have to access memory more often.

Benchmark. Let's look at the benchmark results for the Boyer-Moore style string searching algorithm versus the Contains2 method, which uses the IndexOf method. The Boyer-Moore style algorithm is about 86% faster than the IndexOf method.

Note: The Boyer-Moore algorithm is often faster than other direct character testing expressions in the C# language.

Method calls tested: separate loops

Contains1("This is an A example test of ABCD and more");
Contains2("This is an A example test of ABCD and more");

Benchmark results

Contains1:  23.60 ns
Contains2: 166.98 ns

Summary. We implemented and demonstrated a Boyer-Moore string searching method that uses constant characters. We noted how the principles of the Boyer-Moore algorithm can be used to improve the performance of string searching for substrings.

Therefore: The method here could be adapted for fast searching of other constant substrings, but is limited in its uses.


Related Links

Adjectives Ado Ai Android Angular Antonyms Apache Articles Asp Autocad Automata Aws Azure Basic Binary Bitcoin Blockchain C Cassandra Change Coa Computer Control Cpp Create Creating C-Sharp Cyber Daa Data Dbms Deletion Devops Difference Discrete Es6 Ethical Examples Features Firebase Flutter Fs Git Go Hbase History Hive Hiveql How Html Idioms Insertion Installing Ios Java Joomla Js Kafka Kali Laravel Logical Machine Matlab Matrix Mongodb Mysql One Opencv Oracle Ordering Os Pandas Php Pig Pl Postgresql Powershell Prepositions Program Python React Ruby Scala Selecting Selenium Sentence Seo Sharepoint Software Spellings Spotting Spring Sql Sqlite Sqoop Svn Swift Synonyms Talend Testng Types Uml Unity Vbnet Verbal Webdriver What Wpf