C# All Substring Method

This C# program uses an algorithm to generate all possible substrings in a string.

All Substrings. Is one string contained in another?

This can be accomplished with IndexOf. But we can also extract all possible substrings, put them in a Dictionary, and do hash lookups. This can be much faster.

Intro. I developed an algorithm that needed to tell if any one string is contained in any other string. The program had about 6000 to 100000 strings. This can be done by looping over every string, and then comparing it to every other string.

However: This decimates performance. The algorithm, written this way, was slow. I decided to optimize it.

So: I developed an algorithm that extracts all possible substrings from each string. It puts all of these in a Dictionary.

And: It then can replace a loop over thousands of strings with a single hash lookup. This made performance about 30 times better.

Example. To get started, I needed a method that generates all possible substrings. This example code loops through all possible lengths of a substring. It then loops over all possible starting indexes.

Then: It writes each substring to the console with WriteLine. In the actual algorithm, these substrings are added to a Dictionary.

Key and value: In the Dictionary, the key is the substring, and the value is the original string.

C# program that generates all substrings

using System;

class Program
{
    static void Main()
    {
	string value = "abcdefghi";
	// Avoid full length.
	for (int length = 1; length < value.Length; length++)
	{
	    // End index is tricky.
	    for (int start = 0; start <= value.Length - length; start++)
	    {
		string substring = value.Substring(start, length);
		Console.WriteLine(substring);
	    }
	}
    }
}

Output

a
b
c
d
e
f
g
h
i
ab
bc
cd
de
ef
fg
gh
hi
abc
bcd
cde
def
efg
fgh
ghi
abcd
bcde
cdef
defg
efgh
fghi
abcde
bcdef
cdefg
defgh
efghi
abcdef
bcdefg
cdefgh
defghi
abcdefg
bcdefgh
cdefghi
abcdefgh
bcdefghi

Discussion. When writing this algorithm, I at first tried to use a nested loop. This was simpler to develop. Its initialization time was also lower. It needed no fancy collections like a Dictionary.

However, the simpler algorithm was worse. It was O(N squared) because it had a nested loop. With changes, I reduced this. I avoided shorter strings. Still, performance was poor. And it would become worse as data was added.

So: I had to rethink the algorithm to solve the performance disaster. As usual, the Dictionary saved the day.

Tip: An algorithm that uses nested loops is sometimes optimized by one that creates a data structure in a Dictionary.

Summary. Micro-optimizations often lead to small performance benefits. By redesigning an algorithm, the advances are sometimes more severe. And with an effective algorithm, micro-optimizations fade in importance or become unnecessary.

.Net

.NET Array Dictionary List String 2D Async DataTable Dates DateTime Enum File For Foreach Format IEnumerable If IndexOf Lambda LINQ Parse Path Process Property Regex Replace Sort Split Static StringBuilder Substring Switch Tuple

Java

Core Array ArrayList HashMap String 2D Cast Character Console Deque Duplicates File For Format HashSet If IndexOf Lambda Math ParseInt Process Random Regex Replace Sort Split StringBuilder Substring Switch Vector While

TheDeveloperBlog.com

C# All Substring Method

All Substrings. Is one string contained in another?

.Net

Java

Related Links