C-Sharp | Java | Python | Swift | GO | WPF | Ruby | Scala | F# | JavaScript | SQL | PHP | Angular | HTML
This can be accomplished with IndexOf. But we can also extract all possible substrings, put them in a Dictionary, and do hash lookups. This can be much faster.
Intro. I developed an algorithm that needed to tell if any one string is contained in any other string. The program had about 6000 to 100000 strings. This can be done by looping over every string, and then comparing it to every other string.
However: This decimates performance. The algorithm, written this way, was slow. I decided to optimize it.
So: I developed an algorithm that extracts all possible substrings from each string. It puts all of these in a Dictionary.
And: It then can replace a loop over thousands of strings with a single hash lookup. This made performance about 30 times better.
Example. To get started, I needed a method that generates all possible substrings. This example code loops through all possible lengths of a substring. It then loops over all possible starting indexes.
Then: It writes each substring to the console with WriteLine. In the actual algorithm, these substrings are added to a Dictionary.
Key and value: In the Dictionary, the key is the substring, and the value is the original string.
C# program that generates all substrings using System; class Program { static void Main() { string value = "abcdefghi"; // Avoid full length. for (int length = 1; length < value.Length; length++) { // End index is tricky. for (int start = 0; start <= value.Length - length; start++) { string substring = value.Substring(start, length); Console.WriteLine(substring); } } } } Output a b c d e f g h i ab bc cd de ef fg gh hi abc bcd cde def efg fgh ghi abcd bcde cdef defg efgh fghi abcde bcdef cdefg defgh efghi abcdef bcdefg cdefgh defghi abcdefg bcdefgh cdefghi abcdefgh bcdefghi
Discussion. When writing this algorithm, I at first tried to use a nested loop. This was simpler to develop. Its initialization time was also lower. It needed no fancy collections like a Dictionary.
However, the simpler algorithm was worse. It was O(N squared) because it had a nested loop. With changes, I reduced this. I avoided shorter strings. Still, performance was poor. And it would become worse as data was added.
So: I had to rethink the algorithm to solve the performance disaster. As usual, the Dictionary saved the day.
Tip: An algorithm that uses nested loops is sometimes optimized by one that creates a data structure in a Dictionary.
Summary. Micro-optimizations often lead to small performance benefits. By redesigning an algorithm, the advances are sometimes more severe. And with an effective algorithm, micro-optimizations fade in importance or become unnecessary.