C-Sharp | Java | Python | Swift | GO | WPF | Ruby | Scala | F# | JavaScript | SQL | PHP | Angular | HTML
We can use Dictionary and hash lookups to compute anagram lists quickly. This is useful for learning or making word games. Here we implement an anagram algorithm in the C# language.
Intro. First, when you have two words that are anagrams, their alphabetized forms will be equal. This is because their letter frequencies are equal for each letter. This relationship forms the base of our solution.
Input string/alphabetized form cat act tca act senators aeonrsst treasons aeonrsst
Example. Here we see a console C# program that first reads in a local word list, which you will need to download separately, and then sorts each word. It then stores all the original words separately at that alphabetized key.
StreamReaderTryGetValueToCharArrayArray.Sort
C# program that finds anagrams in file using System; using System.Collections.Generic; using System.IO; class Anagrams { static void Main() { // 1 // Read and sort dictionary var d = Read(); // 2 // Read in user input and show anagrams string line; while ((line = Console.ReadLine()) != null) { Show(d, line); } } static Dictionary<string, string> Read() { var d = new Dictionary<string, string>(); // Read each line using (StreamReader r = new StreamReader("enable1.txt")) { string line; while ((line = r.ReadLine()) != null) { // Alphabetize the line for the key // Then add to the value string string a = Alphabetize(line); string v; if (d.TryGetValue(a, out v)) { d[a] = v + "," + line; } else { d.Add(a, line); } } } return d; } static string Alphabetize(string s) { // Convert to char array, then sort and return char[] a = s.ToCharArray(); Array.Sort(a); return new string(a); } static void Show(Dictionary<string, string> d, string w) { // Write value for alphabetized word string v; if (d.TryGetValue(Alphabetize(w), out v)) { Console.WriteLine(v); } else { Console.WriteLine("-"); } } } Output martian,tamarin opts,post,pots,spot,stop,tops assentor,senators,starnose,treasons
First, the program reads in the word list that must be stored locally on the disk. The word list used here has the name "enable1.txt", which is a version of the ENABLE word list available online.
Next: For each line, it alphabetizes the letters. This forms the alphabetized key for that word.
Then, it looks up that key in the Dictionary. If the alphabetized key is found, the current word we are processing will be appended to its value. In this way, we hash all real words at keys of their alphabetized forms.
Finally: It accepts user input and prints each list that corresponds to the input after it is alphabetized.
Discussion. For the Dictionary, we can use a simple string value type and append new strings to it. This means the data is stored as a single string. Alternatively, we could store the values as List(string), but this degrades performance.
To test, I altered the code so that it uses Dictionary<string, List<string>> instead of Dictionary<string, string>. This benchmark shows that it is faster to use the simpler string value than a List(string) type.
Anagram method benchmark Dictionary<string, string> : 4556 ms Dictionary<string, List<string>> : 5647 ms
Disk: For better performance, you can persist the alphabetized key-value pairs to the disk. Using a binary file would also help.
In the game Scrabble, blanks substitute for any letter. This can be handled in different ways in an anagram program: with a tree or approximate string matching. You can use Levenshtein distance to find similar strings.
Summary. We saw how to alphabetize words with Array.Sort and the new string constructor, and then hash those keys in a Dictionary. We implemented an anagram algorithm in the C# programming language.
Finally: We compared performance of two alternatives, and selected the faster one.