C# Anagram Method

This C# algorithm article implements an anagram solving method. It computes anagrams of any string.

Anagrams can be rearranged to form different words.

We can use Dictionary and hash lookups to compute anagram lists quickly. This is useful for learning or making word games. Here we implement an anagram algorithm in the C# language.

 

 

Intro. First, when you have two words that are anagrams, their alphabetized forms will be equal. This is because their letter frequencies are equal for each letter. This relationship forms the base of our solution.

 

Input string/alphabetized form

cat       act
tca       act
senators  aeonrsst
treasons  aeonrsst

 

Example. Here we see a console C# program that first reads in a local word list, which you will need to download separately, and then sorts each word. It then stores all the original words separately at that alphabetized key.

StreamReaderTryGetValueToCharArrayArray.Sort

 

C# program that finds anagrams in file

using System;
using System.Collections.Generic;
using System.IO;

class Anagrams
{
    static void Main()
    {
	// 1
	// Read and sort dictionary
	var d = Read();

	// 2
	// Read in user input and show anagrams
	string line;
	while ((line = Console.ReadLine()) != null)
	{
	    Show(d, line);
	}
    }

    static Dictionary<string, string> Read()
    {
	var d = new Dictionary<string, string>();
	// Read each line
	using (StreamReader r = new StreamReader("enable1.txt"))
	{
	    string line;
	    while ((line = r.ReadLine()) != null)
	    {
		// Alphabetize the line for the key
		// Then add to the value string
		string a = Alphabetize(line);
		string v;
		if (d.TryGetValue(a, out v))
		{
		    d[a] = v + "," + line;
		}
		else
		{
		    d.Add(a, line);
		}
	    }
	}
	return d;
    }

    static string Alphabetize(string s)
    {
	// Convert to char array, then sort and return
	char[] a = s.ToCharArray();
	Array.Sort(a);
	return new string(a);
    }

    static void Show(Dictionary<string, string> d, string w)
    {
	// Write value for alphabetized word
	string v;
	if (d.TryGetValue(Alphabetize(w), out v))
	{
	    Console.WriteLine(v);
	}
	else
	{
	    Console.WriteLine("-");
	}
    }
}

Output

martian,tamarin
opts,post,pots,spot,stop,tops
assentor,senators,starnose,treasons

First, the program reads in the word list that must be stored locally on the disk. The word list used here has the name "enable1.txt", which is a version of the ENABLE word list available online.

Next: For each line, it alphabetizes the letters. This forms the alphabetized key for that word.

Alphabetize String

Then, it looks up that key in the Dictionary. If the alphabetized key is found, the current word we are processing will be appended to its value. In this way, we hash all real words at keys of their alphabetized forms.

Finally: It accepts user input and prints each list that corresponds to the input after it is alphabetized.

 

Discussion. For the Dictionary, we can use a simple string value type and append new strings to it. This means the data is stored as a single string. Alternatively, we could store the values as List(string), but this degrades performance.

List

 

To test, I altered the code so that it uses Dictionary<string, List<string>> instead of Dictionary<string, string>. This benchmark shows that it is faster to use the simpler string value than a List(string) type.

Anagram method benchmark

Dictionary<string, string>       : 4556 ms
Dictionary<string, List<string>> : 5647 ms

Disk: For better performance, you can persist the alphabetized key-value pairs to the disk. Using a binary file would also help.

Convert Dictionary, String

In the game Scrabble, blanks substitute for any letter. This can be handled in different ways in an anagram program: with a tree or approximate string matching. You can use Levenshtein distance to find similar strings.

TreeLevenshtein

 

Summary. We saw how to alphabetize words with Array.Sort and the new string constructor, and then hash those keys in a Dictionary. We implemented an anagram algorithm in the C# programming language.

 

Finally: We compared performance of two alternatives, and selected the faster one.