C-Sharp | Java | Python | Swift | GO | WPF | Ruby | Scala | F# | JavaScript | SQL | PHP | Angular | HTML
In the C# language a tree can be implemented with classes (nodes) that point to further nodes. In this simple implementation, we use a Dictionary.
Note: This tree stores strings. It uses a node for each letter. Prefixes of strings are shared.
Note 2: This tree has fairly poor performance. In other words, it is slow. The general design could be optimized to be much faster.
Example. To start, this class serves as the root of the tree. It provides helper methods—AddWord and ContainsNode. We use this class to add strings to the tree. ContainsNode is one way to search for strings.
DictionaryNode: The root field is of type DictionaryNode. That type is shown in the example after this one.
Tree class, DictionaryNodeRoot.cs: C# /// <summary> /// Dictionary node root. /// </summary> class DictionaryNodeRoot { /// <summary> /// Root. /// </summary> DictionaryNode _root = new DictionaryNode(); /// <summary> /// Add word. /// </summary> public void AddWord(string value) { DictionaryNode current = this.GetNode(); // Loop. for (int i = 0; i < value.Length; i++) { current = current.Add(value[i]); } // Set state. current.SetWord(value); } /// <summary> /// Contains word. /// </summary> public bool ContainsWord(string value) { DictionaryNode current = this.GetNode(); // Loop. for (int i = 0; i < value.Length; i++) { current = current.Get(value[i]); if (current == null) { return false; } } // Return state. return current != null && current.GetWord() != null; } /// <summary> /// Get root node. /// </summary> public DictionaryNode GetNode() { return this._root; } }
In this example, we add a string char-by-char with a for-loop in AddWord. The Add method from DictionaryNode.cs returns the newly-added or accessed node. And then, on the next char, we call Add again.
Info: This logic ends up building the tree. Nodes that already exist are not added again. They are instead shared.
Finally: The SetWord method is used. This signals to DictionaryType that a complete word has been reached at this node.
ContainsWord: This method meanwhile is similar to AddWord, but does not add anything. It just does lookups. It returns a bool.
Example 2. Next we see the actual node type for the tree. This is DictionaryNode.cs. On each node, we use a Dictionary instance. The Dictionary provides references to child nodes. And the string field tells us whether a complete word here exists.
Key: The Dictionary field uses a char key type. This is not essential. With this tree design, any value type could be used.
Node class, DictionaryNode.cs: C# using System.Collections.Generic; /// <summary> /// Dictionary node. /// </summary> class DictionaryNode { /// <summary> /// Word. /// </summary> string _word; /// <summary> /// Dict. /// </summary> Dictionary<char, DictionaryNode> _dict; /// <summary> /// Add. /// </summary> public DictionaryNode Add(char value) { // Handle null field. if (this._dict == null) { this._dict = new Dictionary<char, DictionaryNode>(); } // Lookup and return if possible. DictionaryNode result; if (this._dict.TryGetValue(value, out result)) { return result; } // Store. result = new DictionaryNode(); this._dict[value] = result; return result; } /// <summary> /// Get. /// </summary> public DictionaryNode Get(char value) { if (this._dict == null) { return null; } // Lookup. DictionaryNode result; if (this._dict.TryGetValue(value, out result)) { return result; } return null; } /// <summary> /// Set word. /// </summary> public void SetWord(string word) { this._word = word; } /// <summary> /// Get word. /// </summary> public string GetWord() { return this._word; } }
In this example, Add and Get are similar. They both handle a possible null Dictionary. Then they do a lookup in the Dictionary for the value specified. They return the DictionaryNode found, or null.
Add: This method will add a new DictionaryNode to the Dictionary if one does not exist. It one already exists, it will not add another.
Example 3. Next, we use the tree type. We create a new instance of DictionaryNodeRoot. We then call AddWord on three words. We show that the basic functionality is correct by using ContainsWord.
Note: This program does not extensively test the tree. It does show it works without throwing an exception.
However: I have used this exact tree with several large word lists with correct results. This is anecdotal evidence.
C# program that uses tree class Program { static void Main() { DictionaryNodeRoot tree = new DictionaryNodeRoot(); tree.AddWord("dot"); tree.AddWord("net"); tree.AddWord("deves"); Console.WriteLine(tree.ContainsWord("deves")); Console.WriteLine(tree.ContainsWord("nothing")); } } Output True False
Discussion. In software many applications require collections of strings. An obvious example is a spell-checker. A program like Microsoft Word must scan documents for spelling mistakes. All those words are stored somehow in memory.
One advantage to this tree data structure is that it accommodates vast data sets. This particular tree could handle any number of words within the limits of memory without a big performance drop.
The core design goal for this tree was simplicity. There are many straightforward ways to optimize the tree—see the performance section. But these make the design less obvious. They might not work for every requirement.
Performance. As noted the performance of this tree is sub-par. Using the Dictionary as a field for child nodes is a slow implementation. An array, on each node, would be faster. Each element could instead be indexed in this array.
Further optimizations are possible. The entire data structure can be collapsed into a couple int arrays. These arrays essentially are a flattened form of the Dictionaries. The arrays reference one another.
Int ArrayFlatten ArrayOptimizations
And: A tree can be reduced so that the prefixes and suffixes are shared. The implementation here shares only prefixes.
Info: Some further performance work is shown in the article on directed acyclic word graphs.
Premature optimization is a pitfall. Over the years I have designed this same tree about five times. And in none of those times has the performance been that critical. Mainly, the ability to handle huge data sets is core.
Summary. We designed a tree data structure. It uses Dictionary fields to locate child nodes. It works with strings. But the basic design can be adapted to other value types. And the node implementation can be optimized with other representations.