We sometimes optimize hashtables like Dictionary simply by changing the capacity of the collection to a value higher than the default. This trades space for speed.
Benchmark. First, the implementation of Dictionary is opaque and few tutorials will tell you the tricks to squeeze out nanoseconds from lookups. The example here passes a parameter to the Dictionary constructor to indicate a minimum capacity.
And: This forces the Dictionary to allocate at least that many internal "buckets" and entries.
So: By increasing capacity, we reduce hash collisions and improve performance. A multiplier of 4 yields a speedup of 7% in the example.
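Before the full benchmark, here is a minimal sketch of the only call involved: the Dictionary constructor overload that accepts an int capacity. The names and key used here are placeholders, not part of the benchmark program.

using System.Collections.Generic;

class CapacitySketch
{
    static void Main()
    {
        const int len = 500;      // Number of keys we plan to store.
        const int multiplier = 4; // Reserve room for 4 times that many.
        // The constructor argument is a minimum capacity: the Dictionary
        // allocates at least this many internal buckets and entries.
        var dict = new Dictionary<string, bool>(len * multiplier);
        dict["placeholder-key"] = true;
    }
}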
C# program that optimizes Dictionary

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

class Program
{
    static void Main()
    {
        // Loop through full capacity multipliers.
        for (int multiplier = 1; multiplier <= 10; multiplier++)
        {
            const int len = 500;
            // Allocate with multiplied capacity.
            var dict = new Dictionary<string, bool>(len * multiplier);
            // Get random keys.
            var arr = GetStrings(len);
            // Set keys.
            foreach (string val in arr)
            {
                dict[val] = true;
            }
            const int m = 5000 * 10;
            Stopwatch s1 = Stopwatch.StartNew();
            for (int i = 0; i < m; i++)
            {
                for (int j = 0; j < arr.Length; j++)
                {
                    // Lookup element.
                    bool b = dict[arr[j]];
                    // Lookup first element.
                    b = dict[arr[0]];
                }
            }
            s1.Stop();
            // Write timings.
            Console.Write(multiplier.ToString("00"));
            Console.Write(", ");
            Console.Write(s1.ElapsedMilliseconds);
            Console.WriteLine(" ms");
        }
        Console.Read();
    }

    static string[] GetStrings(int len)
    {
        // Allocate and return an array of random strings.
        var arr = new string[len];
        for (int i = 0; i < arr.Length; i++)
        {
            arr[i] = Path.GetRandomFileName();
        }
        return arr;
    }
}

Output

01, 2744 ms    (Exact capacity)
02, 2665 ms
03, 2553 ms
04, 2546 ms    (7.2% faster than exact capacity with multiplier 1)
05, 2569 ms
06, 2562 ms
07, 2532 ms
08, 2552 ms
09, 2531 ms
10, 2573 ms
The program defines two methods in the Program.cs file. The Main entry point runs the simulations that time Dictionary lookup speed at different capacities.
Details: The program puts 500 strings as keys into the Dictionary. The capacity will be 500, 1000, 1500, and so on.
The program defines the GetStrings method. This uses the Path.GetRandomFileName method to generate 500 random file names of 12 characters. This array is returned and used to populate the Dictionary.
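As a quick aside, this sketch shows the shape of those keys: Path.GetRandomFileName returns a random name in 8.3 format, so each key is 12 characters. The example name in the comment is only illustrative; the actual value differs on every call.

using System;
using System.IO;

class KeyFormatSketch
{
    static void Main()
    {
        // Returns something like "u2idz4qh.cfk": 8 chars, a dot, 3 chars.
        string name = Path.GetRandomFileName();
        Console.WriteLine(name);
        Console.WriteLine(name.Length); // 12
    }
}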
Next, we write ten lines to the screen, each indicating the current multiplier and the time taken for all lookups to complete. There are 50,000 passes over the entire collection of 500 strings, which is 25 million inner iterations, each performing two lookups.
Result: The program shows that multiplying the full capacity by 4 can improve lookup performance by 7.2% over multiplying it by 1.
Discussion. We now describe the internal data structures in the Dictionary collection that allow this optimization tip to yield results. Whenever you add or look up an entry in the Dictionary, a buckets array is accessed (written or read).
The buckets array contains integers that index into the actual data structs in an entries array. With more buckets, each bucket maps to fewer entries, so a lookup is more likely to reach the correct entry directly.
And: With fewer buckets, lookups must more often follow the collision chain and read the next entries.
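The sketch below illustrates this bucket-and-entries layout with chaining. It is not the real Dictionary source, just a simplified model (no resizing, no removal, and the type and field names are invented), but the Find method shows why shorter chains mean fewer reads per lookup.

using System;

class BucketSketch
{
    // Simplified stand-in for the Dictionary's internal entry struct.
    struct Entry
    {
        public string Key;
        public bool Value;
        public int Next; // Index of the next entry in this bucket's chain, or -1.
    }

    int[] buckets;    // Each bucket holds an index into entries, or -1 if empty.
    Entry[] entries;
    int count;

    public BucketSketch(int capacity)
    {
        // No resizing in this sketch: assumes at most capacity items are added.
        buckets = new int[capacity];
        entries = new Entry[capacity];
        for (int i = 0; i < capacity; i++) buckets[i] = -1;
    }

    public void Add(string key, bool value)
    {
        int bucket = (key.GetHashCode() & 0x7FFFFFFF) % buckets.Length;
        // The new entry becomes the head of the bucket's chain.
        entries[count] = new Entry { Key = key, Value = value, Next = buckets[bucket] };
        buckets[bucket] = count;
        count++;
    }

    public bool Find(string key, out bool value)
    {
        int bucket = (key.GetHashCode() & 0x7FFFFFFF) % buckets.Length;
        // More buckets means shorter chains, so this loop usually runs only once.
        for (int i = buckets[bucket]; i >= 0; i = entries[i].Next)
        {
            if (entries[i].Key == key)
            {
                value = entries[i].Value;
                return true;
            }
        }
        value = false;
        return false;
    }

    static void Main()
    {
        var table = new BucketSketch(8);
        table.Add("apple", true);
        table.Add("banana", false);
        bool value;
        Console.WriteLine(table.Find("apple", out value) + " " + value); // True True
    }
}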
Summary. We looked at a way to optimize a Dictionary. We apply a small multiplier, such as 4, to the initial capacity of the Dictionary. This gives the Dictionary more space to store the buckets that point to entries, which reduces collisions and speeds up lookups.