TheDeveloperBlog.com


C# Dictionary Optimization Tip

Dictionary optimization. A Dictionary's lookups can be made faster by giving it a higher capacity. We can sometimes speed up a hashtable like Dictionary simply by passing a capacity larger than the number of elements it will hold. This trades space for speed.
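The space side of the trade is easy to measure. This sketch assumes a modern .NET runtime (such as .NET 6 or later) where GC.GetAllocatedBytesForCurrentThread is available. It counts the bytes allocated when constructing an empty Dictionary with an exact capacity of 500 and with four times that. BytesForCapacity is a hypothetical helper name, and the exact byte counts depend on the runtime version.

C# program that measures Dictionary allocation by capacity

```csharp
using System;
using System.Collections.Generic;

static class CapacityCost
{
    // Hypothetical helper: bytes allocated on this thread while constructing
    // an empty Dictionary with the requested capacity.
    public static long BytesForCapacity(int capacity)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        var dict = new Dictionary<string, bool>(capacity);
        long after = GC.GetAllocatedBytesForCurrentThread();
        GC.KeepAlive(dict); // Keep the instance alive until after the measurement
        return after - before;
    }
}

class Program
{
    static void Main()
    {
        // Warm up once so one-time runtime costs are not counted in the first measurement.
        CapacityCost.BytesForCapacity(500);

        Console.WriteLine("Capacity  500: {0} bytes", CapacityCost.BytesForCapacity(500));
        Console.WriteLine("Capacity 2000: {0} bytes", CapacityCost.BytesForCapacity(2000));
    }
}
```

Expect the larger capacity to allocate noticeably more, since the internal arrays are sized from the capacity.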

Benchmark. The implementation of Dictionary is opaque, and few tutorials cover the tricks that squeeze nanoseconds out of lookups. The example here passes a parameter to the Dictionary constructor to specify a minimum capacity.

And: This forces the Dictionary to allocate at least that many internal "buckets" and entries. It rounds the requested capacity up to a prime number.

So: With more buckets, fewer keys share a bucket, so there are fewer hash collisions and lookups are faster. A multiplier of 4 yields a speedup of about 7% in this example; larger multipliers add little beyond that.
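We can check the collision claim without a stopwatch. This sketch hashes 500 random file names into 500, 1000, and up to 5000 buckets, and reports the average number of entries a successful lookup would examine if each bucket held a chain of the keys that land in it. It is a simplified model, not the Dictionary source: the bucket formula (non-negative hash code modulo the bucket count) and the helper name AverageProbes are assumptions, and the real Dictionary rounds its size to a prime and may use a different string hash internally. Because string hash codes are randomized per process on .NET Core, the numbers change from run to run.

C# program that models chain length by bucket count

```csharp
using System;
using System.IO;
using System.Linq;

static class ChainStats
{
    // Average entries examined per successful lookup, assuming each key goes to
    // bucket (hash & 0x7FFFFFFF) % buckets and each bucket holds a simple chain.
    public static double AverageProbes(string[] keys, int buckets)
    {
        var chainLengths = new int[buckets];
        foreach (string key in keys)
        {
            chainLengths[(key.GetHashCode() & 0x7FFFFFFF) % buckets]++;
        }

        // Finding every key in a chain of length n costs 1 + 2 + ... + n probes.
        long totalProbes = chainLengths.Sum(n => (long)n * (n + 1) / 2);
        return (double)totalProbes / keys.Length;
    }
}

class Program
{
    static void Main()
    {
        // 500 random keys, generated the same way as in the benchmark.
        string[] keys = Enumerable.Range(0, 500)
            .Select(_ => Path.GetRandomFileName())
            .ToArray();

        for (int multiplier = 1; multiplier <= 10; multiplier++)
        {
            int buckets = 500 * multiplier;
            Console.WriteLine("{0:00}: {1:F3} probes per lookup",
                multiplier, ChainStats.AverageProbes(keys, buckets));
        }
    }
}
```

The average can never drop below 1 probe, which is one reason the gains flatten out as the multiplier grows.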

C# program that optimizes Dictionary

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

class Program
{
    static void Main()
    {
        const int len = 500;     // Number of keys stored in each Dictionary
        const int m = 5000 * 10; // Number of timed passes over all keys

        // Loop through capacity multipliers 1 to 10 (capacity 500, 1000, ... 5000).
        for (int multiplier = 1; multiplier <= 10; multiplier++)
        {
            var dict = new Dictionary<string, bool>(len * multiplier); // Allocate with multiplied capacity
            var arr = GetStrings(len); // Get a fresh set of random keys

            foreach (string val in arr)
            {
                dict[val] = true; // Set keys
            }

            Stopwatch s1 = Stopwatch.StartNew();
            for (int i = 0; i < m; i++)
            {
                for (int j = 0; j < arr.Length; j++)
                {
                    // Two lookups per inner iteration.
                    bool b = dict[arr[j]]; // Lookup element
                    b = dict[arr[0]];      // Lookup first element
                }
            }
            s1.Stop();

            // Write timings, for example "04, 2546 ms".
            Console.Write(multiplier.ToString("00"));
            Console.Write(", ");
            Console.Write(s1.ElapsedMilliseconds);
            Console.WriteLine(" ms");
        }
        Console.Read(); // Keep the console window open until input arrives
    }

    static string[] GetStrings(int len)
    {
        // Allocate and return an array of random 12-character file names.
        var arr = new string[len];
        for (int i = 0; i < arr.Length; i++)
        {
            arr[i] = Path.GetRandomFileName();
        }
        return arr;
    }
}

Output

01, 2744 ms   (Exact capacity)
02, 2665 ms
03, 2553 ms
04, 2546 ms   (7.2% faster than exact capacity)
05, 2569 ms
06, 2562 ms
07, 2532 ms
08, 2552 ms
09, 2531 ms
10, 2573 ms

The program defines two methods. The Main entry point runs the benchmark: for each capacity multiplier, it fills a new Dictionary and times lookups on it.

Details: The program stores 500 strings as keys in the Dictionary. The requested capacity is 500, 1000, 1500, and so on, up to 5000.

The GetStrings method uses Path.GetRandomFileName to generate 500 random file names of 12 characters. Main calls it once per multiplier and uses the returned array to populate the Dictionary, so each multiplier is measured with a different set of random keys.
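The requested capacities are minimums. Assuming .NET Core 2.1 or later, where Dictionary.EnsureCapacity exists, this sketch reports the size the Dictionary actually allocated for the first few requested capacities. EnsureCapacity returns the current capacity and grows storage only when its argument exceeds it, so passing 0 just reports the size. On current runtimes the allocated size is typically a prime at or above the request, but that is an implementation detail. CapacityInfo.ActualCapacity is a hypothetical helper name.

C# program that reports allocated Dictionary capacity

```csharp
using System;
using System.Collections.Generic;

static class CapacityInfo
{
    // EnsureCapacity(0) never needs to grow the storage, so it only
    // returns the capacity the Dictionary already has.
    public static int ActualCapacity(Dictionary<string, bool> dict) => dict.EnsureCapacity(0);
}

class Program
{
    static void Main()
    {
        for (int multiplier = 1; multiplier <= 4; multiplier++)
        {
            int requested = 500 * multiplier;
            var dict = new Dictionary<string, bool>(requested);
            Console.WriteLine("Requested {0}, allocated {1}",
                requested, CapacityInfo.ActualCapacity(dict));
        }
    }
}
```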

Next, the program writes ten lines to the screen, each showing the multiplier and the time for all lookups to complete. Each run makes 50,000 passes over the 500 keys, and each inner iteration performs two lookups (the current key and the first key), for 50 million lookups per multiplier.
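The lookup count follows from the loop bounds. This sketch multiplies them out and converts a measured time into an average cost per lookup. LookupMath and its methods are hypothetical names, and the two timings passed in are taken from the output above.

C# program that computes lookup count and cost

```csharp
using System;

static class LookupMath
{
    // Timed passes * keys per pass * lookups per inner iteration.
    public static long TotalLookups(int passes, int keys, int lookupsPerKey)
        => (long)passes * keys * lookupsPerKey;

    // Average nanoseconds per lookup for an elapsed time in milliseconds.
    public static double NanosecondsPerLookup(long elapsedMs, long lookups)
        => elapsedMs * 1_000_000.0 / lookups;
}

class Program
{
    static void Main()
    {
        long lookups = LookupMath.TotalLookups(5000 * 10, 500, 2);
        Console.WriteLine(lookups); // 50000000
        Console.WriteLine(LookupMath.NanosecondsPerLookup(2744, lookups)); // about 54.9 (multiplier 1)
        Console.WriteLine(LookupMath.NanosecondsPerLookup(2546, lookups)); // about 50.9 (multiplier 4)
    }
}
```

At these rates, the 4x capacity saves roughly 4 ns per lookup.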

Result: In this run, a multiplier of 4 made lookups 7.2% faster than an exact capacity (multiplier 1).
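To see whether the gain keeps growing with the multiplier, this sketch computes the percentage improvement for every row of the output above, relative to multiplier 1. Speedup.Percent is a hypothetical helper. The timings come from a single run, so small differences between rows are likely noise.

C# program that computes speedup percentages

```csharp
using System;

static class Speedup
{
    // Percentage reduction in elapsed time relative to a baseline run.
    public static double Percent(long baselineMs, long ms) => (baselineMs - ms) * 100.0 / baselineMs;
}

class Program
{
    static void Main()
    {
        // Timings from the output above, for multipliers 1 through 10.
        long[] timings = { 2744, 2665, 2553, 2546, 2569, 2562, 2532, 2552, 2531, 2573 };
        for (int i = 0; i < timings.Length; i++)
        {
            Console.WriteLine("{0:00}: {1:F1}% faster than multiplier 1",
                i + 1, Speedup.Percent(timings[0], timings[i]));
        }
    }
}
```

On these numbers, multipliers 3 through 10 all land between roughly 6% and 8%, which suggests the gain levels off after about 3 or 4.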

Discussion. Why does this tip work? The answer lies in the Dictionary's internal data structures. Whenever you add or look up an entry, the Dictionary accesses (writes or reads) a buckets array.

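The buckets array contains integers that index into an array of Entry structs, which hold the keys and values. A key's hash code selects a bucket, and each Entry also stores the index of the next entry in the same chain, so keys that land in one bucket form a linked chain. With more buckets, fewer keys share a bucket, and the bucket's index is more likely to lead straight to the right entry.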

And: With fewer buckets, more keys share each chain, so lookups must follow the next links to later entries more often.
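The buckets-and-entries layout can be sketched as a toy chained hash table. This is a simplified model loosely based on the structure described above, not the actual .NET source: MiniMap and its members are hypothetical, and it omits resizing, removal, and duplicate-key checks. In this toy, buckets hold 1-based entry indexes (0 means empty), and each entry's Next field links to the next entry in its chain.

C# program that models buckets and entries

```csharp
using System;

// Hypothetical toy map: fixed capacity, no resizing, no removal, no duplicate check.
class MiniMap<TValue>
{
    struct Entry
    {
        public int HashCode; // Cached hash of the key
        public int Next;     // 1-based index of the next entry in this chain, 0 = end
        public string Key;
        public TValue Value;
    }

    readonly int[] _buckets;   // 1-based index of the first entry in each chain, 0 = empty
    readonly Entry[] _entries; // Entries stored in insertion order
    int _count;

    public MiniMap(int capacity)
    {
        if (capacity < 1) throw new ArgumentOutOfRangeException(nameof(capacity));
        _buckets = new int[capacity];
        _entries = new Entry[capacity];
    }

    int BucketOf(int hashCode) => (hashCode & 0x7FFFFFFF) % _buckets.Length;

    public void Add(string key, TValue value)
    {
        if (_count == _entries.Length) throw new InvalidOperationException("MiniMap is full");
        int hash = key.GetHashCode();
        int bucket = BucketOf(hash);
        // The new entry points at the old chain head, then becomes the new head.
        _entries[_count] = new Entry { HashCode = hash, Next = _buckets[bucket], Key = key, Value = value };
        _count++;
        _buckets[bucket] = _count; // 1-based index of the entry just written
    }

    // Walks the key's chain; probes counts how many entries were examined.
    public bool TryGetValue(string key, out TValue value, out int probes)
    {
        int hash = key.GetHashCode();
        probes = 0;
        for (int i = _buckets[BucketOf(hash)]; i != 0; i = _entries[i - 1].Next)
        {
            probes++;
            ref Entry e = ref _entries[i - 1];
            if (e.HashCode == hash && e.Key == key)
            {
                value = e.Value;
                return true;
            }
        }
        value = default;
        return false;
    }
}

class Program
{
    static void Main()
    {
        var map = new MiniMap<bool>(8);
        foreach (string key in new[] { "alpha", "beta", "gamma" })
        {
            map.Add(key, true);
        }

        bool found = map.TryGetValue("beta", out bool value, out int probes);
        Console.WriteLine("found={0}, value={1}, probes={2}", found, value, probes);
    }
}
```

TryGetValue reports how many entries it examined, so you can observe that a lookup walks further when several keys share a bucket.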

Summary. We looked at a way to optimize a Dictionary. We pass a capacity that is a small multiple, such as 4, of the number of entries the Dictionary will hold. This gives the Dictionary more buckets pointing to entries, so chains are shorter and lookups are somewhat faster, at the cost of extra memory.
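If you adopt the tip, a small factory method keeps the multiplier in one place. This is a hypothetical helper: CreateWithHeadroom and its default multiplier of 4 are assumptions taken from this benchmark, not a library API. Whether it pays off should be measured for your own keys and workload.

C# program that creates a Dictionary with extra capacity

```csharp
using System;
using System.Collections.Generic;

static class DictionaryFactory
{
    // Hypothetical helper: request room for expectedCount * multiplier entries,
    // so the expected contents occupy only a fraction of the buckets.
    public static Dictionary<TKey, TValue> CreateWithHeadroom<TKey, TValue>(
        int expectedCount, int multiplier = 4)
    {
        if (expectedCount < 0) throw new ArgumentOutOfRangeException(nameof(expectedCount));
        if (multiplier < 1) throw new ArgumentOutOfRangeException(nameof(multiplier));

        // checked: throw OverflowException rather than wrap around for huge requests.
        int capacity = checked(expectedCount * multiplier);
        return new Dictionary<TKey, TValue>(capacity);
    }
}

class Program
{
    static void Main()
    {
        var dict = DictionaryFactory.CreateWithHeadroom<string, bool>(500); // Requests capacity 2000
        dict["example"] = true;
        Console.WriteLine(dict.Count); // 1
    }
}
```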