TheDeveloperBlog.com


C# Dictionary Size and Performance

Dictionary size influences lookup performance. Smaller Dictionaries are faster than larger Dictionaries. This is true when they are tested for keys that always exist in both. Reducing Dictionary size could help improve performance.

Dictionary size and performance benchmark

Full Dictionary:      791 ms
Half-size Dictionary: 591 ms [faster]


Benchmark. We see a benchmark that reads in a list of 172820 English words. These are English words so have similar letter frequencies and patterns. A Dictionary containing all the words is compared to a Dictionary that contains the first half.

Then: We measure lookup times for the first half of words, such that all keys exist.

Benchmark
C# program that benchmarks Dictionary instances

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;

class Program
{
    static void Main()
    {
	// Read in list of English words.
	var lines = new List<string>(File.ReadAllLines("enable1.txt"));

	// This dictionary contains all strings. [172820 keys]
	var dictionary1 = lines.ToDictionary(s => s, s => true);

	// Shrink the List to half its size.
	lines = lines.Take(lines.Count / 2).ToList();

	// This dictionary contains half. [86410 keys]
	var dictionary2 = lines.ToDictionary(s => s, s => true);

	const int m = 100;
	Stopwatch s1 = Stopwatch.StartNew();
	for (int i = 0; i < m; i++)
	{
	    foreach (string s in lines)
	    {
		if (!dictionary1.ContainsKey(s))
		{
		    throw new Exception();
		}
	    }
	}
	s1.Stop();
	Stopwatch s2 = Stopwatch.StartNew();
	for (int i = 0; i < m; i++)
	{
	    foreach (string s in lines)
	    {
		if (!dictionary2.ContainsKey(s))
		{
		    throw new Exception();
		}
	    }
	}
	s2.Stop();
	Console.WriteLine("{0},{1}", s1.ElapsedMilliseconds, s2.ElapsedMilliseconds);
	Console.Read();
    }
}

The first several lines read in the word list, then make two Dictionaries. The List is trimmed to half its size with Take, and then the second Dictionary is created. We use the Take and ToDictionary methods from LINQ.

Take, TakeWhileToDictionary

Note: The two loops are surrounded with Stopwatches, and their times are measured.

And: Each call to ContainsKey succeeds and returns true. We have a 100% hit rate here.


Results. The smaller Dictionary (with half the number of keys) was much faster. In this case, the behavior of both Dictionaries on the input was identical. This means that having unneeded keys in the Dictionary makes it slower.

My perspective is that you should use separate Dictionaries for separate purposes. If you have two sets of keys, do not store them in the same Dictionary. If you can divide them up, you can enhance lookup performance.

Hashtables usually are implemented with internal arrays of buckets. When the hash function results in a collision with its mathematical index, the runtime will have to loop through an array of the values, comparing strings each time.

Therefore: Reducing total hashtable size enhances average performance with fewer collisions.


Summary. Here we see an example of how you enhance the performance of your Dictionary by subdividing it. Even when the large and small Dictionaries have the identical results on a set of input, the smaller one will perform better on average.