TheDeveloperBlog.com


C# Capacity Property

Capacity. Is it worthwhile to set capacities? We can specify a capacity for C# collections. This optional capacity sets the initial amount of memory allocated. It is specified in the constructors of List and Dictionary, and List also exposes it as the Capacity property.
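
As a sketch, passing a capacity to the constructors looks like this (the counts here are example values):

```csharp
using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
	// Reserve room for 100 entries up front.
	var lookup = new Dictionary<string, int>(100);

	// List accepts a capacity in its constructor too,
	// and also exposes it as the Capacity property.
	var names = new List<string>(100);
	Console.WriteLine(names.Capacity); // 100
    }
}
```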

Capacity benchmark results

Method A: Dictionary, no capacity
Time:     1350 ms

Method B: Dictionary, has capacity
Time:     700 ms

Method C: Dictionary, const capacity
Time:     760 ms

Method D: Dictionary, over-large capacity
Time:     1005 ms

Method E: List, no capacity
Time:     1010 ms

Method F: List, accurate capacity
Time:     575 ms


Intro. I found that when you don't specify a capacity, your collection will be reallocated several times even if it only grows to 100 elements. This can make populating your Dictionary or List twice as slow.

The capacity of a Dictionary<TKey, TValue> is the number of elements that can be added to the Dictionary<TKey, TValue> before resizing is necessary. As elements are added to a Dictionary<TKey, TValue>, the capacity is automatically increased as required by reallocating the internal array.

Dictionary Constructor: MSDN

Why set capacity? "If the size of the collection can be estimated, specifying the initial capacity eliminates the need to perform a number of resizing operations while adding elements to the Dictionary<TKey, TValue>."

Is it important? Should developers bother with Dictionary or List capacity? What I found might be surprising. Having .NET resize the underlying arrays is expensive. It can double the time taken to populate even small collections.
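
To watch the resizing happen, we can print List's Capacity property each time it changes. A small sketch; the exact growth sequence is an implementation detail, but current .NET doubles the array each time it fills:

```csharp
using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
	var list = new List<int>();
	int lastCapacity = -1;
	for (int i = 0; i < 100; i++)
	{
	    list.Add(i);
	    // Report each reallocation of the underlying array.
	    if (list.Capacity != lastCapacity)
	    {
		Console.WriteLine("Count {0}: Capacity {1}", list.Count, list.Capacity);
		lastCapacity = list.Capacity;
	    }
	}
	// Typically the capacities are 4, 8, 16, 32, 64, 128:
	// several reallocations just to reach 100 elements.
    }
}
```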

Note: A final thing to consider is memory pressure: each resize reallocates the collection's underlying array, and the discarded arrays add garbage-collector pressure.


Benchmark. Here is the benchmark that tests the capacity settings. I tried to simulate realistic situations. Most Dictionaries I have used have string keys and around 100-1000 elements, so I used 100-element collections.

C# program that tests capacity

using System;
using System.Collections.Generic;

class CapacityTest
{
    /// <summary>
    /// The number of iterations.
    /// </summary>
    static int _m = 100000;

    /// <summary>
    /// List of values we use for the test.
    /// </summary>
    List<string> _values = new List<string>();

    public CapacityTest()
    {
	// 100 random strings for benchmarking.
	for (int i = 0; i < 100; i++)
	{
	    _values.Add(System.IO.Path.GetRandomFileName());
	}

	long t1 = Environment.TickCount;

	A_Dictionary();

	long t2 = Environment.TickCount;

	B_DictionaryCapacity();

	long t3 = Environment.TickCount;

	C_DictionaryCapacity();

	long t4 = Environment.TickCount;

	D_DictionaryCapacity();

	long t5 = Environment.TickCount;

	E_List();

	long t6 = Environment.TickCount;

	F_ListCapacity();

	long t7 = Environment.TickCount;

	// Write dictionary times
	Console.WriteLine(t2 - t1);
	Console.WriteLine(t3 - t2);
	Console.WriteLine(t4 - t3);
	Console.WriteLine(t5 - t4);

	// Write list times
	Console.WriteLine(t6 - t5);
	Console.WriteLine(t7 - t6);
	Console.ReadLine();
    }

    static void Main()
    {
	// Run the benchmark.
	new CapacityTest();
    }

    void A_Dictionary()
    {
	// No capacity
	for (int i = 0; i < _m; i++)
	{
	    Dictionary<string, int> d = new Dictionary<string, int>();
	    foreach (string k in _values)
	    {
		d.Add(k, 0);
	    }
	}
    }

    void B_DictionaryCapacity()
    {
	// Capacity from collection Count
	for (int i = 0; i < _m; i++)
	{
	    Dictionary<string, int> d = new Dictionary<string, int>(_values.Count);
	    foreach (string k in _values)
	    {
		d.Add(k, 0);
	    }
	}
    }

    void C_DictionaryCapacity()
    {
	// Const capacity
	for (int i = 0; i < _m; i++)
	{
	    Dictionary<string, int> d = new Dictionary<string, int>(100);
	    foreach (string k in _values)
	    {
		d.Add(k, 0);
	    }
	}
    }

    void D_DictionaryCapacity()
    {
	// Huge capacity (1 order of magnitude too large)
	for (int i = 0; i < _m; i++)
	{
	    Dictionary<string, int> d = new Dictionary<string, int>(1000);
	    foreach (string k in _values)
	    {
		d.Add(k, 0);
	    }
	}
    }

    void E_List()
    {
	// No capacity
	for (int i = 0; i < _m * 5; i++)
	{
	    List<string> l = new List<string>();
	    foreach (string k in _values)
	    {
		l.Add(k);
	    }
	}
    }

    void F_ListCapacity()
    {
	// Exact capacity
	for (int i = 0; i < _m * 5; i++)
	{
	    List<string> l = new List<string>(100);
	    foreach (string k in _values)
	    {
		l.Add(k);
	    }
	}
    }
}

It is advantageous to specify capacity in your C# programs. I found the fastest method was using the source List's known Count as the capacity for the new collection. Here, that is 100.
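
The Method B pattern, sketched: when building one collection from another, the source's Count is an exact capacity for the destination, so no resize ever occurs. The Index helper here is a hypothetical example, not part of the benchmark above.

```csharp
using System;
using System.Collections.Generic;

class Program
{
    static Dictionary<string, int> Index(List<string> keys)
    {
	// keys.Count is the exact final size: the Dictionary never resizes.
	var d = new Dictionary<string, int>(keys.Count);
	for (int i = 0; i < keys.Count; i++)
	{
	    d.Add(keys[i], i);
	}
	return d;
    }

    static void Main()
    {
	var keys = new List<string> { "cat", "dog", "bird" };
	Console.WriteLine(Index(keys).Count); // 3
    }
}
```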


Estimation. You can modify your program to print the Count property of your collections after they are used. If you typically end up with 150 elements, you can simply use the constant 200 as the capacity, saving several array resizes.
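
One way to apply this, as a sketch: allocate with the rounded-up constant, then log the observed Count so the estimate can be tuned in later runs. The 150/200 figures below are just the example values from the paragraph above.

```csharp
using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
	// Constant chosen by rounding up the Count seen in earlier runs.
	var items = new List<string>(200);
	for (int i = 0; i < 150; i++)
	{
	    items.Add(i.ToString());
	}
	// Log the observed Count so the estimate can be adjusted later.
	Console.WriteLine("Observed Count: {0}", items.Count);
	// 150 elements fit in the 200-element allocation: no resizing occurred.
	Console.WriteLine("Capacity still: {0}", items.Capacity); // 200
    }
}
```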

So: For my latest programs, I decided to simply define some const global variables and use them through my classes.

Tip: Make a static class with const fields such as ElementCountEstimate, or other fields containing the word estimate.

Class that stores capacities: C#

static class GlobalClass
{
    public const int ElementCountEstimate = 200;
    public const int OtherCountEstimate = 100;
}
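
The constants can then be used wherever a collection is created. A sketch using the class above; since they are const, the estimates cost nothing at runtime:

```csharp
using System;
using System.Collections.Generic;

static class GlobalClass
{
    public const int ElementCountEstimate = 200;
    public const int OtherCountEstimate = 100;
}

class Program
{
    static void Main()
    {
	// The estimates are compile-time constants baked into the call sites.
	var d = new Dictionary<string, int>(GlobalClass.ElementCountEstimate);
	var l = new List<string>(GlobalClass.OtherCountEstimate);
	Console.WriteLine(l.Capacity); // 100
    }
}
```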


Summary. The memory model of .NET programs is complex. It is efficient and takes care of most of the hard stuff. But giving it hints about the sizes of the collections is a substantial optimization.

Note: I could not measure the impact of setting capacities in my real-world programs. The improvement is lost in the noise.