C# Locality Optimizations (Memory Hierarchy)

Understand locality of reference and temporal locality. Think about the memory hierarchy.

Locality. Locality is an important concept in performance. Things that are local (have more locality) are nearer together—in memory, in storage, or in time.

Locality, reference. Locality of reference tells us that elements nearer in memory are faster to access. This is a low-level consideration. It has no connection to high-level languages.Optimization

Benchmark. Computer memory is often referred to as RAM (random access memory). But random access memory is not random. Its performance is constrained by the physical nearness of elements.

Version 1: This code accesses the first 5 elements in the int array many times. It does nothing important with the values.

Version 2: This version of the code accesses 5 elements in the same int array, but the indexes of those elements are further apart.

Int Array Array

Result: In an array with 100,000 elements, accessing the first 5 repeatedly is faster than accessing 5 elements further apart.

Benchmark

Info: At a high level, the instruction count is equivalent. But the physical aspects of the hardware come into play.

C# program that benchmarks locality of reference using System; using System.Diagnostics; class Program { static void Method1(int[] array) { // Assign elements near in physical location. for (int i = 0; i < 10; i++) { array[0] = 1; array[1] = 1; array[2] = 1; array[3] = 1; array[4] = 1; } } static void Method2(int[] array) { // Assign elements spread out in location. for (int i = 0; i < 10; i++) { array[0] = 1; array[10000] = 1; array[20000] = 1; array[30000] = 1; array[40000] = 1; } } const int _max = 10000000; const int _size = 100000; static void Main() { int[] array = new int[_size]; Method1(array); Method2(array); array = new int[_size]; GC.Collect(); // Version 1: access near indexes. var s1 = Stopwatch.StartNew(); for (int i = 0; i < _max; i++) { Method1(array); } s1.Stop(); array = new int[_size]; GC.Collect(); // Version 2: access far-apart indexes. var s2 = Stopwatch.StartNew(); for (int i = 0; i < _max; i++) { Method2(array); } s2.Stop(); Console.WriteLine(((double)(s1.Elapsed.TotalMilliseconds * 1000 * 1000) / _max).ToString("0.00 ns")); Console.WriteLine(((double)(s2.Elapsed.TotalMilliseconds * 1000 * 1000) / _max).ToString("0.00 ns")); Console.Read(); } } Output 20.99 ns 23.65 ns

Temporal locality. Temporal locality enhances performance. It tells us whether memory locations in a program are likely to be accessed again in the near future.

And: A method has high temporal locality if it is called repeatedly in a short period of time.

Example: Think of a program that reads in 1000 files, generates a report for each file, and then writes 1000 output files.

Important: If we do the steps in groups, this could improve temporal locality, improving cache usage—and speed up the program.

Program outline with low temporal locality: C# Read data file Generate output Write output file Read data file Generate output Write output file Read data file Generate output Write output file Program outline with higher temporal locality: C# Read data file Read data file Read data file Generate output Generate output Generate output Write output file Write output file Write output file

Memory hierarchy. Optimization involves the memory hierarchy. We see an example memory hierarchy in a fairly modern computer. This guides optimization of computer programs.

Also: The information on this table influences how compilers, operating systems and databases work.

Virtual Memory: Disk > 2GB (Typical size) 3-15 ms (Access time) Physical Memory 256MB-2GB 100-150 ns 2nd-Level Cache 128KB-4MB 40-60 ns 1st-Level Cache 16-64KB 5-10 ns Registers: Processor 32 Words 1 ns

Notes, table. The table shows the critical performance aspects of a memory hierarchy on a computer. You can see that reading a file from the disk requires about 3-15 milliseconds.Compilers: stanford.edu

So: Reading data from memory requires 100-150 nanoseconds. And reading data from the processor caches requires 1-60 nanoseconds.

Info: We can compute how much faster you can load a small data object from memory instead of the disk.

Note: Please recall that one millisecond is equal to one million nanoseconds. One nanosecond is small.

And: Let us say that the computer is busy and the disk requires 15 milliseconds, and the memory requires 150 nanoseconds.

Disk time 15 ms * 1000000 = 15000000 ns Memory time 150 ns Memory performance Memory is 100,000x (one-hundred thousand) times faster.

Compiler theory. Let us consider the key point of locality. The farther away an element is, the longer it may take to access it. Keep things near, and performance improves.

A summary. We tested locality of reference. Even in high-level languages, locality of reference comes into play. This influences or even greatly determines performance levels.

The Dev Codes

.Net

.NET Array Dictionary List String 2D Async DataTable Dates DateTime Enum File For Foreach Format IEnumerable If IndexOf Lambda LINQ Parse Path Process Property Regex Replace Sort Split Static StringBuilder Substring Switch Tuple

Java

Core Array ArrayList HashMap String 2D Cast Character Console Deque Duplicates File For Format HashSet If IndexOf Lambda Math ParseInt Process Random Regex Replace Sort Split StringBuilder Substring Switch Vector While

TheDeveloperBlog.com

C# Locality Optimizations (Memory Hierarchy)

Related Links:

.Net

Java

Related Links