We look at the string GetHashCode virtual method implementation. This method is located in mscorlib. It uses unsafe code to compute well-distributed hash codes.
And: Hash codes are used in Dictionary instances and other associative collections.
Example. First we look at a decompiled C# version of the GetHashCode override in the System.String type, recovered from the intermediate language in the mscorlib assembly.
And: It is invoked whenever a C# or VB.NET program uses a Dictionary or Hashtable collection with string keys.
Note: It uses several magic constants and low-level pointer arithmetic to achieve its performance requirement.
String GetHashCode method implementation: C#

    [ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
    public override unsafe int GetHashCode()
    {
        fixed (char* str = this)
        {
            char* chPtr = str;
            int num = 352654597;
            int num2 = num;
            int* numPtr = (int*)chPtr;
            for (int i = this.Length; i > 0; i -= 4)
            {
                num = (((num << 5) + num) + (num >> 27)) ^ numPtr[0];
                if (i <= 2)
                {
                    break;
                }
                num2 = (((num2 << 5) + num2) + (num2 >> 27)) ^ numPtr[1];
                numPtr += 2;
            }
            return (num + (num2 * 1566083941));
        }
    }
The method is declared with the unsafe modifier, which allows it to use lower-level constructs such as pointers and pointer arithmetic. Unsafe code must also be enabled explicitly with the /unsafe compiler option (AllowUnsafeBlocks in a project file).
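Note: Code compiled with unsafe cannot be verified by the runtime, so in the .NET Framework the assembly needs elevated trust to run. Calling the built-in GetHashCode avoids this problem, because the unsafe code lives in mscorlib and not in your own assembly.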
The fixed keyword is used in GetHashCode. It pins the string 'this' so that the garbage collector cannot move it during any collection that occurs while the fixed block is executing. This keeps the char* pointer valid for the whole block.
Note: Once you apply the fixed keyword to the context, you can use pointer arithmetic on the string's internal character vector.
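Sketch: To see what the pointer reads amount to, the listing can be re-expressed in safe code. This is only an illustration under stated assumptions: SafeStringHash, ReadPair and Compute are hypothetical names, each int* read is modeled as two adjacent chars combined in little-endian order, and positions past the end of the string are read as 0 to stand in for the null terminator. It mirrors the arithmetic of the listing and is not guaranteed to return the same value as string.GetHashCode on any particular runtime.

```csharp
using System;

// Hypothetical helper for illustration; not part of the .NET Framework.
public static class SafeStringHash
{
    // Models one int* read at character index i on a little-endian machine:
    // low 16 bits come from s[i], high 16 bits from s[i + 1].
    // Assumption: indexes at or past the end are read as 0 (the null terminator).
    private static int ReadPair(string s, int i)
    {
        int low = i < s.Length ? s[i] : 0;
        int high = i + 1 < s.Length ? s[i + 1] : 0;
        return low | (high << 16);
    }

    public static int Compute(string s)
    {
        unchecked // integer overflow wraps around, as in the original listing
        {
            int num = 352654597;
            int num2 = num;
            int pos = 0; // character index, replaces the moving numPtr

            for (int i = s.Length; i > 0; i -= 4)
            {
                num = (((num << 5) + num) + (num >> 27)) ^ ReadPair(s, pos);
                if (i <= 2)
                {
                    break;
                }
                num2 = (((num2 << 5) + num2) + (num2 >> 27)) ^ ReadPair(s, pos + 2);
                pos += 4; // same as numPtr += 2 (two ints = four chars)
            }
            return num + (num2 * 1566083941);
        }
    }
}
```

Note: The safe version needs bounds checks and extra shifts for every pair of characters, which is the work the pinned pointer in the original avoids.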
Unrolled loop body. The method body uses an advanced and hard-to-read optimization called loop unwinding or loop unrolling. This technique handles several elements on each loop iteration, rather than just one. Here each pass reads two 32-bit values (four characters) and decreases the counter by four, so fewer loop-condition checks are executed.
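Sketch: Loop unrolling is easier to see on a simpler problem than hashing. The example below sums an int array twice, once with a plain loop and once with a body unrolled four times plus a short loop for the leftover elements. UnrollDemo, SumSimple and SumUnrolled are hypothetical names chosen for this illustration.

```csharp
using System;

// Hypothetical demo class; both methods return the same total.
public static class UnrollDemo
{
    // One element per iteration.
    public static int SumSimple(int[] values)
    {
        int total = 0;
        for (int i = 0; i < values.Length; i++)
        {
            total += values[i];
        }
        return total;
    }

    // Four elements per iteration, then a tail loop for the remainder.
    public static int SumUnrolled(int[] values)
    {
        int total = 0;
        int i = 0;
        int limit = values.Length - (values.Length % 4);

        for (; i < limit; i += 4)
        {
            total += values[i];
            total += values[i + 1];
            total += values[i + 2];
            total += values[i + 3];
        }
        for (; i < values.Length; i++)
        {
            total += values[i];
        }
        return total;
    }
}
```

Note: Whether the unrolled form is actually faster depends on the runtime and the JIT compiler, so it is worth measuring before adopting it.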
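Bit manipulations. The loop body uses the << and >> operators, which shift the bits of a value left or right. The expression (num << 5) + num is the same as multiplying num by 33, and num >> 27 brings the top five bits back down to the low end before the result is combined with the next input using XOR. By shifting bits in a hash computation, you can avoid funneling.
Note: Funneling is where computations made later erase those made previously, so that different inputs collapse to the same value and the hash codes are less well distributed.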
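Sketch: The mixing step from the loop body can be isolated to see what the shifts do. MixDemo and its methods are hypothetical names for this illustration; the arithmetic is copied from the listing, and unchecked is used so that overflow wraps around.

```csharp
using System;

// Hypothetical demo of one mixing step from the loop body.
public static class MixDemo
{
    // (hash << 5) + hash: shift left by five bits, then add the original.
    public static int ShiftAdd(int hash)
    {
        unchecked { return (hash << 5) + hash; }
    }

    // The same value written as a multiplication by 33.
    public static int Times33(int hash)
    {
        unchecked { return hash * 33; }
    }

    // One full step: multiply by 33, add the top bits shifted down,
    // then XOR in the next piece of input data.
    public static int Mix(int hash, int data)
    {
        unchecked { return (((hash << 5) + hash) + (hash >> 27)) ^ data; }
    }
}
```

Note: Because int is signed in C#, hash >> 27 is an arithmetic shift, so a negative hash contributes sign bits as well as its top five bits.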
Performance. The GetHashCode method here is virtual and this incurs a small performance overhead. Often the optimizations applied by the Microsoft .NET Framework team cancel out this inefficiency.
For this reason, a hand-written hash method that loops over a string one character at a time will often be slower than the built-in GetHashCode. In implementations where the hash computation performance is similar, the method shown often has better distributions.
And: A better distribution means fewer collisions, which makes hash tables (Dictionary) that use the built-in method faster.
Discussion. If you look in the annotated edition of "The C# Programming Language", you can find the entries for "Unsafe keyword" and "Bugs" indexed together. Unsafe code is likely to cause errors.
Therefore: You should be reluctant to implement a custom GetHashCode method using unsafe code unless you really need to.
Correction: This article previously had an error in the section on bit manipulations. Bret Mulvey wrote in with the correct information.
IEqualityComparer. You can implement IEqualityComparer to tell the generic collections in the .NET Framework how they should compare keys and compute hash codes for them. This allows you to customize the hash code based on your specific data.
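Sketch: As a minimal example, a comparer can make a Dictionary treat string keys without regard to case. IgnoreCaseKeyComparer and ComparerDemo are hypothetical names; the example delegates to the framework's StringComparer.OrdinalIgnoreCase rather than computing a hash by hand. Both methods must agree: keys that Equals reports as equal must also return the same hash code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer: keys that differ only by case are the same key.
public sealed class IgnoreCaseKeyComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y)
    {
        return string.Equals(x, y, StringComparison.OrdinalIgnoreCase);
    }

    // Must return equal hash codes for keys that Equals treats as equal.
    // Dictionary never passes a null key, so null is not handled here.
    public int GetHashCode(string obj)
    {
        return StringComparer.OrdinalIgnoreCase.GetHashCode(obj);
    }
}

public static class ComparerDemo
{
    public static Dictionary<string, int> Build()
    {
        var counts = new Dictionary<string, int>(new IgnoreCaseKeyComparer());
        counts["Apple"] = 1;
        counts["APPLE"] = 2; // overwrites the entry stored under "Apple"
        return counts;
    }
}
```

Note: In real code you can pass StringComparer.OrdinalIgnoreCase to the Dictionary constructor directly. The wrapper class here only shows the shape of the interface.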
Summary. GetHashCode is implemented on the System.String type. It is called when we use a Hashtable or Dictionary with string keys. It has unsafe code that uses pointer arithmetic, bit shifting and an unwound loop.