TheDeveloperBlog.com


C# Normalize Method

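Normalize. The Normalize method converts a string to a Unicode normalization form. A C# string stores its text as UTF-16 code units, and the same visible character can often be stored as more than one sequence of code points. Normalize picks one standard sequence: it composes or decomposes characters and puts combining marks in a fixed order. We explore how the representations of string data change.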


Example. This program introduces a string with an accent on the lowercase a (á). Next, we call Normalize with no parameters, and then Normalize with the parameters NormalizationForm.FormD, FormKC, and FormKD on the same input string.

Then: We print, with Console.WriteLine, the resulting strings to the screen as we go along.

C# program that uses Normalize method

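using System;
using System.Text;

class Program
{
    static void Main()
    {
        // The letter a with an acute accent.
        const string input = "á";

        // No argument: same as NormalizationForm.FormC (composed).
        string val2 = input.Normalize();
        Console.WriteLine(val2);

        // FormD: canonical decomposition (base letter, then combining mark).
        string val3 = input.Normalize(NormalizationForm.FormD);
        Console.WriteLine(val3);

        // FormKC: compatibility decomposition, then composition.
        string val4 = input.Normalize(NormalizationForm.FormKC);
        Console.WriteLine(val4);

        // FormKD: compatibility decomposition only.
        string val5 = input.Normalize(NormalizationForm.FormKD);
        Console.WriteLine(val5);
    }
}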

Output

á
a '
á
a '

In this example, the first call to Normalize, with no parameters, uses the NormalizationForm.FormC enumerated constant in its implementation. This detail can be seen in IL Disassembler.

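The four lines printed to the Console show two forms. FormC and FormKC keep the single precomposed letter á (U+00E1). FormD and FormKD decompose it into the ASCII letter a (U+0061) followed by a combining acute accent (U+0301). A console that cannot draw combining marks shows that accent as a separate character after the a, which is why it looks like a single quote in the output.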
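Because console rendering can hide the difference, it can help to print the UTF-16 code units instead of the text. The following is a minimal sketch, not part of the original program: Dump is a hypothetical helper name, and the ligature ﬁ (U+FB01) is an extra input chosen here only to show where the compatibility forms differ from the canonical ones. The expected values in the comments assume a runtime with full Unicode normalization support.

```csharp
using System;
using System.Text;

class Program
{
    // Hypothetical helper: prints the length and each UTF-16 code unit in hex.
    static void Dump(string value)
    {
        Console.Write(value.Length + ":");
        foreach (char c in value)
        {
            Console.Write(" U+" + ((int)c).ToString("X4"));
        }
        Console.WriteLine();
    }

    static void Main()
    {
        // Escapes avoid any dependence on the source file's encoding.
        string accented = "\u00E1"; // precomposed a with acute accent
        string ligature = "\uFB01"; // the "fi" ligature

        Dump(accented.Normalize(NormalizationForm.FormC));  // 1: U+00E1
        Dump(accented.Normalize(NormalizationForm.FormD));  // 2: U+0061 U+0301

        // Canonical forms keep the ligature; compatibility forms expand it.
        Dump(ligature.Normalize(NormalizationForm.FormC));  // 1: U+FB01
        Dump(ligature.Normalize(NormalizationForm.FormKC)); // 2: U+0066 U+0069
    }
}
```

For á, the compatibility forms give the same result as FormC and FormD because the letter has no compatibility mapping. The ligature shows the kind of input where FormKC and FormKD produce something different.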


Discussion. Mainly, the Normalize method is useful for interoperability purposes. If you have to interact with another program that uses Unicode in a specific normalization form, it would be important to call Normalize.

Tip: There is no reason to call Normalize if your strings contain only ASCII, because ASCII text is already in all four normalization forms. It is also unnecessary when no other system expects a specific form.
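Interoperation can be as simple as comparing text that arrived from two sources in different forms. The sketch below is an illustration under that assumption: the two string literals stand in for data from a composing and a decomposing system. It shows that an ordinal comparison treats the two forms as different until both are normalized, and that an ASCII string is left unchanged.

```csharp
using System;
using System.Text;

class Program
{
    static void Main()
    {
        string composed = "\u00E1";    // one code point: precomposed letter
        string decomposed = "a\u0301"; // two code points: a + combining acute

        // Ordinal comparison looks at code units, so the forms differ.
        Console.WriteLine(string.Equals(composed, decomposed,
            StringComparison.Ordinal)); // False

        // After both sides are converted to FormC, they match.
        Console.WriteLine(string.Equals(composed.Normalize(),
            decomposed.Normalize(), StringComparison.Ordinal)); // True

        // ASCII text is the same in every normalization form.
        string ascii = "hello";
        Console.WriteLine(ascii.Normalize(NormalizationForm.FormD) == ascii); // True
    }
}
```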


IsNormalized. In Unicode strings, there are different normalization forms that determine how certain characters are represented. With the IsNormalized method you can test for normalized character data.

This program declares a const string with the value á. With the Normalize and IsNormalized methods, only non-ASCII characters are affected. With Normalize, we convert to FormC and FormD.

Note: The parameterless IsNormalized method returns true if the string is in FormC. It returns false for a string that contains decomposed sequences, such as the FormD version of á.

Note 2: You can also pass an argument to IsNormalized. In this case, that specific normalization form is checked.

C# program that uses IsNormalized

using System;
using System.Text;

class Program
{
    static void Main()
    {
        const string input = "á";
        string val2 = input.Normalize();                        // FormC
        string val3 = input.Normalize(NormalizationForm.FormD); // FormD

        // No argument: tests for FormC.
        Console.WriteLine(input.IsNormalized());
        Console.WriteLine(val2.IsNormalized());
        Console.WriteLine(val3.IsNormalized());

        // With an argument: tests for that specific form.
        Console.WriteLine(val3.IsNormalized(NormalizationForm.FormD));
    }
}

Output

True
True
False
True

The IsNormalized method tells you whether a string is already in a given normalization form, so you can skip a conversion that would change nothing. In the .NET Framework, normalization matters mostly when interoperating with other systems.

Typically: You can ignore IsNormalized and leave strings in whatever form they arrived in.
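When a specific form is required, one possible pattern is to test first and convert only when needed. This is a sketch rather than a recommended implementation: ToFormC is a hypothetical helper name, and whether the check saves measurable work depends on the runtime.

```csharp
using System;
using System.Text;

class Program
{
    // Hypothetical helper: returns the same instance when no change is needed.
    static string ToFormC(string text)
    {
        return text.IsNormalized(NormalizationForm.FormC)
            ? text
            : text.Normalize(NormalizationForm.FormC);
    }

    static void Main()
    {
        string decomposed = "a\u0301"; // a + combining acute accent
        string result = ToFormC(decomposed);

        Console.WriteLine(result.Length);         // 1
        Console.WriteLine(result.IsNormalized()); // True

        // A second call finds nothing to do and hands back the same object.
        Console.WriteLine(object.ReferenceEquals(ToFormC(result), result)); // True
    }
}
```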


Summary. The Normalize method converts strings between Unicode normalization forms. It mainly supports interoperation with other systems, and it is not a commonly needed string method. But it reveals an important detail of string data: the same visible text can be stored as different character sequences.