C# String Normalize()

In C#, the Normalize() method converts a string to one of the Unicode normalization forms. A normalization form defines a single standard representation for text that could otherwise be encoded as several different but equivalent sequences of code points.

The Normalize() method is defined on the System.String class and has two overloads. Here’s the basic syntax:

public string Normalize()
public string Normalize(NormalizationForm normalizationForm)

The normalizationForm parameter specifies the Unicode normalization form to apply. It can take one of the following values from the System.Text.NormalizationForm enumeration:

  • FormC: Canonical decomposition followed by canonical composition. A sequence such as ‘e’ plus a combining acute accent is replaced with the precomposed character ‘é’ where one exists.
  • FormD: Canonical decomposition. A precomposed character such as ‘é’ is replaced with its base character followed by its combining marks.
  • FormKC: Compatibility decomposition followed by canonical composition. In addition to composing as FormC does, compatibility characters such as the ligature ‘ﬁ’ are replaced with their plain equivalents (‘f’ and ‘i’).
  • FormKD: Compatibility decomposition. Characters are decomposed as in FormD, and compatibility characters are also replaced with their plain equivalents.
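The canonical forms (FormC, FormD) and the compatibility forms (FormKC, FormKD) only differ on strings that contain compatibility characters. The following is a minimal sketch that makes the difference visible by comparing string lengths; the class name FormsDemo, the helper NormalizedLength, and the test input are illustrative choices, not part of the .NET API.

```csharp
using System;
using System.Text;

public static class FormsDemo
{
    // Returns the number of UTF-16 code units in the input after
    // normalizing it to the given form.
    public static int NormalizedLength(string input, NormalizationForm form)
    {
        return input.Normalize(form).Length;
    }

    public static void Main()
    {
        // U+FB01 (ligature "fi") followed by U+00E9 (precomposed 'é').
        string input = "\uFB01\u00E9";

        NormalizationForm[] forms =
        {
            NormalizationForm.FormC, NormalizationForm.FormD,
            NormalizationForm.FormKC, NormalizationForm.FormKD
        };

        foreach (NormalizationForm form in forms)
        {
            Console.WriteLine(form + ": " + NormalizedLength(input, form));
        }
    }
}
```

For this input the lengths should be 2 (FormC), 3 (FormD), 3 (FormKC) and 4 (FormKD): only the decomposed forms split ‘é’, and only the compatibility forms split the ligature.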

The Normalize() method returns a new string that represents the normalized form of the original string.

Here’s an example usage:

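string originalString = "Café";  // assumes the precomposed 'é' (U+00E9)
string normalizedString = originalString.Normalize(NormalizationForm.FormD);
Console.WriteLine(normalizedString);         // Output: Café
Console.WriteLine(normalizedString.Length);  // Output: 5 ('e' and U+0301 are now separate)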

In the example above, the original string “Café” contains a composed character ‘é’, which can also be represented as a decomposed sequence of ‘e’ and combining acute accent. By applying the Normalize() method with NormalizationForm.FormD, the string is normalized to its decomposed form “Café”, where ‘e’ and the combining acute accent are separate code points.
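Because the composed and decomposed strings look identical when printed, it can help to list their code units. The sketch below does that with a small helper; the names CodePointDump and Describe are hypothetical, and the helper works per UTF-16 code unit, which matches code points only for characters in the Basic Multilingual Plane.

```csharp
using System;
using System.Linq;
using System.Text;

public static class CodePointDump
{
    // Formats each UTF-16 code unit of the string as U+XXXX, separated by spaces.
    public static string Describe(string s)
    {
        return string.Join(" ", s.Select(c => "U+" + ((int)c).ToString("X4")));
    }

    public static void Main()
    {
        string composed = "Caf\u00E9";
        string decomposed = composed.Normalize(NormalizationForm.FormD);

        Console.WriteLine(Describe(composed));    // U+0043 U+0061 U+0066 U+00E9
        Console.WriteLine(Describe(decomposed));  // U+0043 U+0061 U+0066 U+0065 U+0301
    }
}
```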

Note that normalization leaves a string unchanged when it is already in the requested form. For example, a string that contains only ASCII characters is the same in all four normalization forms.
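To find out whether a string is already in a given form without normalizing it, System.String also provides the IsNormalized() method. The sketch below wraps it in a helper; NormalizationCheck and IsAlreadyInForm are illustrative names only.

```csharp
using System;
using System.Text;

public static class NormalizationCheck
{
    // True when the string is already in the given normalization form,
    // that is, when Normalize(form) would not change its content.
    public static bool IsAlreadyInForm(string s, NormalizationForm form)
    {
        return s.IsNormalized(form);
    }

    public static void Main()
    {
        Console.WriteLine(IsAlreadyInForm("Cafe", NormalizationForm.FormD));       // True: ASCII only
        Console.WriteLine(IsAlreadyInForm("Caf\u00E9", NormalizationForm.FormC));  // True: already composed
        Console.WriteLine(IsAlreadyInForm("Caf\u00E9", NormalizationForm.FormD));  // False: 'é' can be decomposed
    }
}
```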

Parameter:

The Normalize(NormalizationForm) overload takes a single parameter, normalizationForm, which selects the Unicode normalization form to apply. It is not an optional parameter in the C# sense; calling Normalize() without an argument invokes the separate parameterless overload. Here’s the syntax of the overload that takes the parameter:

public string Normalize(NormalizationForm normalizationForm)

The normalizationForm parameter is of type NormalizationForm, an enumeration defined in the System.Text namespace.

Here are the possible values for the NormalizationForm enumeration:

  • FormC: The canonical composed form. The string is decomposed canonically and then recomposed, so combining sequences are replaced with precomposed characters where they exist.
  • FormD: The canonical decomposed form. Precomposed characters are replaced with a base character followed by combining marks.
  • FormKC: The compatibility composed form. The string is decomposed by compatibility (so characters such as ligatures become their plain equivalents) and then recomposed canonically.
  • FormKD: The compatibility decomposed form. The string is decomposed by compatibility and left decomposed.

Here’s an example of using the Normalize() method with the NormalizationForm.FormC parameter:

string originalString = "Café";
string normalizedString = originalString.Normalize(NormalizationForm.FormC);
Console.WriteLine(normalizedString);  // Output: Café

In the above example, the original string “Café” is already in the composed form, so applying NormalizationForm.FormC does not change the string.

If you call the parameterless overload, Normalize(), the method uses NormalizationForm.FormC.
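The following sketch checks this on a decomposed input: calling Normalize() with no argument should give the same result as passing NormalizationForm.FormC explicitly. The names DefaultFormDemo and DefaultMatchesFormC are hypothetical.

```csharp
using System;
using System.Text;

public static class DefaultFormDemo
{
    // Compares the parameterless overload with an explicit FormC call,
    // code unit by code unit.
    public static bool DefaultMatchesFormC(string s)
    {
        return string.Equals(
            s.Normalize(),
            s.Normalize(NormalizationForm.FormC),
            StringComparison.Ordinal);
    }

    public static void Main()
    {
        string decomposed = "Cafe\u0301";  // 'e' followed by a combining acute accent

        Console.WriteLine(decomposed.Length);                // 5
        Console.WriteLine(decomposed.Normalize().Length);    // 4: recomposed into U+00E9
        Console.WriteLine(DefaultMatchesFormC(decomposed));  // True
    }
}
```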

Return:

The Normalize() method returns a string in the requested normalization form. Because .NET strings are immutable, the original string is never modified. Here’s the return type and behavior of the Normalize() method:

Return Type:

  • string: The method returns a new string object that represents the normalized form of the original string.

Behavior:

  • The Normalize() method creates and returns a new string that is the normalized form of the original string based on the specified normalization form.
  • The original string remains unchanged; the method does not modify it.
  • The normalized string may contain a different sequence of code points than the original, and therefore a different Length, depending on the normalization form used.
  • If the original string is already in the specified normalization form, the method may return the same string instance instead of allocating a new one. This depends on the runtime, so code should not rely on reference identity in either direction.

Here’s an example demonstrating the return behavior of the Normalize() method:

string originalString = "Café";
string normalizedString = originalString.Normalize(NormalizationForm.FormC);

Console.WriteLine(normalizedString);          // Output: Café
Console.WriteLine(originalString);           // Output: Café
Console.WriteLine(object.ReferenceEquals(originalString, normalizedString));  // Output: True or False, depending on the runtime

In the example above, the Normalize() method is called with NormalizationForm.FormC on originalString, and the result is assigned to normalizedString. Both strings have the same content because “Café” is already in composed form. Whether they are the same instance depends on the runtime: an implementation that detects an already-normalized string can return the original instance, in which case the last line prints True; otherwise it prints False. In either case, originalString is unchanged.

C# String Normalize() Method Example:

Here’s an example that demonstrates the usage of the Normalize() method in C#:

using System;
using System.Text;

class Program
{
    static void Main()
    {
        string originalString = "Café";  // assumes the precomposed 'é' (U+00E9)

        // Normalize the string to composed form (NormalizationForm.FormC)
        string composedForm = originalString.Normalize(NormalizationForm.FormC);
        Console.WriteLine("Composed Form: " + composedForm);  // Prints: Composed Form: Café

        // Normalize the string to decomposed form (NormalizationForm.FormD)
        string decomposedForm = originalString.Normalize(NormalizationForm.FormD);
        Console.WriteLine("Decomposed Form: " + decomposedForm);  // Prints: Decomposed Form: Café

        // Normalize the string to compatibility composed form (NormalizationForm.FormKC)
        string compatibilityComposedForm = originalString.Normalize(NormalizationForm.FormKC);
        Console.WriteLine("Compatibility Composed Form: " + compatibilityComposedForm);  // Prints: Compatibility Composed Form: Café

        // Normalize the string to compatibility decomposed form (NormalizationForm.FormKD)
        string compatibilityDecomposedForm = originalString.Normalize(NormalizationForm.FormKD);
        Console.WriteLine("Compatibility Decomposed Form: " + compatibilityDecomposedForm);  // Prints: Compatibility Decomposed Form: Café
    }
}

In this example, we have a string originalString with the value “Café”. We use the Normalize() method with different NormalizationForm values to demonstrate different normalization forms.

  • The first call to Normalize() with NormalizationForm.FormC normalizes the string to the composed form, which is the same as the original string. The output is “Café”.
  • The second call to Normalize() with NormalizationForm.FormD normalizes the string to the decomposed form, where the character ‘é’ is represented by two code points: ‘e’ and a combining acute accent. The output is “Café”.
  • The third call to Normalize() with NormalizationForm.FormKC normalizes the string to the compatibility composed form, which is the same as the original string. The output is “Café”.
  • The fourth call to Normalize() with NormalizationForm.FormKD normalizes the string to the compatibility decomposed form, where the character ‘é’ is represented by two code points: ‘e’ and a combining acute accent. The output is “Café”.

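Each call to Normalize() returns the normalized form without modifying the original string. For “Café”, all four results look identical when printed; the difference is in the underlying code points. The composed forms (FormC, FormKC) contain four code points and the decomposed forms (FormD, FormKD) contain five. The compatibility forms give the same results as the canonical forms here because “Café” contains no compatibility characters.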
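A common reason to normalize is comparison: two strings that display the same text can fail an ordinal equality check when one is composed and the other decomposed. The sketch below normalizes both operands before comparing them. The names NormalizedComparison and EqualsNormalized are hypothetical, and the choice of FormC as the common form is an assumption; any form works as long as both sides use the same one.

```csharp
using System;
using System.Text;

public static class NormalizedComparison
{
    // Ordinal equality after bringing both operands to the same
    // normalization form (FormC is an arbitrary choice here).
    public static bool EqualsNormalized(string a, string b)
    {
        return string.Equals(
            a.Normalize(NormalizationForm.FormC),
            b.Normalize(NormalizationForm.FormC),
            StringComparison.Ordinal);
    }

    public static void Main()
    {
        string composed = "Caf\u00E9";     // precomposed 'é'
        string decomposed = "Cafe\u0301";  // 'e' + combining acute accent

        // Different code units, so a plain ordinal comparison fails.
        Console.WriteLine(string.Equals(composed, decomposed, StringComparison.Ordinal));  // False

        // After normalization both sides contain the same code units.
        Console.WriteLine(EqualsNormalized(composed, decomposed));  // True
    }
}
```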