In C#, the Normalize() method converts a string to a Unicode normalization form. Unicode normalization forms define a standard representation for text in which the same character can be encoded by more than one sequence of code points.
The Normalize() method is available on the System.String class and has two overloads: a parameterless one and one that takes a normalization form. Here’s the syntax of the latter:
public string Normalize(NormalizationForm normalizationForm)
The normalizationForm parameter specifies the Unicode normalization form to apply. It can take one of the following values from the System.Text.NormalizationForm enumeration:
- FormC: Canonical decomposition followed by canonical composition. A sequence such as ‘e’ plus a combining acute accent is replaced with the single precomposed code point ‘é’ where one exists.
- FormD: Canonical decomposition. A precomposed character such as ‘é’ is replaced with its base character followed by its combining marks.
- FormKC: Compatibility decomposition followed by canonical composition. In addition to what FormC does, characters that are compatible but not canonically equivalent (for example, the ligature ‘ﬁ’) are replaced with their plain equivalents (‘f’ and ‘i’).
- FormKD: Compatibility decomposition. In addition to what FormD does, compatibility characters are replaced with their plain equivalents, and nothing is recomposed.
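The difference between the canonical and compatibility forms is easiest to see on a string that contains both an accented letter and a compatibility character. The following is a minimal sketch, not a definitive reference: the input string (with the ligature ‘ﬁ’, U+FB01) and the helper CodeUnits are illustrative assumptions, and the results assume a runtime with full Unicode normalization data.

```csharp
using System;
using System.Linq;
using System.Text;

static class NormalizationFormsDemo
{
    // Hypothetical helper: renders each UTF-16 code unit as U+XXXX.
    public static string CodeUnits(string s) =>
        string.Join(" ", s.Select(c => $"U+{(int)c:X4}"));

    static void Main()
    {
        // Precomposed 'é' (U+00E9) followed by the ligature 'ﬁ' (U+FB01).
        string input = "\u00E9\uFB01";

        var forms = new[]
        {
            NormalizationForm.FormC,
            NormalizationForm.FormD,
            NormalizationForm.FormKC,
            NormalizationForm.FormKD
        };

        // Print the code units produced by each normalization form.
        foreach (NormalizationForm form in forms)
        {
            Console.WriteLine($"{form}: {CodeUnits(input.Normalize(form))}");
        }
    }
}
```

FormC leaves the input as U+00E9 U+FB01, FormD splits only the accent (U+0065 U+0301 U+FB01), FormKC keeps the accent composed but expands the ligature (U+00E9 U+0066 U+0069), and FormKD does both (U+0065 U+0301 U+0066 U+0069).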
The Normalize() method returns a string that represents the normalized form of the original string.
Here’s an example usage:
string originalString = "Café";
string normalizedString = originalString.Normalize(NormalizationForm.FormD);
Console.WriteLine(normalizedString); // Output: Café (looks the same, but is stored as five code points)
In the example above, the original string “Café” contains a composed character ‘é’, which can also be represented as a decomposed sequence of ‘e’ and a combining acute accent. By applying the Normalize() method with NormalizationForm.FormD, the string is normalized to its decomposed form, where ‘e’ and the combining acute accent are separate code points. The printed text looks identical; only the underlying code points differ.
Note that Unicode normalization may not affect all strings, as not all strings contain characters with multiple representations.
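Because the composed and decomposed strings render identically, the change is easier to confirm by inspecting lengths and by calling IsNormalized(). The sketch below uses escape sequences so the input does not depend on how the source file is encoded; the variable names are illustrative.

```csharp
using System;
using System.Text;

static class DecompositionDemo
{
    static void Main()
    {
        string composed = "Caf\u00E9"; // 'é' as a single code point
        string decomposed = composed.Normalize(NormalizationForm.FormD);

        Console.WriteLine(composed.Length);   // 4
        Console.WriteLine(decomposed.Length); // 5

        // IsNormalized reports whether a string is already in the given form.
        Console.WriteLine(composed.IsNormalized(NormalizationForm.FormD));   // False
        Console.WriteLine(decomposed.IsNormalized(NormalizationForm.FormD)); // True

        // A pure-ASCII string has no alternative representation,
        // so normalizing it produces the same text.
        string ascii = "Cafe";
        Console.WriteLine(ascii.Normalize(NormalizationForm.FormD) == ascii); // True
    }
}
```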
Parameter:
The Normalize(NormalizationForm) overload takes a single, required parameter named normalizationForm. This parameter specifies the Unicode normalization form to apply when normalizing the string. Here’s the detailed syntax of the Normalize() method:
public string Normalize(NormalizationForm normalizationForm)
The normalizationForm parameter is of type NormalizationForm, which is an enumeration defined in the System.Text namespace.
Here are the possible values for the NormalizationForm enumeration:
- FormC: The composed normalization form (canonical decomposition, then canonical composition).
- FormD: The decomposed normalization form (canonical decomposition only).
- FormKC: The compatibility composed normalization form (compatibility decomposition, then canonical composition).
- FormKD: The compatibility decomposed normalization form (compatibility decomposition only).
Here’s an example of using the Normalize() method with the NormalizationForm.FormC parameter:
string originalString = "Café";
string normalizedString = originalString.Normalize(NormalizationForm.FormC);
Console.WriteLine(normalizedString); // Output: Café
In the above example, the original string “Café” is already in the composed form, so applying NormalizationForm.FormC does not change its contents.
The parameter cannot be omitted from this overload. Instead, the separate parameterless overload Normalize() applies the default normalization form, which is NormalizationForm.FormC.
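As a quick check that the parameterless overload behaves like FormC, the following sketch normalizes a decomposed input both ways and compares the results. The variable names are illustrative.

```csharp
using System;
using System.Text;

static class DefaultFormDemo
{
    static void Main()
    {
        // 'e' followed by a combining acute accent (U+0301): five code units.
        string decomposed = "Cafe\u0301";

        string viaDefault = decomposed.Normalize();
        string viaFormC = decomposed.Normalize(NormalizationForm.FormC);

        // Both calls compose the accent into the single code point U+00E9.
        Console.WriteLine(string.Equals(viaDefault, viaFormC, StringComparison.Ordinal)); // True
        Console.WriteLine(viaDefault.Length); // 4
    }
}
```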
Return:
The Normalize() method in C# returns a string that represents the normalized form of the original string. It does not modify the original string itself. Here’s the return type and behavior of the Normalize() method:
Return Type:
- string: The method returns a string object that represents the normalized form of the original string.
Behavior:
- The Normalize() method returns a string that is the normalized form of the original string based on the specified normalization form.
- The original string remains unchanged; strings are immutable, so the method cannot modify it.
- The normalized string may have different code point representations for certain characters, depending on the normalization form used.
- If the original string is already in the specified normalization form, the method may return the same string instance without creating a new one. This behavior depends on the runtime implementation and should not be relied on.
Here’s an example demonstrating the return behavior of the Normalize() method:
string originalString = "Café";
string normalizedString = originalString.Normalize(NormalizationForm.FormC);
Console.WriteLine(normalizedString); // Output: Café
Console.WriteLine(originalString); // Output: Café
Console.WriteLine(object.ReferenceEquals(originalString, normalizedString)); // Output: True or False, depending on the runtime
In the example above, the Normalize() method is called with NormalizationForm.FormC on originalString, and the result is assigned to the normalizedString variable. The original string remains unchanged. Because “Café” is already in composed form, both variables hold the same text; whether they refer to the same instance or to two separate instances depends on the runtime, so code should not rely on the result of ReferenceEquals here.
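Starting from a decomposed input makes the return behavior easier to observe, because FormC then has to produce different text. This is a minimal sketch with illustrative variable names:

```csharp
using System;
using System.Text;

static class ReturnBehaviorDemo
{
    static void Main()
    {
        // Decomposed input: 'e' + combining acute accent (U+0301).
        string original = "Cafe\u0301";
        string normalized = original.Normalize(NormalizationForm.FormC);

        // The original keeps its five code units; it was not modified.
        Console.WriteLine(original.Length);   // 5

        // The result holds the composed text in four code units.
        Console.WriteLine(normalized.Length); // 4

        // The contents differ, so these are necessarily two instances.
        Console.WriteLine(object.ReferenceEquals(original, normalized)); // False
    }
}
```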
C# String Normalize() Method Example:
Here’s an example that demonstrates the usage of the Normalize() method in C#:
using System;

class Program
{
    static void Main()
    {
        string originalString = "Café";

        // Normalize the string to composed form (NormalizationForm.FormC)
        string composedForm = originalString.Normalize(System.Text.NormalizationForm.FormC);
        Console.WriteLine("Composed Form: " + composedForm); // Output: Café

        // Normalize the string to decomposed form (NormalizationForm.FormD)
        string decomposedForm = originalString.Normalize(System.Text.NormalizationForm.FormD);
        Console.WriteLine("Decomposed Form: " + decomposedForm); // Output: Café

        // Normalize the string to compatibility composed form (NormalizationForm.FormKC)
        string compatibilityComposedForm = originalString.Normalize(System.Text.NormalizationForm.FormKC);
        Console.WriteLine("Compatibility Composed Form: " + compatibilityComposedForm); // Output: Café

        // Normalize the string to compatibility decomposed form (NormalizationForm.FormKD)
        string compatibilityDecomposedForm = originalString.Normalize(System.Text.NormalizationForm.FormKD);
        Console.WriteLine("Compatibility Decomposed Form: " + compatibilityDecomposedForm); // Output: Café
    }
}
In this example, we have a string originalString with the value “Café”. We use the Normalize() method with different NormalizationForm values to demonstrate different normalization forms.
- The first call to Normalize() with NormalizationForm.FormC normalizes the string to the composed form, which is the same as the original string. The output is “Café”.
- The second call to Normalize() with NormalizationForm.FormD normalizes the string to the decomposed form, where the character ‘é’ is represented by two code points: ‘e’ and a combining acute accent. The output is “Café”.
- The third call to Normalize() with NormalizationForm.FormKC normalizes the string to the compatibility composed form, which is the same as the original string. The output is “Café”.
- The fourth call to Normalize() with NormalizationForm.FormKD normalizes the string to the compatibility decomposed form, where the character ‘é’ is represented by two code points: ‘e’ and a combining acute accent. The output is “Café”.
Each call to Normalize() returns a string representing the normalized form, without modifying the original string. All four results print as “Café”; the forms differ only in the underlying code points, so the FormD and FormKD results are one code point longer than the FormC and FormKC results.
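Since the four results look the same on screen but differ in their code points, an ordinal comparison can treat visually identical strings as different. The sketch below shows one way to compare such strings by normalizing both sides first. The helper EqualsNormalized is a hypothetical name introduced for illustration, and FormC is an assumed choice; any form works as long as both sides use the same one.

```csharp
using System;
using System.Text;

static class NormalizedComparison
{
    // Hypothetical helper: compares two strings after bringing both to FormC.
    // Note: Normalize() throws ArgumentException for invalid Unicode input.
    public static bool EqualsNormalized(string a, string b) =>
        string.Equals(
            a.Normalize(NormalizationForm.FormC),
            b.Normalize(NormalizationForm.FormC),
            StringComparison.Ordinal);

    static void Main()
    {
        string composed = "Caf\u00E9";    // 'é' as one code point
        string decomposed = "Cafe\u0301"; // 'e' + combining acute accent

        // The == operator on strings compares code units ordinally.
        Console.WriteLine(composed == decomposed);                 // False
        Console.WriteLine(EqualsNormalized(composed, decomposed)); // True
    }
}
```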