This word, with its diacritics, consists of 9 characters, in the following order:
- Meem
- Damma (a combining character combined with the previous Meem)
- Kashida
- Hah
- Meem
- Shadda (a combining character)
- Fatha (a combining character; both the Shadda and the Fatha combine with the preceding Meem)
- Kashida
- Dal
After the combining characters are merged with their base characters, we end up with 6 characters, in the following order:
- Meem (with a Damma above)
- Kashida
- Hah
- Meem (with a Shadda and a Fatha above)
- Kashida
- Dal
The following code simply enumerates the string and displays a message box with each character along with its index:
string someName = "مُـحمَّـد";
// Show each Char together with its index in the string.
for (int i = 0; i < someName.Length; i++)
    MessageBox.Show(string.Format("{0}\t{1}", i, someName[i]));
What do we get? The loop visits all 9 Char values one by one, so each combining character is shown on its own, detached from its base character.
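To make that result easier to inspect, here is a minimal console sketch of the same loop that also prints each character's code point and Unicode category. The class name CharDump and the helper CountNonSpacingMarks are illustrative names, not part of the original code, and the word is written with \u escapes only so that the character order is explicit.

```csharp
using System;
using System.Globalization;

static class CharDump
{
    // Hypothetical helper: counts the chars classified as non-spacing marks.
    public static int CountNonSpacingMarks(string text)
    {
        int count = 0;
        foreach (char c in text)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
                count++;
        }
        return count;
    }

    static void Main()
    {
        // Meem, Damma, Kashida, Hah, Meem, Shadda, Fatha, Kashida, Dal
        string someName = "\u0645\u064F\u0640\u062D\u0645\u0651\u064E\u0640\u062F";

        // Print index, code point and Unicode category of every Char.
        for (int i = 0; i < someName.Length; i++)
        {
            char c = someName[i];
            Console.WriteLine("{0}\tU+{1:X4}\t{2}", i, (int)c, CharUnicodeInfo.GetUnicodeCategory(c));
        }

        Console.WriteLine("Chars: {0}, non-spacing marks: {1}",
            someName.Length, CountNonSpacingMarks(someName));
    }
}
```

The Damma, Shadda and Fatha are the entries reported as NonSpacingMark; those are the characters that a plain index-based loop separates from their bases.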
Enumerating a String with Combining Characters
.NET Framework provides a way to enumerate strings together with their combining characters: the TextElementEnumerator and StringInfo types (both reside in the System.Globalization namespace).
The following code demonstrates how you can enumerate a string along with its combining characters:
string someName = "مُـحمَّـد";
TextElementEnumerator enumerator = StringInfo.GetTextElementEnumerator(someName);
// Each element is a base character plus any combining characters attached to it.
while (enumerator.MoveNext())
    MessageBox.Show(string.Format("{0}\t{1}", enumerator.ElementIndex, enumerator.Current));
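If you only need the number of text elements or the positions at which they start, StringInfo offers two shortcuts. The following is a small console sketch, assuming .NET Framework 4 or later (for the generic string.Join overload); the class name TextElementDemo is illustrative.

```csharp
using System;
using System.Globalization;

static class TextElementDemo
{
    static void Main()
    {
        // Meem, Damma, Kashida, Hah, Meem, Shadda, Fatha, Kashida, Dal
        string someName = "\u0645\u064F\u0640\u062D\u0645\u0651\u064E\u0640\u062F";

        // Number of text elements (a base character plus its combining characters).
        int elementCount = new StringInfo(someName).LengthInTextElements;

        // Index in the string at which each text element starts.
        int[] startIndexes = StringInfo.ParseCombiningCharacters(someName);

        Console.WriteLine("Chars: {0}", someName.Length);
        Console.WriteLine("Text elements: {0}", elementCount);
        Console.WriteLine("Start indexes: {0}", string.Join(", ", startIndexes));
    }
}
```

For this word the text element count should match the 6 combined characters listed earlier, while Length still reports 9.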
Comparing Strings
Sometimes you need to compare two strings that are identical except for their diacritics (combining characters). If you compare them the usual way (with String.Compare, for instance), they are reported as different because of the combining characters.
To overcome this, use an overload of the String.Compare method that accepts a CompareOptions value:
// Requires: using System.Globalization; using System.Threading;
string withCombiningChars = "مُـحمَّـد";
string withoutCombiningChars = "محمد";

// Default comparison.
Console.WriteLine(string.Compare(withCombiningChars, withoutCombiningChars) == 0
    ? "Both strings are the same."
    : "The strings are different!");

// Comparison with CompareOptions.IgnoreSymbols.
if (string.Compare(withCombiningChars, withoutCombiningChars, Thread.CurrentThread.CurrentCulture, CompareOptions.IgnoreSymbols) == 0)
    Console.WriteLine("Both strings are the same.");
else
    Console.WriteLine("The strings are different!");
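The Kashida ـ (U+0640, also known as Tatweel) is not a letter of the Arabic alphabet. It is an elongation character that only stretches the connection between two letters, so the option CompareOptions.IgnoreSymbols excludes it from the comparison.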
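Culture-sensitive comparison results can vary with the runtime and the culture data in use. If you need a result that does not depend on them, one alternative (a different technique from the String.Compare overload above) is to strip the diacritics and the Kashida yourself and then compare ordinally. The sketch below assumes that dropping every non-spacing mark is acceptable for your text; DiacriticStripper and StripMarksAndKashida are hypothetical names introduced for illustration.

```csharp
using System;
using System.Globalization;
using System.Text;

static class DiacriticStripper
{
    const char Kashida = '\u0640';

    // Hypothetical helper: removes non-spacing marks (the diacritics) and the Kashida.
    public static string StripMarksAndKashida(string text)
    {
        var builder = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            bool isMark = CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark;
            if (!isMark && c != Kashida)
                builder.Append(c);
        }
        return builder.ToString();
    }

    static void Main()
    {
        // The word with diacritics and Kashidas, and the bare word.
        string withCombiningChars = "\u0645\u064F\u0640\u062D\u0645\u0651\u064E\u0640\u062F";
        string withoutCombiningChars = "\u0645\u062D\u0645\u062F";

        string stripped = StripMarksAndKashida(withCombiningChars);

        // Ordinal comparison of the stripped text against the bare word.
        Console.WriteLine(string.Equals(stripped, withoutCombiningChars, StringComparison.Ordinal));
    }
}
```

The trade-off is that this treats every non-spacing mark as ignorable, which suits a lookup or search scenario but not one where the diacritics change the meaning you care about.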
Writing Arabic Diacritics
The following table summarizes the Arabic diacritics and the keyboard shortcut for each character:
Unicode Representation | Character | Name | Shortcut
0x064B | ◌ً | Fathatan | Shift + W
0x064C | ◌ٌ | Dammatan | Shift + R
0x064D | ◌ٍ | Kasratan | Shift + S
0x064E | ◌َ | Fatha | Shift + Q
0x064F | ◌ُ | Damma | Shift + E
0x0650 | ◌ِ | Kasra | Shift + A
0x0651 | ◌ّ | Shadda | Shift + ~
0x0652 | ◌ْ | Sukun | Shift + X
Using the Character Map Application
Microsoft Windows comes with an application, called Character Map, that helps you browse the characters that a font supports.
You can open it by typing charmap.exe into Run, or by going to Start->Programs->Accessories->System Tools->Character Map.
Try it out!
Code examples for the reader to discover:
A.
string someName = "مُـحمَّـد";
MessageBox.Show(StringInfo.GetNextTextElement(someName,2));
B.
string a = "Adam";
string b = "Ádam";

Console.WriteLine(string.Compare(a, b) == 0 ? "They are the same." : "No, they are different.");

// Also try changing the CultureInfo object
if (string.Compare(a, b, Thread.CurrentThread.CurrentCulture, CompareOptions.IgnoreNonSpace) == 0)
    Console.WriteLine("They are the same.");
else
    Console.WriteLine("No, they are different.");