Introduction
Most developers I’ve met who work with strings and bytes are curious about what is really stored on the physical device. They wonder, “What’s really inside?” Even I sometimes ask myself, “Do we really need to see the ones and zeroes?”
In practice, you can work with bytes, bits, and strings and inspect the byte representation of a string. In this article, we will explore the different ways to convert a string to a byte array and a byte array back to a string. Along the way, we will touch on encodings and focus on “GetBytes”, “GetByteCount”, and “BitConverter”.
Background
Before we play with strings and bytes, I want to give a brief summary of ASCII and Unicode. Here are some points to take note of:
- ASCII (American Standard Code for Information Interchange) and Unicode are character standards that let computers exchange text: both sides agree on which number stands for which character.
- ASCII uses 7 bits to represent a character (128 characters in total). The 8-bit “extended ASCII” variants add more characters for Latin-based alphabets, while Unicode covers most of the world’s writing systems. Most of those characters don’t fit into 8 bits, and that is where the Unicode encodings come into play.
- A Unicode encoding defines how a character is stored as a sequence of bytes. The common encodings are UTF-8, UTF-16, and UTF-32.
For more information about the difference between ASCII and Unicode, please see my LinkedIn article.
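To make the difference between the encodings concrete, here is a minimal sketch that asks each encoding how many bytes it needs for the same text. The class name EncodingSizes and the method name ByteCounts are made up for this illustration; only the “GetByteCount” calls come from the framework.

```csharp
using System.Text;

// Hypothetical helper for illustration only.
public static class EncodingSizes
{
    // Returns the byte count of the same text in ASCII, UTF-8, UTF-16 and UTF-32.
    public static int[] ByteCounts(string text)
    {
        return new[]
        {
            Encoding.ASCII.GetByteCount(text),   // one byte per char; non-ASCII chars become '?'
            Encoding.UTF8.GetByteCount(text),    // one to four bytes per character
            Encoding.Unicode.GetByteCount(text), // UTF-16: two bytes per 16-bit code unit
            Encoding.UTF32.GetByteCount(text)    // always four bytes per character
        };
    }
}
```

For the three characters "A\u00E9\u5927" (A, é, 大), this should give 3, 6, 6, and 12 bytes respectively: UTF-8 spends 1 + 2 + 3 bytes, UTF-16 spends 2 bytes on each, and UTF-32 spends 4 bytes on each.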
.NET Encoding
Before we start with the examples, I would like to introduce “System.Text.Encoding”. It is an abstract class that represents a character encoding. It also provides methods to convert arrays and strings of Unicode characters to and from arrays of bytes. Here is the list of its derived classes:
- ASCIIEncoding
- UTF7Encoding
- UTF8Encoding
- UnicodeEncoding
- UTF32Encoding
See Figure 1 below to visualize the inheritance hierarchy.
Figure 1
If you want to programmatically get the derived encoding classes, see the sample code below,
// Requires: using System; using System.Linq; using System.Reflection; using System.Text;
[TestMethod]
public void Test_Types_Of_Derived_Encoding_Classes()
{
    var type = typeof(Encoding);
    var assembly = Assembly.GetAssembly(type);
    var types = assembly.GetTypes();

    // Keep only the public classes that inherit from Encoding.
    var derivedClasses =
        types.Where(t => t.IsSubclassOf(type) && t.IsPublic).ToList();

    foreach (var @class in derivedClasses)
    {
        Console.WriteLine(@class.Name);
    }
}
Keep in mind that the static properties of the abstract class “System.Text.Encoding”, such as ASCII, UTF8, and UTF7, each return an instance of the corresponding derived class. Please see the code below,
Note
The code is based on the one here.
Therefore, you can either create a new instance of a specific encoding class yourself or use the matching static property of the abstract class, whichever suits your needs.
var utf8 = new UTF8Encoding();
var utf8_2 = Encoding.UTF8;
Let us verify this with the code example below:
[TestMethod]
public void Test_If_Encodings_Are_Same_Type()
{
    Assert.IsInstanceOfType(Encoding.ASCII, typeof(ASCIIEncoding));
    // Note: UTF-7 is marked obsolete in .NET 5 and later.
    Assert.IsInstanceOfType(Encoding.UTF7, typeof(UTF7Encoding));
    Assert.IsInstanceOfType(Encoding.UTF8, typeof(UTF8Encoding));
    Assert.IsInstanceOfType(Encoding.Unicode, typeof(UnicodeEncoding));

    Assert.IsInstanceOfType(Encoding.UTF32, typeof(UTF32Encoding));
}
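The two ways of obtaining a UTF-8 encoding are the same type, but they are not fully interchangeable. The sketch below illustrates two differences as I understand them: the static property hands back one shared object, and that object is configured to emit a byte order mark (the preamble), whereas the parameterless constructor is not. Utf8Instances and its method names are hypothetical and exist only for this illustration.

```csharp
using System.Text;

// Hypothetical helper for illustration only.
public static class Utf8Instances
{
    // True when two reads of Encoding.UTF8 yield the very same object.
    public static bool PropertyIsShared()
    {
        return ReferenceEquals(Encoding.UTF8, Encoding.UTF8);
    }

    // Length of the byte order mark (preamble) the encoding would write.
    public static int PreambleLength(Encoding encoding)
    {
        return encoding.GetPreamble().Length;
    }
}
```

Here PreambleLength(Encoding.UTF8) should return 3 (the bytes EF BB BF), while PreambleLength(new UTF8Encoding()) should return 0. This matters mostly when writing files or streams, where the preamble may end up at the start of the output.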
String to Byte Array
To convert a string to a byte array, pick a specific encoding and call its “GetBytes” method. While converting the string, let us also print each character next to its numerical ASCII/Unicode value. Note that ASCII defines only 7 bits per character, although “Encoding.ASCII” still stores each character in one full byte, while UTF-8 uses one to four bytes per character.
See the example below,
string strRandomWords = "I Love C#";

[TestMethod]
public void Test_ASCII_Using_GetBytes()
{
    var byteResults = Encoding.ASCII.GetBytes(this.strRandomWords);
    Assert.IsTrue(byteResults.Length > 0);

    #region iterate
    foreach (var @byte in byteResults)
    {
        string fullResultInString =
            string.Format("Character: {0} in ASCII {1}",
                (char)@byte, @byte);

        Console.WriteLine(fullResultInString);
    }
    #endregion
}
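One thing to watch out for with “Encoding.ASCII” is what happens to characters that ASCII cannot represent: they are silently replaced with a question mark instead of raising an error. The following minimal sketch shows the effect by encoding a string and decoding it again. AsciiFallback and RoundTrip are hypothetical names used only for this illustration.

```csharp
using System.Text;

// Hypothetical helper for illustration only.
public static class AsciiFallback
{
    // Encodes the text as ASCII and decodes it again.
    // Characters outside the ASCII range come back as '?'.
    public static string RoundTrip(string text)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(text);
        return Encoding.ASCII.GetString(bytes);
    }
}
```

RoundTrip("I Love C#") returns the string unchanged, while RoundTrip("caf\u00E9") (café) returns "caf?". If the text may contain anything beyond plain English letters, digits, and punctuation, UTF-8 is usually the safer choice.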
If you also want the size in bytes, use the “GetByteCount” method, which returns the number of bytes a string needs in the given encoding. To double-check the result, the examples below write every byte as eight binary digits and compare the total number of bits against the byte count. The ASCII example also confirms that the high bit of every byte is 0, since ASCII needs only 7 bits. See the two examples below,
string strRandomWords = "I Love C#";

[TestMethod]
public void Test_ASCII_Using_GetByteCount()
{
    var byteResults = Encoding.ASCII.GetBytes(this.strRandomWords);
    int byteCount = Encoding.ASCII.GetByteCount(this.strRandomWords);

    int totalBits = 0;
    foreach (byte b in byteResults)
    {
        // Convert.ToString drops leading zeros, so pad back to a full byte.
        string bits = Convert.ToString(b, 2).PadLeft(8, '0');
        totalBits += bits.Length;

        // ASCII needs only 7 bits, so the high bit is always 0.
        Assert.AreEqual('0', bits[0]);
    }

    Assert.AreEqual(byteCount * 8, totalBits);
}

string strRandomNonEnglishStrings = "プログラミングが大好き";

[TestMethod]
public void Test_UTF8_Encoding_Using_GetByteCount()
{
    var byteResults = Encoding.UTF8.GetBytes(this.strRandomNonEnglishStrings);
    int byteCount = Encoding.UTF8.GetByteCount(this.strRandomNonEnglishStrings);

    int totalBits = 0;
    foreach (byte b in byteResults)
    {
        // Pad to eight digits so every byte contributes exactly 8 bits.
        string bits = Convert.ToString(b, 2).PadLeft(8, '0');
        totalBits += bits.Length;
    }

    // Each of these characters takes several bytes in UTF-8,
    // so the byte count is larger than the string length.
    Assert.IsTrue(byteCount > this.strRandomNonEnglishStrings.Length);
    Assert.AreEqual(byteCount * 8, totalBits);
}
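Since the byte count in UTF-8 depends on which characters the string contains, it can help to look at the width of each character separately. The sketch below is one possible way to do that, assuming we treat a surrogate pair (two C# chars) as a single character. Utf8Widths and PerCharacter are hypothetical names for this illustration.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration only.
public static class Utf8Widths
{
    // Returns how many UTF-8 bytes each character of the text needs.
    public static int[] PerCharacter(string text)
    {
        var widths = new List<int>();
        for (int i = 0; i < text.Length; i++)
        {
            // Characters above U+FFFF are stored as two chars (a surrogate pair).
            int length = char.IsSurrogatePair(text, i) ? 2 : 1;
            widths.Add(Encoding.UTF8.GetByteCount(text.Substring(i, length)));
            i += length - 1; // skip the second half of a surrogate pair
        }
        return widths.ToArray();
    }
}
```

For "A\u00E9\u30D7\uD83D\uDE00" (A, é, プ, and an emoji), the result should be 1, 2, 3, and 4. This is why a short Japanese sentence takes about three times as many bytes as it has characters.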
Byte-Array to String
The previous examples showed how to get a byte array from a string. Now let us look at how to display those bytes and how to turn a byte array back into human-readable text.
To display a series of bytes, we can use “BitConverter”, a helper class that converts base data types to arrays of bytes and arrays of bytes back to base data types. Its “ToString” method formats a byte array as hexadecimal pairs separated by hyphens. Let us see some examples below,
string strRandomWords = "I Love C#";

[TestMethod]
public void Test_Convert_String_To_Bytes_Formatted()
{
    var bytes = Encoding.UTF8.GetBytes(strRandomWords);
    Assert.IsNotNull(bytes);

    // Prints the bytes as hex pairs, e.g. 49-20-4C-...
    var seriesOfByteStrings = BitConverter.ToString(bytes);
    Assert.IsTrue(!string.IsNullOrWhiteSpace(seriesOfByteStrings));
    Console.WriteLine(seriesOfByteStrings);
}

[TestMethod]
public void Test_Convert_To_Bytes_Formatted_Using_Other_Value_Types()
{
    // The leading zero is ignored by the compiler; the value is 3291982.
    int bday = 03291982;

    // An int yields 4 bytes; their order depends on BitConverter.IsLittleEndian.
    var result = BitConverter.GetBytes(bday);
    Assert.IsNotNull(result);
    var seriesOfBytesStrings = BitConverter.ToString(result);
    Assert.IsTrue(!string.IsNullOrWhiteSpace(seriesOfBytesStrings));
    Console.WriteLine(seriesOfBytesStrings);
}
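“BitConverter” also works in the opposite direction for base data types, but it has no built-in method for parsing the hyphenated hex text back into bytes. The sketch below shows one way to do both. HexRoundTrip, FromDashedHex, and RoundTripInt are hypothetical names for this illustration, and the parser assumes a non-empty, well-formed input such as the one “BitConverter.ToString” produces.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only.
public static class HexRoundTrip
{
    // Parses text like "49-20-4C" back into a byte array.
    // Assumes non-empty input made of two-digit hex pairs separated by '-'.
    public static byte[] FromDashedHex(string dashedHex)
    {
        return dashedHex.Split('-')
                        .Select(pair => Convert.ToByte(pair, 16))
                        .ToArray();
    }

    // Converts an int to its 4 bytes and back again.
    public static int RoundTripInt(int value)
    {
        byte[] bytes = BitConverter.GetBytes(value);
        return BitConverter.ToInt32(bytes, 0);
    }
}
```

For example, FromDashedHex("49-20-4C-6F-76-65-20-43-23") gives back the nine bytes of "I Love C#", and RoundTripInt(3291982) returns 3291982 regardless of the machine’s byte order, because both conversions use the same order.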
Lastly, we can use “GetString” to turn the bytes back into the original, human-readable string. Let us see the example below:
string strRandomWords = "I Love C#";

[TestMethod]
public void Test_ASCII_Using_Get_String()
{
    var byteResults = Encoding.ASCII.GetBytes(this.strRandomWords);
    Assert.IsTrue(byteResults.Length > 0);

    // Decode with the same encoding that produced the bytes.
    string humanReadableString =
        Encoding.ASCII.GetString(byteResults, 0, byteResults.Length);
    Assert.AreEqual(strRandomWords, humanReadableString);
}
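“GetString” only gives back the original text when the bytes are decoded with the encoding that produced them. The following sketch, with the hypothetical names DecodeMismatch and Decode, encodes with one encoding and decodes with another so the effect of a mismatch is easy to see.

```csharp
using System.Text;

// Hypothetical helper for illustration only.
public static class DecodeMismatch
{
    // Encodes the text with one encoding and decodes the bytes with another.
    public static string Decode(string text, Encoding writer, Encoding reader)
    {
        byte[] bytes = writer.GetBytes(text);
        return reader.GetString(bytes);
    }
}
```

Decode("\u5927", Encoding.UTF8, Encoding.UTF8) returns the original character 大, while Decode("\u5927", Encoding.UTF8, Encoding.ASCII) returns "???", one question mark for each of the three UTF-8 bytes that ASCII cannot interpret.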
Summary
In this article, we explored a brief overview of ASCII and Unicode. We have also seen that “System.Text.Encoding” has derived classes such as ASCIIEncoding, UTF7Encoding, UTF8Encoding, UnicodeEncoding, and UTF32Encoding. Knowing those derived classes, you may choose to use Encoding.[Encoding-Type], e.g. Encoding.ASCII, or create a new instance, e.g. var ascii = new ASCIIEncoding(). After that, we focused on converting a string to a byte array and vice versa.
By the way, most of the source code samples are also available on GitHub. I really enjoyed writing this article, and I hope you enjoyed reading it too. Until next time, happy programming.