MSIL Programming: Part 1

Abstract

Source code that is written for and executes under the .NET Common Language Runtime (CLR) is referred to as “managed code”. A managed compiler translates the associated *.cs or *.vb code files into low-level .NET CIL code, an assembly manifest, and type metadata. MSIL (also called CIL) is itself one of the programming languages supported by the .NET Framework, so we can create, build, and compile .NET applications from standalone CIL code too. Moreover, CIL code is the backbone of every .NET assembly: the deeper you dig into the CIL instruction set, the better you understand advanced .NET application development. In this article, you will get an introduction to the MSIL instruction set and its semantics by authoring a simple program with CIL opcodes, and you will see the role of the CIL compiler ilasm.exe in building and executing that .NET assembly without the typical Visual Studio IDE build process.

Essentials

Programming directly in CIL is challenging because the developer works with the CLR's own instruction vocabulary, the “opcodes”, instead of the friendlier syntax of C#, F#, or VB.NET. It is therefore advisable to install the following tools on your machine before starting.

  • .NET Framework 4.0 or later
  • Visual Studio 2010 IDE or later
  • ILDASM.exe and ILASM.exe utilities
  • Notepad++
  • SharpDevelop (optional)
  • Xamarin Studio (optional)

Although CIL code can be authored in the plain Notepad editor, it is more comfortable to write it in a full-fledged editor such as SharpDevelop.

MSIL Internals

A .NET assembly contains CIL code, which is conceptually similar to Java bytecode in that it is not compiled to platform-specific instructions until absolutely necessary. The .NET CLR uses a JIT compiler for each CPU the runtime targets, each optimized for the underlying platform. .NET binaries also contain metadata that describes the characteristics of every type within the binary. In addition, the assembly itself is described by metadata officially termed the manifest, which contains information about the current version of the assembly, its culture, and the list of all externally referenced assemblies.

As Figure 1-1 shows, source code in each .NET-supported programming language is eventually compiled into CIL rather than directly to a specific instruction set. This is what makes all the .NET-supported languages capable of interacting with each other. Furthermore, CIL code provides the same benefits that Java professionals have grown accustomed to.


Figure 1-1. The .NET Compilation Life-Cycle.

Each .NET-supported programming language maps its keywords to CIL mnemonics. Intermediate Language (IL) code tends to look cryptic at first; for instance, to load a string onto the stack we do not use a friendly opcode name such as StringLoading but rather ldstr. Assume we have written the following sample program in C# in order to study the CIL that is generated for it.

The C# code in Listing 1 performs a simple addition of two numeric values via the testCalculation method. .NET binaries do not contain platform-specific instructions but rather platform-agnostic IL code, which the C# compiler (csc.exe) generates during the build process.

Listing 1: Simple C# console application.

using System;

class Program
{
    static void Main(string[] args)
    {
        // Method Calling
        testCalculation(20, 40);
        Console.ReadKey();
    }
    
    // Demo static Method
    static void testCalculation(int iPar1, int iPar2)
    {
        int Result;
        Result = iPar1 + iPar2;
        Console.WriteLine("Calculation Output :: {0}", Result);
    }
}

Once you compile this code, you end up with a single *.exe assembly that contains a manifest, metadata, and CIL instructions; when it is run, the CLR locates and loads that .NET binary into memory. Fortunately, the .NET Framework SDK ships with an excellent utility, ILDASM.EXE, that disassembles any .NET binary into its corresponding IL code.

We can use ildasm.exe to disassemble the IL code either from the command prompt or through its GUI. If you open this assembly with ILDASM.EXE in GUI mode, you see the back-end representation of each C# statement as the corresponding CIL opcodes, as in the following.

CIL Type exe assembly

ILDASM.EXE loads any .NET assembly and lets you investigate its contents, including CIL code, manifest, and metadata. It can also dump these contents from a .NET binary as text. Double-click the testCalculation method to examine its generated CIL code, as in the following.


Figure 1-3. CIL code.
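
Since the figure is not reproduced here, the following is a hand-written sketch of roughly what the C# compiler generates for Listing 1, not the exact ILDASM output. The assembly name CalcDemo is hypothetical, the empty assembly bodies rely on ILASM defaults, and the nop instructions and IL_xxxx labels that a debug build would show are left out.

```cil
// Hypothetical assembly name; empty bodies let ilasm fill in default values.
.assembly extern mscorlib {}
.assembly CalcDemo {}
.module CalcDemo.exe

.class private auto ansi beforefieldinit Program extends [mscorlib]System.Object
{
  .method private hidebysig static void Main(string[] args) cil managed
  {
    .entrypoint
    .maxstack 2
    ldc.i4.s 20        // push the first argument
    ldc.i4.s 40        // push the second argument
    call void Program::testCalculation(int32, int32)
    // Console.ReadKey() returns a ConsoleKeyInfo value that we discard
    call valuetype [mscorlib]System.ConsoleKeyInfo [mscorlib]System.Console::ReadKey()
    pop
    ret
  }

  .method private hidebysig static void testCalculation(int32 iPar1, int32 iPar2) cil managed
  {
    .maxstack 2
    .locals init ([0] int32 Result)
    ldarg.0            // load iPar1
    ldarg.1            // load iPar2
    add                // iPar1 + iPar2
    stloc.0            // store the sum in Result
    ldstr "Calculation Output :: {0}"
    ldloc.0            // load Result
    box [mscorlib]System.Int32   // WriteLine(string, object) expects an object
    call void [mscorlib]System.Console::WriteLine(string, object)
    ret
  }
}
```

Note how the single C# statement Result = iPar1 + iPar2 becomes four stack-oriented instructions (ldarg.0, ldarg.1, add, stloc.0), and how the int value must be boxed before it can be passed as an object.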

Furthermore, if you would like to explore the type metadata for the currently loaded assembly, press Ctrl+M, which shows the metadata, including that of the testCalculation method, as in the following.


Figure 1-4. Metadata

IL Opcode Grammar

CIL is a full-fledged object-oriented programming language like C# and supports typical OOP constructs such as classes, inheritance, interfaces, and control flow. As claimed earlier, we can author a .NET application directly in MSIL without using the Visual Studio IDE. A question commonly arises: why is CIL programming important to understand? The answer is that it helps developers write, maintain, and debug code better. The following table gives a brief description of typical Common Intermediate Language (CIL) instructions.

Table: IL opcode meanings

IL opcode meanings

In the same way, the following table illustrates how typical C# keywords (data types) map to the corresponding CIL keywords, which are referenced frequently in CIL programming.

Table: CIL Data Types Mapping

CIL Data Types Mapping
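
As a rough illustration of such a mapping (a sketch of a few common cases, not a reconstruction of the original table), the locals below are declared with CIL type keywords, and each comment names the C# keyword it corresponds to. The assembly name, class name, and variable names are hypothetical.

```cil
// Hypothetical assembly name; empty bodies let ilasm fill in default values.
.assembly extern mscorlib {}
.assembly TypeMapDemo {}
.module TypeMapDemo.exe

.class private auto ansi beforefieldinit TypeMapDemo.Program extends [mscorlib]System.Object
{
  .method private hidebysig static void Main() cil managed
  {
    .entrypoint
    .maxstack 1
    .locals init (
      int32         intVal,     // C# int
      int64         longVal,    // C# long
      int16         shortVal,   // C# short
      unsigned int8 byteVal,    // C# byte
      float32       floatVal,   // C# float
      float64       doubleVal,  // C# double
      bool          boolVal,    // C# bool
      char          charVal,    // C# char
      string        strVal,     // C# string
      object        objVal      // C# object
    )
    ret                         // nothing to do; the declarations are the point
  }
}
```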

Creating the First IL Program

So, are you ready to take up the challenge? Authoring pure IL code is more cumbersome than writing C#. We can develop any kind of application in it, for instance a console, Windows, or web-based application, but the foremost hindrance when coding is the lack of IntelliSense support. On the other hand, IL can be written in any plain editor such as Notepad, which is part of its appeal. We will write a simple “Hello World!” program in Notepad and later compile it with the ILASM.EXE utility. Open Notepad, save the file with the *.il extension (such as “HelloWorld.il”), and enter the following code, which displays a simple “Hello World!” string on the console.

Listing 2: First “Hello World” program coding in IL.

.assembly extern mscorlib
{
  .publickeytoken = (B7 7A 5C 56 19 34 E0 89)
  .ver 4:0:0:0
}

.assembly cilHelloWorld
{
  .hash algorithm 0x00008004
  .ver 1:0:0:0
}

.module cilHelloWorld.exe

.imagebase 0x00400000

.file alignment 0x00000200
.stackreserve 0x00100000
.subsystem 0x0003
.corflags 0x00020003


// =============== CLASS MEMBERS DECLARATION ==================//

.class private auto ansi beforefieldinit cilHelloWorld.Program extends [mscorlib]System.Object
{
  .method private hidebysig static void Main(string[] args) cil managed
  {
    .entrypoint
    .maxstack 8

    IL_0000: nop
    IL_0001: ldstr "First CIL program, Hello World!"
    IL_0006: call void [mscorlib]System.Console::WriteLine(string)
    IL_000b: nop
    IL_000c: call string [mscorlib]System.Console::ReadLine()
    IL_0011: pop
    IL_0012: ret
  }

  // ================= Constructor ================================//
  .method public hidebysig specialname rtspecialname instance void .ctor() cil managed
  {
    .maxstack 8
    IL_0000: ldarg.0
    IL_0001: call instance void [mscorlib]System.Object::.ctor()
    IL_0006: ret
  } // end of constructor
}

// ====================== End of Class ==========================//

As shown in Listing 2, we have specified a .NET assembly, namespace, class, and methods in terms of CIL using various directives and attributes to accomplish the simple “Hello World!” feat. The important thing to remember is that CIL directives are always written with a dot prefix (.assembly, .class, .method), whereas CIL attributes such as public, static, or extends carry no dot and only qualify how a directive is processed.

Finally, save the code file and open a Visual Studio command prompt. There, ILASM.EXE compiles the HelloWorld.il file (for example, ilasm HelloWorld.il, optionally with the /debug option to emit debugging information) and produces a corresponding executable file, as in the following.

Output

HelloWorld.il compilation process using ILASM.EXE.

ILASM EXE

After finishing the IL coding, or after any subtle code modification, it is recommended to verify the compiled .NET binary with the PEVERIFY.EXE command-line utility, which checks the CIL code and metadata of the specified assembly for validity and type safety, as in the following.

Output

CompileHelloWorld.exe verification.


Finally, it is time to test the generated .NET executable to determine whether it produces the desired output. Run the executable directly at the command prompt and observe the output, as in the following.

Output

CompileHelloWorld.exe execution.

Execution

Programmers usually need not be deeply concerned with the binary opcodes unless they build extremely low-level .NET software. CIL coding is, however, especially attractive to reverse engineers who patch buggy software or detect subtle vulnerabilities by disassembling executables. Sometimes developers inadvertently leave glitches in the source code that malicious hackers can exploit later. Reverse engineers also use CIL code to add or remove features in existing software when the source code is not available.

Code Analysis

The HelloWorld.il file commences with the .assembly extern directive, which references the mscorlib.dll assembly. The .publickeytoken directive specifies the public key token of mscorlib.dll, and the .ver directive specifies the version of the referenced assembly (here 4.0.0.0, the mscorlib that ships with .NET Framework 4).

.assembly extern mscorlib
{
    .publickeytoken = (B7 7A 5C 56 19 34 E0 89)
    .ver 4:0:0:0
}

The next section defines the assembly name as “cilHelloWorld”, followed by its hashing algorithm (0x00008004 denotes SHA-1) and its version number 1.0.0.0.

.assembly cilHelloWorld
{
    .hash algorithm 0x00008004
    .ver 1:0:0:0
}

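Then the .module directive specifies the name of the module that makes up the assembly, here cilHelloWorld.exe. Whether ILASM.EXE emits an executable or a DLL is chosen with its /exe (the default) or /dll option.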

.module cilHelloWorld.exe

Thereafter, the .imagebase directive sets 0x00400000 as the preferred base address at which the binary is loaded.

.imagebase 0x00400000

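The .file alignment directive sets the alignment of the sections within the PE file on disk, here 0x200 (512) bytes. It should not be confused with the plain .file directive, which adds a reference to another file to the assembly manifest.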

.file alignment 0x00000200

The .stackreserve directive configures the stack reserve size, here 0x00100000 (1 MB).

.stackreserve 0x00100000

The .subsystem directive indicates whether the application is a console-based or GUI-based program. Here, 3 specifies a console application and 2 specifies a GUI application, as in the following.

.subsystem 0x0003

The .corflags directive sets the runtime flags in the CLI header of the file. Here, 0x00020003 combines ILONLY (0x1), 32BITREQUIRED (0x2), and 32BITPREFERRED (0x00020000), as in the following.

.corflags 0x00020003

After defining all the essential directives, such as .module, .corflags, .imagebase, and so on, we outline the definition of the Program class, which extends the System.Object base type. Here, beforefieldinit tells the runtime that the type's static initializer may run at any time before the first access to one of its static fields, rather than at one precisely defined moment, as in the following.

.class private auto ansi beforefieldinit cilHelloWorld.Program extends [mscorlib]System.Object

Although we shall discuss all the .NET type definitions in terms of IL coding in detail in forthcoming articles, here we also define the default class constructor in the IL file, just as the C# compiler would generate it, as in the following.

.method public hidebysig specialname rtspecialname instance void .ctor() cil managed

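The Program class contains the definition of the application entry point method, void Main. Here, hidebysig states that the method hides base class members with the same name and signature (rather than by name only); it is used by compilers and tools and ignored by the runtime, as in the following: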

.method private hidebysig static void Main(string[] args) cil managed

The method that serves as the entry point of a program always contains the following directive.

.entrypoint

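The .maxstack directive specifies the maximum number of items that may be on the evaluation stack at any point during execution of the method. It counts stack slots, not local variables, and 8 is the default value that is assumed when the directive is omitted.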

.maxstack 8
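
To make the stack-depth idea concrete, here is a minimal sketch that tracks the depth instruction by instruction. The assembly name MaxStackDemo and the AddThree method are hypothetical and not part of the HelloWorld.il file.

```cil
// Hypothetical assembly name; empty bodies let ilasm fill in default values.
.assembly extern mscorlib {}
.assembly MaxStackDemo {}
.module MaxStackDemo.exe

.class private auto ansi beforefieldinit MaxStackDemo.Program extends [mscorlib]System.Object
{
  // Adds three integers while never holding more than two items on the stack.
  .method private hidebysig static int32 AddThree(int32 a, int32 b, int32 c) cil managed
  {
    .maxstack 2
    ldarg.0      // depth 1: a
    ldarg.1      // depth 2: a, b
    add          // depth 1: a + b
    ldarg.2      // depth 2: a + b, c
    add          // depth 1: a + b + c
    ret          // returns the single remaining value
  }

  .method private hidebysig static void Main() cil managed
  {
    .entrypoint
    .maxstack 3  // three arguments are on the stack just before the call
    ldc.i4.1     // depth 1
    ldc.i4.2     // depth 2
    ldc.i4.3     // depth 3
    call int32 MaxStackDemo.Program::AddThree(int32, int32, int32)  // depth 1
    call void [mscorlib]System.Console::WriteLine(int32)            // depth 0
    ret
  }
}
```

Assembled and run, this sketch prints 6. AddThree takes three parameters but needs only two stack slots, which shows that .maxstack is about the evaluation stack rather than the number of variables.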

Now, the real implementation starts in the Main() method body, where each instruction is preceded by a token such as IL_0001 or IL_0006. These tokens are called code labels. In fact, code labels are completely optional unless they are the target of a branch, and we can remove them as in the following.

Code labels
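
The original screenshot is not available, so the following is a sketch of how the same program might look with the code labels removed. The assembly header is shortened to empty bodies, on the assumption that ILASM fills in default values, and the constructor is omitted because nothing instantiates the class.

```cil
// Shortened header (assumption): empty bodies let ilasm fill in default values.
.assembly extern mscorlib {}
.assembly cilHelloWorld {}
.module cilHelloWorld.exe

.class private auto ansi beforefieldinit cilHelloWorld.Program extends [mscorlib]System.Object
{
  .method private hidebysig static void Main(string[] args) cil managed
  {
    .entrypoint
    .maxstack 8

    // Same instruction sequence as Listing 2, without the IL_xxxx labels
    nop
    ldstr "First CIL program, Hello World!"
    call void [mscorlib]System.Console::WriteLine(string)
    nop
    call string [mscorlib]System.Console::ReadLine()
    pop
    ret
  }
}
```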

In the previous code, execution starts with a nop, which performs no operation. Then the ldstr instruction loads a string with the value “First CIL program, Hello World!” onto the evaluation stack. Next, the call instruction calls the Console.WriteLine method to print that string, followed by another nop. The second call invokes Console.ReadLine, which waits for input and pushes the returned string onto the stack. The pop instruction then removes and discards that value from the top of the stack, and the program terminates with a ret instruction.

Synopsis

As we have seen, .NET assemblies contain CIL code that is compiled to platform-specific instructions by the JIT compiler. In addition, we have explored assembly metadata and manifest contents by examining CIL opcodes with the ILDASM.EXE utility, and we have described the keywords typically used in CIL coding. Building on these essential IL keywords and directives, we drafted our first “Hello World!” program in genuine IL code and learned how to compile and verify it.
