.NET Binary Reverse Engineering: Part 1

Introduction

The prime objective of this article is to explain the .NET mother language, Common Intermediate Language (CIL), which has laid the foundation of .NET. Here, you will learn the distinction between CIL directives, attributes, and opcodes, as well as numerous CIL tools that play a significant role in code execution. The motivation for writing this article is to provide a deep analysis and examination of CIL grammar.

The source code of any commercial software or executable application is the intellectual property of the vendor company and is not disclosed for proprietary reasons. Without the source code, we have to rely on the compiled binary, and because a .NET binary contains CIL rather than native code, it is necessary to delve into CIL before proceeding to code disassembly. Apart from that, the forthcoming articles of this series will discuss advanced concepts related to reverse engineering, such as round-trip engineering, obfuscation, and code disassembly, using tools such as IDA Pro, OllyDbg, a hex editor, ILAsm, and Reflector.

Abstract

Microsoft Intermediate Language (MSIL), Microsoft's name for CIL, is an essential piece of the CLR, and code that is written for and executed under the CLR is referred to as managed code. The managed compiler translates that code (for example, a *.cs file) into CIL code, a manifest, and metadata. Execution typically involves two compilation phases. In the first phase, the language compiler transforms the source code into MSIL. The second phase occurs at run time, when the JIT compiler compiles the MSIL code into native code. The .NET platform is considered language-independent because a managed application is executed in the same way regardless of its source language. Finally, CIL is itself a full-fledged .NET programming language, with its own syntax and compiler.

The beauty of MSIL code is that it is compiled once and executed anywhere: the JIT compiler translates it into native code that targets a specific platform. You can write an application once and deploy it to Windows, Linux, macOS, or any other platform with a compatible CLR implementation, such as Mono.

Prerequisite

To execute and examine MSIL/CIL code, you need to configure your machine with the following tools.

  • .NET Framework 3.5 or higher
  • Either SharpDevelop or Xamarin Studio
  • Visual Studio Command Prompt
  • IL Disassembler (ildasm.exe)
  • PEVerify (peverify.exe)
  • .NET Reflector

Understanding CIL

When you build a .NET assembly using your managed language of choice (including C#, VB.NET, F#, Perl, and COBOL), the associated compiler translates your source code into Common Intermediate Language. Because CIL is just another structured .NET programming language, it is possible to build .NET assemblies directly using CIL and the CIL compiler (ILASM.EXE) that ships with the .NET Framework.

The better you understand the grammar of CIL, the better able you are to move into the arena of advanced .NET programming. A programmer with a comprehensive knowledge of CIL can do the following:

  • Disassemble an existing assembly, edit the CIL code, and recompile the updated code.
  • Use every feature of the Common Type System (CTS), because CIL is the only .NET language that exposes the CTS in full.
  • Build dynamic assemblies using the System.Reflection.Emit API.

CIL does not simply define a flat set of keywords such as public, private, new, get, set, or this. Rather, the token set understood by the CIL compiler is subdivided into three categories, and each category is expressed using its own syntax. The three categories are as follows.

CIL directives

CIL directives are the tokens used to describe the structure of a .NET assembly. Syntactically, each directive begins with a single dot prefix (for example, .assembly, .namespace, .class, and .method). Directives tell the CIL compiler how to define the namespaces, types, and members that will populate an assembly.

CIL attributes

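A directive alone is often not descriptive enough to fully express the definition of a given type or member, so it can be further qualified with CIL attributes that specify how the directive should be processed. For example, in .class public auto ansi Test, the words public, auto, and ansi are attributes that qualify the .class directive.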

CIL opcodes

Once an assembly, its namespaces, and its types have been defined with directives and attributes, the operation codes, or opcodes, provide each type's implementation logic. Examples include ldstr, call, add, and ret.
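To make the three categories concrete, here is a rough Python sketch that sorts the tokens of the Test.il program into directives, attributes, and opcodes. The attribute and opcode sets are small hand-picked samples for this one file, not the full ILAsm vocabulary, so anything outside them is reported as other.

```python
import re

# Small, hand-picked samples for Test.il only; not the full ILAsm vocabulary.
SAMPLE_ATTRIBUTES = {
    "private", "public", "auto", "ansi", "beforefieldinit",
    "hidebysig", "static", "cil", "managed",
}
SAMPLE_OPCODES = {"ldstr", "call", "ret"}


def classify(token: str) -> str:
    """Return 'directive', 'attribute', 'opcode', or 'other' for one token."""
    if token.startswith("."):
        return "directive"            # directives carry a single dot prefix
    if token in SAMPLE_ATTRIBUTES:
        return "attribute"
    if token in SAMPLE_OPCODES:
        return "opcode"
    return "other"


def classify_source(source: str) -> dict:
    """Group the distinct tokens of an IL snippet by category."""
    source = re.sub(r'"[^"]*"', " ", source)  # drop string literals first
    groups = {"directive": set(), "attribute": set(), "opcode": set(), "other": set()}
    for token in re.findall(r"\.?[A-Za-z_][\w.]*", source):
        groups[classify(token)].add(token)
    return groups


TEST_IL = """
.assembly extern mscorlib {}
.assembly FirstApp {}
.namespace FirstApp {
    .class private auto ansi beforefieldinit Test {
        .method public hidebysig static void Main(string[] args) cil managed {
            .entrypoint
            .maxstack 1
            ldstr "Welcome to CIL programming world"
            call void [mscorlib]System.Console::WriteLine(string)
            ret
        }
    }
}
"""

groups = classify_source(TEST_IL)
for kind in ("directive", "attribute", "opcode"):
    print(kind, sorted(groups[kind]))
```

A real tool would use the full ILAsm grammar; the point here is only that the dot prefix alone is enough to recognize a directive.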

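Despite its numerous advantages, programming directly in CIL has drawbacks. The CIL compiler does not prevent you from writing unverifiable code, such as a method that leaves the evaluation stack unbalanced or passes a value of the wrong type, and such mistakes can cause the program to fail at run time.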

First CIL program

We need a code editor to author our first CIL program. A plain text editor such as Notepad or WordPad is enough, but a full-fledged .NET IDE such as SharpDevelop or Xamarin Studio is more convenient, because it is integrated with the .NET Framework Class Library (FCL) and recognizes directives automatically. Whichever IDE or editor we use, the important point is to save the CIL code file with an *.il extension.

The following code illustrates the first Hello World program written in CIL. Open Notepad, enter the following code, and save the file as Test.il.

.assembly extern mscorlib {}
.assembly FirstApp {}
.namespace FirstApp {
    .class private auto ansi beforefieldinit Test {
        .method public hidebysig static void Main(string[] args) cil managed {
            .entrypoint
            .maxstack 1
            ldstr "Welcome to CIL programming world"
            call void [mscorlib]System.Console::WriteLine(string)
            ret
        }
    }
}

File: Test.il

CIL code compilation

After finishing the code, save the file as Test.il and compile it with ILASM.exe, which ships with the .NET Framework, using the following command.

ILASM /exe /debug Test.il

The /exe option tells ILASM to produce an executable (EXE) rather than a library (DLL); it is also the default. The /debug option asks the compiler to generate a debug file (Test.pdb), which is useful for viewing source code in a debugger or disassembler.

[Figure: Compiling Test.il with ILASM]

After Test.il compiles successfully, Test.exe is created in the same directory. It is the final executable, and running it prints Welcome to CIL programming world, as in the following.

[Figure: Output of Test.exe]

When you build or modify assemblies using CIL code, it is always advisable to verify that the compiled binary image is a well-formed .NET image using the peverify.exe utility as in the following.

[Figure: Verifying Test.exe with peverify.exe]

The previous figure confirms that Test.exe passes verification, meaning that its metadata is well formed and all the opcodes in the binary are valid CIL. The CIL compiler itself also supports numerous command-line options, as follows.

[Figure: ILASM command-line options]

In the Test.il source file, the first declaration is an external reference to the mscorlib library. mscorlib.dll contains the core of the .NET Framework Class Library (FCL), including the System.Console class. The second .assembly directive declares the name of the current assembly, FirstApp, and the .namespace directive defines the namespace.

.assembly extern mscorlib {}
.assembly FirstApp {}
// class namespace
.namespace FirstApp { ...... }

The next lines define a class and a method within the class. The .class directive introduces a private class named Test, which implicitly inherits from System.Object because no extends clause is given. The .method directive defines Main as a public static member method. The cil and managed keywords indicate that the method body contains CIL and runs as managed code.

.class private auto ansi beforefieldinit Test
{
    .method public hidebysig static void Main(string[] args) cil managed { ... }
}

The Main method commences with two directives. The .entrypoint directive designates Main as the entry point of the application, and .maxstack 1 declares that at most one item will be on the evaluation stack at any time during the method. The directives are followed by three opcodes. ldstr pushes a reference to the string literal onto the evaluation stack. call pops that item and passes it to the WriteLine method, which displays it. Finally, ret returns from (exits) the method.

.entrypoint
.maxstack 1
ldstr "Welcome to CIL programming world"
call void [mscorlib]System.Console::WriteLine(string)
ret
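The stack behavior described above can be modeled in a few lines of Python. This is a toy sketch, not a CIL interpreter: it understands only ldstr, a call to Console::WriteLine(string), and ret, and it records the deepest point the evaluation stack reaches so that it can be compared with .maxstack 1.

```python
WRITE_LINE = "void [mscorlib]System.Console::WriteLine(string)"


def run_main(instructions):
    """Execute a tiny subset of CIL and return (printed lines, peak stack depth)."""
    stack = []       # the evaluation stack
    peak = 0         # deepest the stack gets; must not exceed .maxstack
    printed = []     # what Console.WriteLine would have displayed
    for opcode, operand in instructions:
        if opcode == "ldstr":
            stack.append(operand)            # push the string reference
        elif opcode == "call" and operand == WRITE_LINE:
            printed.append(stack.pop())      # pop the argument and "print" it
        elif opcode == "ret":
            if stack:                        # a void method must leave the stack empty
                raise RuntimeError("stack not empty at ret")
            break
        else:
            raise NotImplementedError(f"{opcode} {operand}")
        peak = max(peak, len(stack))
    return printed, peak


MAIN = [
    ("ldstr", "Welcome to CIL programming world"),
    ("call", WRITE_LINE),
    ("ret", None),
]

printed, peak = run_main(MAIN)
print(printed[0])   # Welcome to CIL programming world
print(peak)         # 1, matching .maxstack 1
```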

CIL Code Post-mortem Analysis

CIL is much easier to understand and interpret than native assembly language. CIL source code is case-sensitive, like C#, but statements are not terminated with a semicolon. The most significant parts of a CIL application are the dot-prefixed directives and the actual executable instructions. Directives fall into several categories, such as assembly, class, and method directives.

To understand CIL directives, we shall write a console application in Xamarin Studio that adds two integers. Although we could develop such an application in other code editors, Xamarin Studio provides more convenient support for writing IL code than most other editors.

First, open Xamarin Studio and select New Solution from the File menu. Then choose the IL Console Project template, as in the following.

[Figure: Selecting the IL Console Project template in Xamarin Studio]

Next, rename main.il to MathFun.il and place the following code in the MathFun.il file. We shall discuss each segment of the *.il file in the next section.

.assembly extern mscorlib
{
  .publickeytoken = (B7 7A 5C 56 19 34 E0 89 )
  .ver 2:0:0:0
}
.assembly MathFun
{
  .ver 1:0:0:0
  .locale "en-US"
}
.module MathFun.exe
.imagebase 0x00400000
.file alignment 0x00000200
.stackreserve 0x00100000
.subsystem 0x0003
.corflags 0x00000003
// =============== CLASS MEMBERS DECLARATION ===================
.class public auto ansi beforefieldinit MathFun
    extends [mscorlib]System.Object
{
  .field private string '<Name>k__BackingField'
  .custom instance void [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = ( 01 00 00 00 )
  .method public hidebysig specialname rtspecialname 
          instance void  .ctor(string name) cil managed
  {
    // Code size       18 (0x12)
    .maxstack  8
    IL_0000:  ldarg.0
    IL_0001:  call       instance void [mscorlib]System.Object::.ctor()
    IL_0006:  nop
    IL_0007:  nop
    IL_0008:  ldarg.0
    IL_0009:  ldarg.1
    IL_000a:  call       instance void MathFun::set_Name(string)
    IL_000f:  nop
    IL_0010:  nop
    IL_0011:  ret
  } // end of method MathFun::.ctor
  .method public hidebysig specialname instance string get_Name() cil managed
  {
    .custom instance void [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = ( 01 00 00 00 )
    // Code size       11 (0xb)
    .maxstack  1
    .locals init (string V_0)
    IL_0000:  ldarg.0
    IL_0001:  ldfld      string MathFun::'<Name>k__BackingField'
    IL_0006:  stloc.0
    IL_0007:  br.s       IL_0009
    IL_0009:  ldloc.0
    IL_000a:  ret
  } // end of method MathFun::get_Name
  .method public hidebysig specialname instance void set_Name(string 'value') cil managed
  {
    .custom instance void [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = ( 01 00 00 00 )
    // Code size       8 (0x8)
    .maxstack  8
    IL_0000:  ldarg.0
    IL_0001:  ldarg.1
    IL_0002:  stfld      string MathFun::'<Name>k__BackingField'
    IL_0007:  ret
  } // end of method MathFun::set_Name
  .method public hidebysig instance string Display() cil managed
  {
    // Code size       22 (0x16)
    .maxstack  2
    .locals init ([0] string CS$1$0000)
    IL_0000:  nop
    IL_0001:  ldstr      "Hello "
    IL_0006:  ldarg.0
    IL_0007:  call       instance string MathFun::get_Name()
    IL_000c:  call       string [mscorlib]System.String::Concat(string, string)
    IL_0011:  stloc.0
    IL_0012:  br.s       IL_0014
    IL_0014:  ldloc.0
    IL_0015:  ret
  } // end of method MathFun::Display
  .method public hidebysig instance int32 Addition(int32 x, int32 y) cil managed
  {
    // Code size       9 (0x9)
    .maxstack  2
    .locals init ([0] int32 CS$1$0000)
    IL_0000:  nop
    IL_0001:  ldarg.1
    IL_0002:  ldarg.2
    IL_0003:  add
    IL_0004:  stloc.0
    IL_0005:  br.s       IL_0007
    IL_0007:  ldloc.0
    IL_0008:  ret
  } // end of method MathFun::Addition
  .property instance string Name()
  {
    .get instance string MathFun::get_Name()
    .set instance void MathFun::set_Name(string)
  } // end of property MathFun::Name
} // end of class MathFun
.class private auto ansi beforefieldinit Program extends [mscorlib]System.Object
{
  .method private hidebysig static void  Main(string[] args) cil managed
  {
    .entrypoint
    // Code size       57 (0x39)
    .maxstack  4
    .locals init ([0] class MathFun obj)
    IL_0000:  nop
    IL_0001:  ldstr      "Ajay"
    IL_0006:  newobj     instance void MathFun::.ctor(string)
    IL_000b:  stloc.0
    IL_000c:  ldloc.0
    IL_000d:  callvirt   instance string MathFun::Display()
    IL_0012:  call       void [mscorlib]System.Console::WriteLine(string)
    IL_0017:  nop
    IL_0018:  ldstr      "Addition is: {0}"
    IL_001d:  ldloc.0
    IL_001e:  ldc.i4.s   15
    IL_0020:  ldc.i4.s   35
    IL_0022:  callvirt   instance int32 MathFun::Addition(int32, int32)
    IL_0027:  box        [mscorlib]System.Int32
    IL_002c:  call       void [mscorlib]System.Console::WriteLine(string, object)
    IL_0031:  nop
    IL_0032:  call       valuetype [mscorlib]System.ConsoleKeyInfo [mscorlib]System.Console::ReadKey()
    IL_0037:  pop
    IL_0038:  ret
  } // end of method Main
}

MathFun.il

Now build the program with F8. After successful compilation, the final executable, MathFun.exe, is created in the bin/Debug folder of the project directory.

Assembly Directives

Assembly directives contain information that the compiler writes to the manifest, that is, the metadata that describes the assembly as a whole. This section lists common assembly directives, along with several related module-level and method-level directives that appear in MathFun.il.

.assembly extern

This directive represents an external assembly. The public types and methods of the referenced assembly are available to the current assembly. Here is the syntax.

.assembly extern name as aliasname { }


We use this construct in the MathFun.il file to reference mscorlib.dll, as in the following.

.assembly extern mscorlib  
{  
  .publickeytoken = (B7 7A 5C 56 19 34 E0 89 )                           
  .ver 2:0:0:0  
} 

Because mscorlib.dll is so important, the ILASM compiler automatically adds an external assembly reference to that library if the source file does not declare one.

.assembly


The .assembly directive defines the simple (friendly) name of the current assembly, as in the following.

.assembly CILType { }

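The assembly block can contain several sub-directives, including the following:

  • .ver: the assembly version number
  • .locale: the culture of the assembly
  • .publickey: the public key used to give the assembly a strong name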
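The .publickeytoken used in an .assembly extern block, such as the mscorlib token in MathFun.il, is a short form of the referenced assembly's full .publickey: it is the last eight bytes of the SHA-1 hash of the public key, in reverse order. Below is a minimal Python sketch of that calculation, plus a helper that prints bytes in the spaced bytearray form ildasm uses. The demo key is a hypothetical placeholder; a real key would come from a strong-named assembly.

```python
import hashlib


def public_key_token(public_key: bytes) -> bytes:
    """Last 8 bytes of the SHA-1 hash of the public key, in reverse order."""
    digest = hashlib.sha1(public_key).digest()   # 20-byte hash
    return digest[-8:][::-1]


def il_bytearray(data: bytes) -> str:
    """Format bytes the way ildasm prints a bytearray: (B7 7A ... )."""
    return "(" + " ".join(f"{b:02X}" for b in data) + " )"


# The mscorlib token from MathFun.il, in the spaced form.
print(il_bytearray(bytes.fromhex("B77A5C561934E089")))
# (B7 7A 5C 56 19 34 E0 89 )

# Hypothetical placeholder key, only to show the call; a real one comes
# from the .publickey of a strong-named assembly.
demo_key = bytes(range(16))
print(il_bytearray(public_key_token(demo_key)))
```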

In MathFun.il, the assembly definition includes a version number of 1.0.0.0, set with the .ver directive, and culture information, set with .locale, as in the following.

.assembly MathFun
{
  .ver 1:0:0:0
  .locale "en-US"
}
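The .ver value uses colons where System.Version uses dots, and its four parts are major, minor, build, and revision, each stored in metadata as a 16-bit number. Here is a small Python sketch (the function names are hypothetical) that parses and validates such a value.

```python
def parse_il_version(text: str) -> tuple:
    """Parse an ILAsm .ver value such as '1:0:0:0' into four integers."""
    parts = tuple(int(p) for p in text.split(":"))
    if len(parts) != 4:
        raise ValueError("expected major:minor:build:revision")
    if any(not 0 <= p <= 0xFFFF for p in parts):
        raise ValueError("each part must fit in 16 bits")
    return parts


def dotted(parts: tuple) -> str:
    """Render the version in the dotted form used by System.Version."""
    return ".".join(str(p) for p in parts)


print(dotted(parse_il_version("1:0:0:0")))   # 1.0.0.0
print(dotted(parse_il_version("2:0:0:0")))   # 2.0.0.0, the mscorlib reference
```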

.module

The .module directive declares the name of the current module. For a single-file assembly, this is the name of the output file, including its extension, such as *.exe, as in the following.

.module MathFun.exe

.imagebase

The .imagebase directive sets the preferred base address at which the application is loaded. The default for an executable is 0x00400000.

.imagebase 0x00400000

.file

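In its general form, the .file directive adds a file to the manifest of the assembly, which is useful for associating helper files with an assembly. The form used in MathFun.il, .file alignment, is different: it sets the alignment of sections within the PE file, here 0x200 (512) bytes.

.file alignment 0x00000200

When .file is used to add a file to the manifest, the nometadata option indicates that the file contains no metadata, that is, it is not a managed module.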

.stackreserve

The .stackreserve directive sets the stack reserve size recorded in the PE header, here 0x00100000 (1 MB), which is the default.

.stackreserve 0x00100000

.subsystem

The .subsystem directive indicates the subsystem used by the application, such as the console or GUI subsystem. The syntax is as follows.

.subsystem number

Specify 3 for a console application and 2 for a GUI (Windows) application. MathFun.il is a console application, as in the following.

.subsystem 0x0003
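The .imagebase, .file alignment, .stackreserve, and .subsystem directives are not CIL at all; ILASM copies their values into the optional header of the output PE file. The following Python sketch reads those four fields back, assuming the standard PE32 layout: the e_lfanew pointer at offset 0x3C, a 4-byte PE signature, a 20-byte COFF header, and then ImageBase at offset 28, FileAlignment at 36, Subsystem at 68, and SizeOfStackReserve at 72 of the optional header. To avoid depending on a real file, the demo builds a minimal fake header with the MathFun.il values.

```python
import struct

PE32_MAGIC = 0x10B  # optional-header magic for PE32 (PE32+ uses 0x20B)


def read_pe32_fields(image: bytes) -> dict:
    """Read the optional-header fields set by .imagebase, .file alignment,
    .subsystem, and .stackreserve from a PE32 image."""
    if image[:2] != b"MZ":
        raise ValueError("missing DOS 'MZ' signature")
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)      # e_lfanew
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing 'PE' signature")
    opt = pe_offset + 4 + 20           # skip the signature and the COFF header
    (magic,) = struct.unpack_from("<H", image, opt)
    if magic != PE32_MAGIC:
        raise ValueError("this sketch only handles PE32 images")
    return {
        "imagebase": struct.unpack_from("<I", image, opt + 28)[0],
        "file_alignment": struct.unpack_from("<I", image, opt + 36)[0],
        "subsystem": struct.unpack_from("<H", image, opt + 68)[0],
        "stackreserve": struct.unpack_from("<I", image, opt + 72)[0],
    }


def fake_pe32(image_base, file_alignment, subsystem, stack_reserve) -> bytes:
    """Build just enough of a PE32 header to exercise read_pe32_fields."""
    pe_offset = 0x80
    buf = bytearray(pe_offset + 4 + 20 + 96)
    buf[0:2] = b"MZ"
    struct.pack_into("<I", buf, 0x3C, pe_offset)
    buf[pe_offset:pe_offset + 4] = b"PE\0\0"
    opt = pe_offset + 24
    struct.pack_into("<H", buf, opt, PE32_MAGIC)
    struct.pack_into("<I", buf, opt + 28, image_base)
    struct.pack_into("<I", buf, opt + 36, file_alignment)
    struct.pack_into("<H", buf, opt + 68, subsystem)
    struct.pack_into("<I", buf, opt + 72, stack_reserve)
    return bytes(buf)


# The values written by MathFun.il's directives.
fields = read_pe32_fields(fake_pe32(0x00400000, 0x200, 3, 0x00100000))
print({name: hex(value) for name, value in fields.items()})

# With a real build: read_pe32_fields(open("MathFun.exe", "rb").read())
```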

.corflags

The .corflags directive sets the runtime flags in the CLI header. The default value, 1, marks the image as IL-only. MathFun.il uses 3, which also sets the flag (0x2) requiring the image to run in a 32-bit process.

.corflags 0x00000003   // as used in MathFun.il
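The .corflags value is a bit mask. Here is a Python sketch that decodes it, with flag names borrowed from the COMIMAGE_FLAGS_* constants in the Windows SDK header corhdr.h (prefix dropped). The table covers only the commonly seen bits, and anything else is reported as unknown.

```python
# COMIMAGE_FLAGS_* bits of the CLI header's Flags field (prefix dropped).
CORFLAGS = {
    0x00000001: "ILONLY",
    0x00000002: "32BITREQUIRED",
    0x00000004: "IL_LIBRARY",
    0x00000008: "STRONGNAMESIGNED",
    0x00000010: "NATIVE_ENTRYPOINT",
    0x00010000: "TRACKDEBUGDATA",
    0x00020000: "32BITPREFERRED",
}


def decode_corflags(value: int) -> list:
    """Return the names of the flags set in a .corflags value."""
    names = [name for bit, name in sorted(CORFLAGS.items()) if value & bit]
    unknown = value & ~sum(CORFLAGS)     # bits not covered by the table
    if unknown:
        names.append(f"UNKNOWN(0x{unknown:X})")
    return names


print(decode_corflags(0x00000003))   # ['ILONLY', '32BITREQUIRED']
print(decode_corflags(0x00000001))   # ['ILONLY']
```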

.maxstack

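The .maxstack directive, which is placed inside a method body rather than at assembly level, declares the maximum number of items that may be on the evaluation stack at the same time while the method executes.

.maxstack 8   // the default when .maxstack is omitted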
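Because .maxstack must cover the deepest point the evaluation stack reaches, it can be worked out by tracking how many items each instruction pops and pushes. Here is a rough Python sketch for the straight-line Main method of MathFun.il. The pop and push counts were derived by hand from each instruction's operands (for example, callvirt to the instance method Addition(int32, int32) pops the object reference and two arguments and pushes one int32), and branches are not handled.

```python
# (instruction, items popped, items pushed) for Main in MathFun.il.
MAIN_EFFECTS = [
    ("nop", 0, 0),
    ('ldstr "Ajay"', 0, 1),
    ("newobj MathFun::.ctor(string)", 1, 1),   # pops the argument, pushes the object
    ("stloc.0", 1, 0),
    ("ldloc.0", 0, 1),
    ("callvirt MathFun::Display()", 1, 1),     # pops 'this', pushes a string
    ("call Console::WriteLine(string)", 1, 0),
    ("nop", 0, 0),
    ('ldstr "Addition is: {0}"', 0, 1),
    ("ldloc.0", 0, 1),
    ("ldc.i4.s 15", 0, 1),
    ("ldc.i4.s 35", 0, 1),
    ("callvirt MathFun::Addition(int32, int32)", 3, 1),
    ("box System.Int32", 1, 1),
    ("call Console::WriteLine(string, object)", 2, 0),
    ("nop", 0, 0),
    ("call Console::ReadKey()", 0, 1),          # pushes a ConsoleKeyInfo
    ("pop", 1, 0),
    ("ret", 0, 0),
]


def max_stack_depth(effects):
    """Return the peak evaluation-stack depth of straight-line code."""
    depth = peak = 0
    for name, pops, pushes in effects:
        if pops > depth:
            raise ValueError(f"stack underflow at {name}")
        depth += pushes - pops
        peak = max(peak, depth)
    if depth != 0:
        raise ValueError("Main returns void, so the stack must be empty at ret")
    return peak


print(max_stack_depth(MAIN_EFFECTS))   # 4, matching .maxstack 4 in MathFun.il
```

The peak of 4 is reached just before the callvirt to Addition, when the format string, the object reference, and both constants are on the stack.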

Class directives

This section describes the most important class-level directives.

.class

The .class directive defines a new reference type, value type, or interface type. The syntax is as follows.

.class attributes classname extends basetype implements interfaces

In MathFun.il, we implement the MathFun class using the .class directive, as in the following.

.class public auto ansi beforefieldinit MathFun
    extends [mscorlib]System.Object

The class directive is also adorned with a variety of attributes. Here is a short list of the most common.

  • abstract: the class cannot be instantiated.
  • ansi and unicode: determine how strings are marshaled to unmanaged code.
  • auto: the CLR controls the memory layout of the fields.
  • beforefieldinit: the runtime may run the type initializer (static constructor) at any time before the first access to a static field, rather than at a precisely defined point.
  • private and public: set the visibility of the type outside its assembly.
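In the compiled assembly, these keywords become bits in the Flags column of the type's metadata row, with the same values as the System.Reflection.TypeAttributes enumeration. Here is a Python sketch that computes the flags for a .class header, covering only the attributes listed above. Note that private, auto, and ansi are the zero-valued defaults of their bit fields, so they contribute nothing to the number.

```python
# Values mirror System.Reflection.TypeAttributes for the listed attributes.
TYPE_ATTRIBUTES = {
    "private": 0x00000000,          # TypeAttributes.NotPublic
    "public": 0x00000001,           # TypeAttributes.Public
    "auto": 0x00000000,             # TypeAttributes.AutoLayout
    "ansi": 0x00000000,             # TypeAttributes.AnsiClass
    "unicode": 0x00010000,          # TypeAttributes.UnicodeClass
    "abstract": 0x00000080,         # TypeAttributes.Abstract
    "sealed": 0x00000100,           # TypeAttributes.Sealed
    "beforefieldinit": 0x00100000,  # TypeAttributes.BeforeFieldInit
}


def class_flags(header: str) -> int:
    """Combine the attribute keywords of a .class header into one flags value."""
    words = header.split()
    if not words or words[0] != ".class":
        raise ValueError("expected a .class header")
    flags = 0
    for word in words[1:]:
        if word not in TYPE_ATTRIBUTES:
            break                   # the first non-attribute word is the class name
        flags |= TYPE_ATTRIBUTES[word]
    return flags


print(hex(class_flags(".class public auto ansi beforefieldinit MathFun")))  # 0x100001
print(hex(class_flags(".class private auto ansi beforefieldinit Test")))    # 0x100000
```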

The MathFun class also implements a constructor that initializes the Name property, as in the following C# version.

public MathFun(string name)
{
    this.Name = name;   
}

Its IL code, together with the compiler-generated backing field for Name, is as follows.

.field private string '<Name>k__BackingField'
.custom instance void [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = ( 01 00 00 00 )
.method public hidebysig specialname rtspecialname instance void .ctor(string name) cil managed
{
    // Code size       18 (0x12)
    .maxstack  8
    IL_0000:  ldarg.0
    IL_0001:  call       instance void [mscorlib]System.Object::.ctor()
    IL_0006:  nop
    IL_0007:  nop
    IL_0008:  ldarg.0
    IL_0009:  ldarg.1
    IL_000a:  call       instance void MathFun::set_Name(string)
    IL_000f:  nop
    IL_0010:  nop
    IL_0011:  ret
}

.property

The property directive adds a property member to a class. Here, the syntax is as in the following.

.property attributes returntype propertyname(parameters) [= defaultvalue] { body }

Consider the following auto-implemented property defined in C#.

public string Name
{
    get;
    set;
}

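The C# compiler implements it with a private backing field and a pair of accessor methods, get_Name and set_Name, whose MSIL is as follows. The .property directive in MathFun.il then ties the two accessors together through its .get and .set entries.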

.method public hidebysig specialname instance string get_Name() cil managed
{
    .custom instance void [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = ( 01 00 00 00 )
    // Code size       11 (0xb)
    .maxstack  1
    .locals init (string V_0)
    IL_0000:  ldarg.0
    IL_0001:  ldfld      string MathFun::'<Name>k__BackingField'
    IL_0006:  stloc.0
    IL_0007:  br.s       IL_0009
    IL_0009:  ldloc.0
    IL_000a:  ret
} // end of method MathFun::get_Name
.method public hidebysig specialname instance void set_Name(string 'value') cil managed
{
    .custom instance void [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = ( 01 00 00 00 )
    // Code size       8 (0x8)
    .maxstack  8
    IL_0000:  ldarg.0
    IL_0001:  ldarg.1
    IL_0002:  stfld      string MathFun::'<Name>k__BackingField'
    IL_0007:  ret
}
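The IL follows a naming pattern: a property Name is implemented by accessors named get_Name and set_Name, and the C# compiler names the hidden field <Name>k__BackingField. The accessor prefixes are the standard convention; the backing-field name is a C# compiler implementation detail seen in this listing, so treat it as an assumption. Here is a small, hypothetical Python helper that generates these names and the matching .property block for a given property.

```python
def auto_property_il(prop: str, type_name: str, owner: str) -> dict:
    """Hypothetical helper: IL names the C# compiler uses for an auto-property."""
    getter = f"instance {type_name} {owner}::get_{prop}()"
    setter = f"instance void {owner}::set_{prop}({type_name})"
    return {
        # Backing-field naming is a compiler detail, not a language rule.
        "field": f".field private {type_name} '<{prop}>k__BackingField'",
        "getter": getter,
        "setter": setter,
        "property": "\n".join([
            f".property instance {type_name} {prop}()",
            "{",
            f"  .get {getter}",
            f"  .set {setter}",
            "}",
        ]),
    }


members = auto_property_il("Name", "string", "MathFun")
print(members["field"])      # .field private string '<Name>k__BackingField'
print(members["property"])
```

For Name, the generated .property block matches the one at the end of the MathFun class in MathFun.il.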

.method

This directive defines a method of a type. Here is the syntax.

.method attributes callingconv returntype methodname(arguments) implattributes { body }

We are defining two methods, Display() and Addition(). The first returns the text "Hello " followed by the value of Name, and the second returns the sum of the two integers passed to it, as in the following.

public string Display()
{
    return "Hello " + Name;
}
public int Addition(int x, int y)
{
    return (x + y);
}

The corresponding IL code for the Display() method is as follows.

.method public hidebysig instance string Display() cil managed
{
    // Code size       22 (0x16)
    .maxstack  2
    .locals init ([0] string CS$1$0000)
    IL_0000:  nop
    IL_0001:  ldstr      "Hello "
    IL_0006:  ldarg.0
    IL_0007:  call       instance string MathFun::get_Name()
    IL_000c:  call       string [mscorlib]System.String::Concat(string, string)
    IL_0011:  stloc.0
    IL_0012:  br.s       IL_0014
    IL_0014:  ldloc.0
    IL_0015:  ret
}
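The Addition method in MathFun.il (IL_0000 to IL_0008) uses a pattern typical of debug builds: the sum is stored in the compiler-generated local CS$1$0000, a br.s jumps to the very next instruction, and the local is loaded again before ret. A toy Python interpreter for just those opcodes shows the flow; argument 0 stands for the this reference, and add wraps around on int32 overflow as CIL's add does.

```python
# Offsets and instructions of MathFun::Addition from MathFun.il.
ADDITION = {
    0x0000: ("nop", None),
    0x0001: ("ldarg.1", None),
    0x0002: ("ldarg.2", None),
    0x0003: ("add", None),
    0x0004: ("stloc.0", None),
    0x0005: ("br.s", 0x0007),
    0x0007: ("ldloc.0", None),
    0x0008: ("ret", None),
}


def to_int32(value: int) -> int:
    """Wrap to a signed 32-bit value, as CIL's add (without .ovf) does."""
    return (value + 2**31) % 2**32 - 2**31


def run(method: dict, args: list, num_locals: int = 1):
    """Interpret a method body made only of the opcodes used in Addition."""
    stack = []
    local_slots = [0] * num_locals     # .locals init zero-initializes the slots
    offsets = sorted(method)
    pc = offsets[0]
    while True:
        op, operand = method[pc]
        if op.startswith("ldarg."):
            stack.append(args[int(op[-1])])   # arg 0 is 'this'
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(to_int32(left + right))
        elif op == "stloc.0":
            local_slots[0] = stack.pop()
        elif op == "ldloc.0":
            stack.append(local_slots[0])
        elif op == "br.s":
            pc = operand                     # unconditional short branch
            continue
        elif op == "ret":
            return stack.pop()               # the int32 return value
        elif op != "nop":
            raise NotImplementedError(op)
        pc = offsets[offsets.index(pc) + 1]  # fall through to the next instruction


print(run(ADDITION, ["<this>", 15, 35]))   # 50
```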

The .method directive accepts several additional attributes, including the following:

  • hidebysig: the method hides base-class methods only when both the name and the signature match, rather than by name alone.
  • specialname: the method has a special meaning to compilers and tools, such as the property accessors get_Name and set_Name.
  • rtspecialname: the method name has a special meaning to the runtime, as with constructors (.ctor).
  • cil or il: the method contains CIL code.
  • native: the method contains platform-specific native code.
  • managed: the implementation is managed.

.field

The .field directive declares a new field, which holds state information for a class. The syntax is as follows.

.field attributes type fieldname

For example, two read-only integer fields (private readonly int in C#) are declared in IL as follows, where initonly is the CIL counterpart of readonly.

.field private initonly int32 x
.field private initonly int32 y

Main() Method Directives

The method block can contain both directives and the implementation code (CIL).

.entrypoint

This directive designates the method as the entry point of the application. It may appear anywhere within the method body, but only one method in an executable can be marked with it.

.locals

The .locals directive declares the local variables of a method, which can then be accessed by index or by name; the init keyword asks the runtime to zero-initialize them. For example, two integer local variables could be declared as follows.

.locals init ([0] int32 x,[1] int32 y)

In the Main method of MathFun.il, a single local slot of type MathFun holds the object that is created by passing the string "Ajay" to the class constructor.

.locals init ([0] class MathFun obj)


MSIL Instructions

Each MSIL instruction has a mnemonic, such as ldstr, and a binary opcode that is 1 or 2 bytes long. The numeric opcode is what is actually stored in the assembly, and it is also the form used when producing code dynamically at run time, for example through System.Reflection.Emit.

IL_0000:  nop
IL_0001:  ldstr      "Ajay"
IL_0006:  newobj     instance void MathFun::.ctor(string)
IL_000b:  stloc.0
IL_000c:  ldloc.0
IL_000d:  callvirt   instance string MathFun::Display()
IL_0012:  call       void [mscorlib]System.Console::WriteLine(string)
IL_0017:  nop
IL_0018:  ldstr      "Addition is: {0}"
IL_001d:  ldloc.0
IL_001e:  ldc.i4.s   15
IL_0020:  ldc.i4.s   35
IL_0022:  callvirt   instance int32 MathFun::Addition(int32, int32)
IL_0027:  box        [mscorlib]System.Int32
IL_002c:  call       void [mscorlib]System.Console::WriteLine(string, object)
IL_0031:  nop
IL_0032:  call       valuetype [mscorlib]System.ConsoleKeyInfo [mscorlib]System.Console::ReadKey()
IL_0037:  pop
IL_0038:  ret
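The IL_xxxx labels in this listing are byte offsets, so they can be reproduced from the encoded size of each instruction. Every opcode used here is a single byte (two-byte opcodes begin with 0xFE), followed by its operand: a 4-byte metadata token for ldstr, newobj, call, callvirt, and box, or a 1-byte signed integer for ldc.i4.s. Here is a Python sketch that recomputes the labels and the method's code size.

```python
# Opcode byte and operand size (bytes) for the instructions in Main.
OPCODES = {
    "nop":      (0x00, 0),
    "ldloc.0":  (0x06, 0),
    "stloc.0":  (0x0A, 0),
    "ldc.i4.s": (0x1F, 1),   # signed 8-bit constant
    "pop":      (0x26, 0),
    "call":     (0x28, 4),   # method token
    "ret":      (0x2A, 0),
    "callvirt": (0x6F, 4),   # method token
    "ldstr":    (0x72, 4),   # string token
    "newobj":   (0x73, 4),   # constructor token
    "box":      (0x8C, 4),   # type token
}

MAIN = ["nop", "ldstr", "newobj", "stloc.0", "ldloc.0", "callvirt", "call",
        "nop", "ldstr", "ldloc.0", "ldc.i4.s", "ldc.i4.s", "callvirt", "box",
        "call", "nop", "call", "pop", "ret"]


def label_offsets(mnemonics):
    """Return ildasm-style labels and the total code size in bytes."""
    labels, offset = [], 0
    for name in mnemonics:
        labels.append(f"IL_{offset:04x}: {name}")
        offset += 1 + OPCODES[name][1]       # opcode byte + operand bytes
    return labels, offset


def encode_ldc_i4_s(value: int) -> bytes:
    """Encode ldc.i4.s <int8>, e.g. the constants 15 and 35 passed to Addition."""
    return bytes([OPCODES["ldc.i4.s"][0]]) + value.to_bytes(1, "little", signed=True)


labels, code_size = label_offsets(MAIN)
print("\n".join(labels))
print(f"Code size {code_size} (0x{code_size:x})")   # Code size 57 (0x39)
print(encode_ldc_i4_s(15).hex(" "))                  # 1f 0f
```

The computed size of 57 (0x39) bytes matches the "Code size" comment that precedes Main in MathFun.il.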

Synopsis

This article has briefly touched on the most important features of the common language runtime and ILAsm. You now know how a program in ILAsm is written and compiled using either ILASM or Xamarin Studio, and how to define its basic components (classes, fields, properties, and methods). The next articles in this series will examine the opcode specification in depth, along with the remaining crucial segments of the MSIL grammar.