August 08, 2007

dotNET - Under the Hood, IL Assembly, MSIL, ILGenerator, MethodBuilder

IL (Intermediate Language) is also referred to as MSIL (Microsoft Intermediate Language), and its textual assembly form is called ILAsm. IL is the code which your source code (C#, VB.NET etc) is compiled into. When your EXE or DLL is run, this IL code is converted to machine code for the processor of the machine it is running on, i.e. on an x86 Windows machine the IL is converted into x86 instructions before it executes. This conversion is done by the runtime's JIT (Just-In-Time) compiler.

When you're developing in a .NET language with a development tool such as Visual Studio 2005, the compiler generates valid IL for you. But what if you want to create your own IL? One reason could be that you've developed your own language and would like it to run on .NET; in this case you'll need to write out an assembly, be it an EXE or a DLL, containing IL code. To do this you can use classes from the .NET API which reside in the System.Reflection.Emit namespace. I'll explain how this is achieved later.

There are also tools for going between IL text and assemblies. ILASM (all caps means this is the tool and not the language ILAsm) is the IL assembler, which turns ILAsm text into an assembly. ILDASM is the IL disassembler, which does the reverse and produces ILAsm text from an assembly. Note that ILDASM gives you IL, not the original C# or VB.NET source.

An assembly contains 2 main things, Metadata and Managed Code. Metadata is information which describes the types and methods within the assembly. The Managed Code is the actual IL code; it is stored in the assembly in binary form, and managed means that the runtime controls its execution.

At runtime the assembly is loaded and the metadata is read first to find the descriptions of the types and methods. The first time a method is called, the JIT compiler uses the metadata to compile that method's IL into machine code, and it is this machine code that is executed, on that call and on later calls (incidentally this is what differentiates the runtime from an interpreter).
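
To make the metadata/code split concrete, here is a minimal sketch that uses reflection to read a method's metadata and then its raw IL bytes. The names MetadataDemo and Add are invented for this example, and the exact bytes printed will depend on how the code was compiled (debug or release).

```csharp
using System;
using System.Reflection;

public static class MetadataDemo
{
    // A small method whose metadata and IL we will inspect.
    public static int Add(int a, int b)
    {
        return a + b;
    }

    public static void Main()
    {
        // Metadata: the method's name, return type and parameters.
        MethodInfo method = typeof(MetadataDemo).GetMethod("Add");
        Console.WriteLine("{0} returns {1}", method.Name, method.ReturnType);
        foreach (ParameterInfo p in method.GetParameters())
        {
            Console.WriteLine("  parameter {0} : {1}", p.Name, p.ParameterType);
        }

        // Managed code: the IL bytes stored in the assembly for this method body.
        byte[] il = method.GetMethodBody().GetILAsByteArray();
        Console.WriteLine("IL size: {0} bytes", il.Length);
        Console.WriteLine(BitConverter.ToString(il));
    }
}
```

Nothing here runs the Add method; everything printed comes from what is stored in the assembly.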

PE (Portable Executable) and COFF are the file formats the assembly is wrapped in so that Windows recognises it as an EXE or DLL; we'll not worry about them here.

Contents of the Assembly:
The assembly contains one or more Modules. An assembly may or may not have multiple modules but it must have at least one, i.e. the Prime Module. The Prime Module contains a Metadata section which describes the contents of the assembly, an Assembly Identity section and maybe some actual IL code. The assembly may also contain additional Modules which each have their own Metadata and IL Code sections.
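
As a small sketch of how this looks from code, the reflection API exposes an assembly's identity and its modules; it calls the Prime Module the ManifestModule. The class name ModuleDemo is invented for this example. Most assemblies built by Visual Studio have a single module, so expect only one entry in the list.

```csharp
using System;
using System.Reflection;

public static class ModuleDemo
{
    public static void Main()
    {
        // The assembly that contains this class.
        Assembly assembly = typeof(ModuleDemo).Assembly;

        // The assembly identity: name, version, culture and public key token.
        Console.WriteLine("Assembly identity: {0}", assembly.FullName);

        // The module that holds the assembly manifest.
        Console.WriteLine("Manifest module: {0}", assembly.ManifestModule.Name);

        // Every module in the assembly, including the manifest module.
        foreach (Module m in assembly.GetModules())
        {
            Console.WriteLine("  module: {0}", m.Name);
        }
    }
}
```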


Boxed/Boxing and UnBoxed/UnBoxing:
You'll see these two terms appearing whenever you deal with IL; C++ developers may have heard the terms before. They relate to reference types and value types: a reference type is an object on the Heap to which a variable points, while a value type variable contains the value in its own memory location (typically on the stack, or inline within the object that contains it). This can be seen in the IL code. Reference types are more expensive than value types in terms of processor and memory, because each one needs a Heap allocation and is later garbage collected.

Boxing is the conversion of a value type to a reference type. The compiler emits a box instruction and the conversion itself happens at runtime. The value is contained in a variable's memory location, and for some reason (such as assigning it to a variable of type Object, or passing it as a method parameter of type Object) you need it as a reference type. A bit copy is made of the value and an object is created on the Heap holding that copied value.

UnBoxing is the conversion of a boxed reference type back to a value type, again performed at runtime by an instruction the compiler emits (unbox or unbox.any). The runtime checks that the object on the Heap really is a boxed instance of the requested value type and then copies the value out into the value type variable. For example:
Int32 unBoxed = 20;//Unboxed
Object boxed = unBoxed;//Boxed
Int32 unBoxedBoxed = (Int32)boxed;

Here's the IL code generated for the C# source code above
.entrypoint
// Code size 19 (0x13)
.maxstack 1
.locals init ([0] int32 unBoxed,
[1] object boxed,
[2] int32 unBoxedBoxed)
IL_0000: nop
IL_0001: ldc.i4.s 20
IL_0003: stloc.0
IL_0004: ldloc.0
IL_0005: box [mscorlib]System.Int32
IL_000a: stloc.1
IL_000b: ldloc.1
IL_000c: unbox.any [mscorlib]System.Int32
IL_0011: stloc.2
IL_0012: ret
Here's a great article on Boxing from MSDN Magazine.
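
Two consequences of boxing are easy to check for yourself: the box holds a copy of the value, and unboxing only succeeds for the exact value type that was boxed. The following is a minimal sketch; the class and method names are invented for this example.

```csharp
using System;

public static class BoxingDemo
{
    // Boxes a value, changes the original variable, then unboxes.
    public static int BoxThenMutate()
    {
        int unBoxed = 20;
        object boxed = unBoxed; // box: copies 20 into a new object on the heap
        unBoxed = 30;           // changes only the local; the box still holds 20
        return (int)boxed;      // unbox: type check, then copy the value out
    }

    // Tries to unbox a boxed Int32 as an Int64.
    public static bool UnboxToWrongTypeThrows()
    {
        object boxed = 20;
        try
        {
            long wrong = (long)boxed; // the box holds an Int32, not an Int64
            Console.WriteLine(wrong);
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(BoxThenMutate());          // prints 20
        Console.WriteLine(UnboxToWrongTypeThrows()); // prints True
    }
}
```

To get a long out of the box you would unbox to int first and then convert, i.e. (long)(int)boxed.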


When playing around with IL code and reflection you may want to invoke a method within an assembly. If you are writing the assembly yourself you may want to write its methods in IL code; to do this you use the MethodBuilder class found in the System.Reflection.Emit namespace (MethodBuilder derives from MethodInfo, which lives in System.Reflection). Once you've created your MethodBuilder you get a reference to an ILGenerator from it by calling GetILGenerator(); the ILGenerator can be thought of as the writer of the IL code. You write the IL for the method through it and later save everything to a DLL using the AssemblyBuilder:
// Requires: using System; using System.Reflection;
// using System.Reflection.Emit; using System.Threading;
// RunAndSave and AssemblyBuilder.Save are .NET Framework APIs.

// create a dynamic assembly and module
AssemblyName assemblyName = new AssemblyName();
assemblyName.Name = "HelloWorld";
AssemblyBuilder assemblyBuilder = Thread.GetDomain().DefineDynamicAssembly(
    assemblyName, AssemblyBuilderAccess.RunAndSave);

// pass a file name as well, so the module is persistable and gets saved
ModuleBuilder module;
module = assemblyBuilder.DefineDynamicModule("HelloWorld", "HelloWorld.dll");

// create a new type to hold our field and method
// ("Example", "id" and "ChangeID" are illustrative names)
TypeBuilder typeBuilder = module.DefineType(
    "Example",
    TypeAttributes.Public | TypeAttributes.Class);

// the private instance field that the method reads and writes
FieldBuilder fid = typeBuilder.DefineField(
    "id", typeof(int), FieldAttributes.Private);

// create the method: public int ChangeID(int newId)
MethodBuilder methodbuilder = typeBuilder.DefineMethod(
    "ChangeID",
    MethodAttributes.Public,
    typeof(int),
    new Type[] { typeof(int) });

// generate the IL for the ChangeID method
ILGenerator ilGenerator = methodbuilder.GetILGenerator();

// Push the current value of the id field onto the
// evaluation stack. It's an instance field, so load the
// instance of Example (argument 0, i.e. "this") before
// accessing the field.
ilGenerator.Emit(OpCodes.Ldarg_0);
ilGenerator.Emit(OpCodes.Ldfld, fid);

// Load the instance of Example again, load the new value
// of id (argument 1), and store the new field value.
ilGenerator.Emit(OpCodes.Ldarg_0);
ilGenerator.Emit(OpCodes.Ldarg_1);
ilGenerator.Emit(OpCodes.Stfld, fid);

// The original value of the id field is now the only
// thing on the stack, so return from the call.
ilGenerator.Emit(OpCodes.Ret);

// bake it
Type helloWorldType = typeBuilder.CreateType();
assemblyBuilder.Save("HelloWorld.dll");
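
As a self-contained companion to the steps above, here is a minimal sketch that emits a method and then invokes it through reflection. The names EmitDemo, Adder and Add are made up for illustration. It uses AssemblyBuilderAccess.Run, so the assembly lives in memory only and is never saved, and it uses the static AssemblyBuilder.DefineDynamicAssembly method, which assumes .NET Framework 4.5 or later, or .NET Core; on older versions AppDomain.CurrentDomain.DefineDynamicAssembly takes the same two arguments.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class EmitDemo
{
    // Builds a type "Adder" with one method: public static int Add(int, int).
    public static Type BuildAdderType()
    {
        AssemblyName name = new AssemblyName("EmitDemoAssembly");
        AssemblyBuilder assembly = AssemblyBuilder.DefineDynamicAssembly(
            name, AssemblyBuilderAccess.Run);
        ModuleBuilder module = assembly.DefineDynamicModule("EmitDemoModule");

        TypeBuilder type = module.DefineType(
            "Adder", TypeAttributes.Public | TypeAttributes.Class);

        MethodBuilder method = type.DefineMethod(
            "Add",
            MethodAttributes.Public | MethodAttributes.Static,
            typeof(int),
            new Type[] { typeof(int), typeof(int) });

        // The method is static, so argument 0 and argument 1 are the two ints.
        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0); // push the first argument
        il.Emit(OpCodes.Ldarg_1); // push the second argument
        il.Emit(OpCodes.Add);     // pop both, push their sum
        il.Emit(OpCodes.Ret);     // return the value left on the stack

        // bake it
        return type.CreateType();
    }

    public static void Main()
    {
        Type adder = BuildAdderType();

        // Look the emitted method up by name and call it.
        MethodInfo add = adder.GetMethod("Add");
        object result = add.Invoke(null, new object[] { 2, 3 });
        Console.WriteLine(result); // prints 5
    }
}
```

The first argument to Invoke is null because Add is static; for an instance method you would pass an object created with Activator.CreateInstance.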


A useful reference for a deeper understanding of how .NET works under the hood is Microsoft's "Shared Source Common Language Infrastructure" project, codenamed "Rotor". It's a shared-source implementation of the .NET runtime and can be used for discovering more about how .NET code runs. There's also a book, which is available from googledocs; see my bookmarks.

2 comments:

Dmitry Pavlov said...

Hi Niall,

Thank you for the interesting post!

You or your readers might be interested to read about the project named 'Visual IL' created by Craig Skibo.

There are two related posts:
- Visual IL
- Visual IL source code now available

learnerplates said...

Dmitry,
Nice one, looks promising and I'd love to give it a go.
There's a problem with the download of the project from gotdotnet right now, hopefully Craig will sort out an alternative.
Good to see you're keeping an eye on me :).