What can you do in MSIL that you cannot do in C# or VB.NET?


Answers

Most .NET languages, including C# and VB, do not use the tail call feature of MSIL (the tail. instruction prefix).

Tail recursion is an optimization that is common in functional languages. It applies when a method A ends by returning the value of a call to method B, so that method A's stack frame can be deallocated once the call to method B is made.

MSIL supports tail calls explicitly, and for some algorithms this can be an important optimization. But since C# and VB do not generate the instructions for it, it must be done manually (or by using F# or some other language).

Here is an example of how tail-recursion may be implemented manually in C#:

private static int RecursiveMethod(int myParameter)
{
    // Body of recursive method
    if (BaseCase(details))
        return result;
    // ...

    return RecursiveMethod(modifiedParameter);
}

// Is transformed into:

private static int RecursiveMethod(int myParameter)
{
    while (true)
    {
        // Body of recursive method
        if (BaseCase(details))
            return result;
        // ...

        myParameter = modifiedParameter;
    }
}

It is common practice to remove recursion by moving the local data from the hardware stack onto a heap-allocated stack data structure. In the tail-call recursion elimination as shown above, the stack is eliminated completely, which is a pretty good optimization. Also, the return value does not have to walk up a long call-chain, but it is returned directly.
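As a concrete illustration of the transformation above, here is a minimal sketch. The class and method names (TailCallDemo, SumToRecursive, SumToLoop) and the accumulator parameter are invented for this example and are not part of the original answer. Both methods compute 1 + 2 + ... + n; the second one replaces the tail call with a parameter update inside a loop.

```csharp
using System;

public static class TailCallDemo
{
    // Tail-recursive form: the recursive call is the last action of the method,
    // so nothing in this frame is needed after the call returns.
    public static long SumToRecursive(int n, long accumulator)
    {
        if (n == 0)
            return accumulator;

        return SumToRecursive(n - 1, accumulator + n);
    }

    // Manually transformed form: the tail call becomes a parameter update
    // followed by another pass through the loop, so the stack does not grow.
    public static long SumToLoop(int n, long accumulator)
    {
        while (true)
        {
            if (n == 0)
                return accumulator;

            accumulator += n;
            n -= 1;
        }
    }

    public static void Main()
    {
        Console.WriteLine(SumToRecursive(1000, 0));   // 500500
        Console.WriteLine(SumToLoop(1000000, 0));     // 500000500000
    }
}
```

The loop version can handle inputs that would risk a stack overflow in the recursive version, because the C# compiler does not emit a tail call for the recursive form.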

In any case, CIL provides this feature as part of the language, whereas in C# or VB it has to be implemented manually. (The jitter is also free to make this optimization on its own, but that is a whole other issue.)

Question

All code written in .NET languages compiles to MSIL, but are there specific tasks / operations that you can do only using MSIL directly?

Let us also include things that are easier to do in MSIL than in C#, VB.NET, F#, J# or any other .NET language.

So far we have this:

  1. Tail recursion
  2. Generic Co/Contravariance
  3. Overloads which differ only in return types
  4. Override access modifiers
  5. Have a class which cannot inherit from System.Object
  6. Filtered exceptions (can be done in VB.NET)
  7. Calling a virtual method of the current static class type.
  8. Get a handle on the boxed version of a value type.
  9. Do a try/fault.
  10. Usage of forbidden names.
  11. Define your own parameterless constructors for value types.
  12. Define events with a raise element.
  13. Some conversions allowed by the CLR but not by C#.
  14. Make a non main() method as the .entrypoint.
  15. Work with the native int and native unsigned int types directly.
  16. Play with transient pointers
  17. emitbyte directive in MethodBodyItem
  18. Throw and catch non System.Exception types
  19. Inherit Enums (Unverified)
  20. You can treat an array of bytes as a (4x smaller) array of ints.
  21. You can have a field/method/property/event all have the same name (Unverified).
  22. You can branch back into a try block from its own catch block.
  23. You have access to the famandassem access specifier (protected internal is famorassem)
  24. Direct access to the <Module> class for defining global functions, or a module initializer.



Why does “int[] is uint[] == true” in C#

C# and the CLR have somewhat different conversion rules.

You can't directly cast between int[] and uint[] in C# because the language doesn't believe any conversion is available. However, if you go via object the result is up to the CLI. From the CLI spec section 8.7 (I hope - I'm quoting an email exchange I had on this topic with Eric Lippert a while ago):

Signed and unsigned integral primitive types can be assigned to each other; e.g., int8 := uint8 is valid. For this purpose, bool shall be considered compatible with uint8 and vice versa, which makes bool := uint8 valid, and vice versa. This is also true for arrays of signed and unsigned integral primitive types of the same size; e.g., int32[] := uint32[] is valid.

(I haven't checked, but I assume that this sort of reference type conversion being valid is what makes the is operator return true as well.)
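The behaviour described above can be sketched as follows. The names ArrayConversionDemo and IsUIntArray are invented for this illustration. The direct cast is left as a comment because the C# compiler rejects it; going through object defers the decision to the runtime, which (as far as I know, on the Microsoft CLR) accepts it.

```csharp
using System;

public static class ArrayConversionDemo
{
    // The parameter is typed as object, so the C# compiler cannot decide the
    // answer itself and leaves the type test to the runtime.
    public static bool IsUIntArray(object candidate)
    {
        return candidate is uint[];
    }

    public static void Main()
    {
        int[] ints = { -1, 2, 3 };

        // uint[] direct = (uint[])ints;   // rejected by the C# compiler

        Console.WriteLine(IsUIntArray(ints));                 // True

        // Same array object, viewed through a uint[] reference.
        uint[] reinterpreted = (uint[])(object)ints;
        Console.WriteLine(reinterpreted[0]);                  // 4294967295
        Console.WriteLine(object.ReferenceEquals(ints, reinterpreted)); // True
    }
}
```

No elements are copied or converted: the bits of -1 are simply read back as an unsigned value.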

It's somewhat unfortunate that there are disconnects between the language and the underlying execution engine, but it's pretty much unavoidable in the long run, I suspect. There are a few other cases like this, but the good news is that they rarely seem to cause significant harm.

EDIT: As Marc deleted his answer, I've linked to the full mail from Eric, as posted to the C# newsgroup.




Suggestion:

Declaring intArray as "int[] intArray" rather than "object intArray" will allow the compiler to pick up the invalid C# cast. Unless you absolutely have to use object, I would take that approach.

Re Q2,Q3:

At runtime have you tried wrapping the cast in a checked block?

From this article at MSDN:

By default, an expression that contains only constant values causes a compiler error if the expression produces a value that is outside the range of the destination type. If the expression contains one or more non-constant values, the compiler does not detect the overflow.

...

By default, these non-constant expressions are not checked for overflow at run time either, and they do not raise overflow exceptions. The previous example displays -2,147,483,639 as the sum of two positive integers.

Overflow checking can be enabled by compiler options, environment configuration, or use of the checked keyword.

As it says, you can enforce overflow checking more globally via a compiler setting or environment config.

In your case this is probably desirable as it will cause a runtime error to be thrown that will ensure the likely invalid unsigned number to signed number overflow will not occur silently.
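A minimal sketch of the difference between unchecked and checked arithmetic, reusing the numbers from the MSDN quote. The names OverflowDemo, AddUnchecked and AddChecked are invented for this example.

```csharp
using System;

public static class OverflowDemo
{
    // Overflow wraps around silently, regardless of compiler settings.
    public static int AddUnchecked(int a, int b)
    {
        return unchecked(a + b);
    }

    // Overflow raises System.OverflowException at run time.
    public static int AddChecked(int a, int b)
    {
        return checked(a + b);
    }

    public static void Main()
    {
        Console.WriteLine(AddUnchecked(int.MaxValue, 10)); // -2147483639

        try
        {
            AddChecked(int.MaxValue, 10);
        }
        catch (OverflowException)
        {
            Console.WriteLine("overflow detected");
        }
    }
}
```

Note that this only covers arithmetic and numeric conversions; it says nothing about reference conversions between array types.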

[Update] After testing this code, I found that using a declaration of type object instead of int[] appears to bypass the standard C# casting syntax, regardless of whether checked is enabled or not.

As JS has said, when you use object, you are bound by CLI rules and these apparently allow this to occur.

Re Q1:

This is related to the above. In short, is returns true because the cast involved does not throw an exception (given the current overflow setting). Whether this is a good idea is another question.

From MSDN:

An "is" expression evaluates to true if the provided expression is non-null, and the provided object can be cast to the provided type without causing an exception to be thrown.




This is supported at a very low level within the runtime: a value of a value type can be embedded in an object that is stored on the garbage-collected heap. This turns a value type value into a reference type object and creates the illusion that every value type inherits from ValueType and Object, which are reference types.

The mechanism is called boxing in .NET: the value type value literally gets "boxed" into an object. There is also an unboxing conversion to go back from the object holding the boxed value to a value of the value type. The C# compiler emits these conversions automatically based on your source code. There are dedicated opcodes for this in IL, the Intermediate Language that the C# compiler emits from your source code: the OpCodes.Box and OpCodes.Unbox instructions, respectively. OpCodes.Constrained is an instruction prefix that can optimize the conversion. The jitter knows how to implement them and generates very efficient inline machine code to make these conversions.

Boxing is highly specific to System.Object being a base class in the type hierarchy, and the plumbing that supports it is highly specific to value type values. It is not an extensible mechanism: you cannot add your own IL instructions, extend the jitter, or give the C# language new syntax. If you need your types to have a common base interface or class, then you have to declare them that way in your code. The dynamic keyword may be attractive to you; it isn't clear from the question.
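A small sketch of boxing and unboxing as the C# compiler sees them. The names BoxingDemo and RoundTrip are invented for this example; the comments about which IL instruction is emitted reflect typical compiler output.

```csharp
using System;

public static class BoxingDemo
{
    // Boxes the value, changes the local afterwards, then unboxes.
    // The box holds its own copy, so the later change is not visible in it.
    public static int RoundTrip(int value)
    {
        object boxed = value;   // implicit boxing conversion (IL: box)
        value += 1;             // modifies the local only, not the boxed copy
        return (int)boxed;      // explicit unboxing conversion (IL: unbox.any)
    }

    public static void Main()
    {
        Console.WriteLine(RoundTrip(42));        // 42

        object boxed = 42;
        Console.WriteLine(boxed is ValueType);   // True
        Console.WriteLine(boxed.GetType());      // System.Int32
    }
}
```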




What is the earliest entrypoint that the CLR calls before calling any method in an assembly?

I'd normally not answer my own question, but meanwhile I did find an answer that hasn't come up here before, so here I go.

After some research, I happened on this post by Microsoft, which explains the problems of mixing managed and unmanaged code inside DllMain and the solution, which came about with the 2nd version of the CLI, module initializers. Quote:

This initializer runs just after the native DllMain (in other words, outside of loader lock) but before any managed code is run or managed data is accessed from that module. The semantics of the module .cctor are very similar to those of class .cctors and are defined in the ECMA C# and Common Language Infrastructure Standards.

While I wasn't able to find the term module initializer inside the current ECMA specification, it follows logically from type initializer and the global <Module> special class (see section 22.26 on MethodDef, sub-point 40). This feature was implemented after .NET 1.1 (i.e., from 2.0 onwards). See also this semi-official description.

This question wasn't about C#, but because it is the lingua franca of .NET: C# has no notion of global methods, and you can't create a <Module> class, let alone its .cctor. However, Einar Egilsson has recognized this apparent deficiency and created InjectModuleInitializer.exe, which allows you to do this as a post-compile step from Visual Studio. In C++.NET, using this method is trivial and recommended practice in place of DllMain. See also this SO answer by Ben Voigt (not the accepted answer) and this SO answer by yoyoyoyosef.

In short, the module initializer is the first method that is called after loading the module (not necessarily when loading the assembly!) and before calling any class or instance method. It takes no parameters and returns no value, but can contain any managed code in its body.
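For readers on newer toolchains: C# 9 (with .NET 5 or later) added the System.Runtime.CompilerServices.ModuleInitializerAttribute, which postdates this answer and asks the compiler to call the marked method from the module initializer. The sketch below assumes such a toolchain; the names Startup, Initialized and Initialize are invented for this example.

```csharp
using System;
using System.Runtime.CompilerServices;

internal static class Startup
{
    // Set by the module initializer; read later to show that it has run.
    internal static bool Initialized;

    // Must be static, parameterless, return void, and be accessible
    // from the containing module (internal or public).
    [ModuleInitializer]
    internal static void Initialize()
    {
        Initialized = true;
    }
}

public static class Program
{
    public static void Main()
    {
        // The module initializer has already run before Main starts.
        Console.WriteLine(Startup.Initialized);   // True
    }
}
```

On older compilers the post-compile injection described above remains the way to get the same effect.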




This is by design: it minimises coupling between static constructors. You know that your cctor will be invoked before anything in your class initializes, and after the cctors of any classes used by your class. But there's no guarantee on when it will run compared to unrelated classes in the same application.

If you want to make sure some code of yours runs before the entry point, consider writing a wrapper for the main application. A straightforward way would be to put it in a separate executable.

A more self-contained way to do this might be to:

  1. Run whatever startup code is needed, in the right order. Don't reference any types in assemblies that shouldn't get initialized.
  2. Create your own app domain
  3. Run the real entry point within this second app domain



