What's in a Cast?

Tuesday 20 January, 2004, 08:53 PM

I really like C#. I wanted to say that up front because this blog entry is about a feature of C# I don't like much, and I'd hate you to get the wrong idea... And if you're afflicted by the all too common disorder of disliking anything produced by Microsoft regardless of merit, then I'd better point out that Java has a closely related inconsistency, and that the C# and Java manifestations of this issue are but shadows of the equivalent problem that afflicts standard C++. (And yes, I know about the new(ish) cast operators in C++ like static_cast and so on, and they only offer a partial solution.)

Also, my apologies to anyone subscribed both to this blog and also to the DevelopMentor DOTNET-WEB mailing list, where I just posted something very similar. It occurred to me after I posted there that this topic seems to come up pretty often in various places. I reckoned that if I blog about it, next time it comes up all I will need to write is the URL of this entry!

The C# language provides a cast operator which, superficially, seems pretty straightforward. You use it when you have a variable of one kind, and you'd like to store its value in a variable of a different kind. You don't always need it - sometimes the C# compiler will happily let you perform certain kinds of assignments without a cast. The rule it uses is, roughly speaking: if the assignment is safe, no cast is required. For example, it's always safe to assign the value of an int (32-bit signed integer) into a variable of type long (64-bit signed integer) because there can never be any loss of information. (A long can accurately represent any value that an int can hold.)

Another example of a 'safe' conversion is where you have a reference to an object and assign that into a variable of 'compatible' type. Compatible types are any interfaces the source type implements, its base class, and any types its base class is compatible with. For example, everything derives (directly or indirectly) from object, so it's always safe to assign something into a variable of type object. Such assignments require no cast.

These two examples are really quite different. The first involves a change in representation - a conversion from int to long. The compiler has to generate a special IL instruction to do this: conv.i8. The second example required no such conversion. Consider the following code:

Foo fooRef = new Foo();
object objRef = fooRef;

All I'm doing is changing the type of reference I'm using to refer to a particular object - I have a reference of type Foo and I am assigning it into a variable of type object. The underlying representation does not change - the two variables refer to the exact same object. (And as it happens, in the current CLR implementation, the references look identical too - they are both pointers pointing to the same address.) The C# compiler doesn't have to generate any code to convert the reference here - it just stores a copy without conversion (typically using the same stloc instruction it would have used if the two variables were of the same type).
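To make the contrast concrete, here is a minimal sketch of the two implicit conversions. The Foo class is an empty placeholder and the helper names (Widen, SameObject) are made up for illustration. The first helper changes representation, while the second merely copies a reference, which object.ReferenceEquals confirms.

```csharp
using System;

class Foo { }   // hypothetical placeholder type

static class ImplicitConversionDemo
{
    // Widening numeric conversion: the 32-bit value is re-encoded as 64 bits.
    public static long Widen(int value)
    {
        long widened = value;     // no cast required: nothing can be lost
        return widened;
    }

    // Reference conversion: same object, viewed through a less specific type.
    public static bool SameObject(Foo fooRef)
    {
        object objRef = fooRef;   // no cast required
        return object.ReferenceEquals(fooRef, objRef);
    }

    static void Main()
    {
        Console.WriteLine(Widen(42));              // prints 42
        Console.WriteLine(SameObject(new Foo()));  // prints True
    }
}
```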

So it seems that assignment can mean two different things. It might mean "make an exact copy", or it might mean "convert to a different representation". The same ambiguity applies to casts.

Casts are used for assignments which are not necessarily safe. For example, if I have a long value, and wish to store it in a variable of type int, the compiler will not allow me to do so with a simple assignment, because information may be lost. I have to opt in to the lossy conversion by casting the value:

int i = (int) someLongValue;

This causes the compiler to generate another conversion instruction (conv.i4).
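As a rough illustration of what "lossy" means here, the sketch below narrows a long that does not fit in 32 bits. The helper names are invented for this example. It also uses C#'s checked and unchecked keywords, which the discussion above does not cover: in an unchecked context the cast silently keeps the low 32 bits, whereas in a checked context the same cast throws an OverflowException.

```csharp
using System;

static class NarrowingDemo
{
    // Unchecked narrowing: keeps the low 32 bits and discards the rest.
    public static int Truncate(long value)
    {
        return unchecked((int) value);
    }

    // Checked narrowing: reports whether the value survives the conversion.
    public static bool FitsInInt(long value)
    {
        try
        {
            int narrowed = checked((int) value);
            return narrowed == value;
        }
        catch (OverflowException)
        {
            return false;
        }
    }

    static void Main()
    {
        long big = 4294967297L;                // 2^32 + 1
        Console.WriteLine(Truncate(big));      // prints 1
        Console.WriteLine(FitsInInt(big));     // prints False
        Console.WriteLine(FitsInInt(42L));     // prints True
    }
}
```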

Alternatively, I may have a variable of some reference type, and I would like to store that reference in a variable of some more specific type, such as a class derived from the existing variable's type. For example, if I have a variable of type object, I may have good reason to believe that the variable refers to an object which is really of type Foo. I have to use the cast operator to declare this belief in order to assign the reference into a variable of the appropriate type:

Foo f = (Foo) someVariableOfTypeObject;

This assignment is not implicitly safe - it's possible that I'm wrong, and that the variable in fact contains a reference to some unrelated type, Bar. To store a reference to an object in an incompatible variable is against the .NET type safety rules. The C# compiler therefore keeps me honest - before performing the assignment, it generates a castclass opcode that checks that the object really is compatible with Foo. If this fails, it will throw an exception, preventing the assignment from occurring, and maintaining the type safety of the system. (The CLR verifies code when it loads it and will check that the necessary castclass operations are present whenever such assignments are performed.)
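The runtime check is easy to observe. In this minimal sketch, Foo and Bar are empty placeholder classes and Describe is a hypothetical helper. The same cast expression succeeds or throws InvalidCastException depending on what the object variable actually refers to.

```csharp
using System;

class Foo { }   // hypothetical placeholder types
class Bar { }

static class DowncastDemo
{
    public static string Describe(object candidate)
    {
        try
        {
            // The compiler cannot prove this is safe, so the check happens at runtime.
            Foo f = (Foo) candidate;
            return "is a Foo: " + f.GetType().Name;
        }
        catch (InvalidCastException)
        {
            return "not a Foo";
        }
    }

    static void Main()
    {
        Console.WriteLine(Describe(new Foo()));   // prints "is a Foo: Foo"
        Console.WriteLine(Describe(new Bar()));   // prints "not a Foo"
    }
}
```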

In other words, this cast has caused a runtime type check to be performed. That's a very different kind of a thing from the previous cast, which performed a numeric conversion. In fact there are several different things that the cast operator might mean. Depending on the context, it could mean any of the following:

Nothing. You are allowed to write a redundant cast. (E.g. you can explicitly cast an int to an int.) The C# compiler won't complain - it just ignores the cast.
Runtime type check. When you cast a reference type to another reference type, if the compiler cannot determine at compile time that the cast will always succeed (i.e. that the destination variable's type is compatible with the source variable's type) this will generate a runtime type check (the IL castclass opcode).
Specify scope. Sometimes it is necessary to specify the scope of a member as well as its name. For example, a class that implements a particular interface may choose not to place the methods for that interface in the class's public API - it can instead use explicit interface implementation. The methods of the interface are then usually only available when you access the object through a variable whose type is that interface. However, you can also use this syntax:
```
((IDisposable) someObj).Dispose();
```
The meaning of this cast depends on the type of someObj. If the type of someObj is compatible with IDisposable, the compiler will not generate a runtime type check (i.e. it will not generate the IL castclass instruction) because the check is unnecessary. It will just generate the appropriate IL to call the method. (In IL, method calls are always scoped with the defining type. In this case it would appear as callvirt instance void IDisposable::Dispose().) If the compiler cannot be sure that the object is compatible with IDisposable, then this syntax causes a runtime type check - it then becomes equivalent to the previous item in this list.
Numeric conversion. This occurs when casting from one numeric type to another.
Unbox. This occurs when casting a variable of type object (or of any interface type) to a value type (such as int). A cast in this context will cause the compiler to generate an unbox opcode.
Invoke a conversion operator. Classes may define custom conversion operators. These can be invoked using the cast syntax. (Classes may also mark these conversions as implicit, meaning that even a simple assignment can invoke the conversion function!)
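
Three of the meanings listed above can be seen side by side in a short sketch. Resource and Celsius are hypothetical types invented for this example, as are the helper method names. The point is only that the same (T) syntax scopes a member access in one method, calls a user-defined operator in another, and unboxes in the third.

```csharp
using System;

// Hypothetical class that hides Dispose behind explicit interface implementation.
class Resource : IDisposable
{
    public bool Disposed;
    void IDisposable.Dispose() { Disposed = true; }
}

// Hypothetical class with a user-defined explicit conversion operator.
class Celsius
{
    public readonly double Degrees;
    public Celsius(double degrees) { Degrees = degrees; }
    public static explicit operator double(Celsius c) { return c.Degrees; }
}

static class CastMeaningsDemo
{
    // Specify scope: r.Dispose() would not compile, because the method is
    // only visible through the interface. The compiler knows that Resource
    // implements IDisposable, so no runtime type check is needed.
    public static bool DisposeViaInterface()
    {
        Resource r = new Resource();
        ((IDisposable) r).Dispose();
        return r.Disposed;
    }

    // Invoke a conversion operator: the cast calls the operator method.
    public static double ConvertViaOperator(Celsius c)
    {
        return (double) c;
    }

    // Unbox: the cast extracts the int from its box.
    public static int Unbox(object boxedInt)
    {
        return (int) boxedInt;
    }

    static void Main()
    {
        Console.WriteLine(DisposeViaInterface());   // prints True
        Console.WriteLine(ConvertViaOperator(new Celsius(21.5)));
        Console.WriteLine(Unbox(42));               // prints 42
    }
}
```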

This wide array of context-sensitive behaviours can lead to some confusing results. For example this compiles and runs just fine:

long longVal = 42L;
int intVal = (int) longVal;

but this almost identical code does not work - it compiles, but fails at runtime with an InvalidCastException:

object longVal = 42L;
int intVal = (int) longVal;

This tends to confuse beginners. They look at the second line and believe that they have asked the compiler to change longVal into an int, and do not understand why it doesn't work even though the first example does. The reason is that these two examples invoke different cast behaviours. The first invokes the numeric conversion behaviour, while the second invokes the unbox behaviour. Unboxing will never perform numeric conversions - if you attempt to unbox to the wrong type, you simply get an InvalidCastException. The 'right' thing to do here looks even more odd:

object longVal = 42L;
int intVal = (int) ((long) longVal);

Here the two casts are operating in different modes. The (long) cast is performing an unbox, and the (int) cast is performing a numeric conversion on the resulting unboxed long.
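
The following sketch puts the failing and working forms next to each other; the helper names are made up for illustration. It also shows Convert.ToInt32(object), which is not discussed above. That method goes through IConvertible rather than unboxing, so it accepts a boxed long directly, and is offered here only as one possible alternative to the double cast.

```csharp
using System;

static class UnboxDemo
{
    // A direct unbox only works when the boxed value really is an int.
    public static string TryDirectUnbox(object boxed)
    {
        try
        {
            int value = (int) boxed;
            return "unboxed " + value;
        }
        catch (InvalidCastException)
        {
            return "InvalidCastException";
        }
    }

    // Unbox to the exact boxed type first, then convert numerically.
    public static int UnboxThenConvert(object boxedLong)
    {
        return (int) ((long) boxedLong);
    }

    // Alternative: let Convert inspect the boxed value via IConvertible.
    public static int ViaConvert(object boxedNumber)
    {
        return Convert.ToInt32(boxedNumber);
    }

    static void Main()
    {
        object longVal = 42L;
        Console.WriteLine(TryDirectUnbox(longVal));    // prints InvalidCastException
        Console.WriteLine(TryDirectUnbox(42));         // prints unboxed 42
        Console.WriteLine(UnboxThenConvert(longVal));  // prints 42
        Console.WriteLine(ViaConvert(longVal));        // prints 42
    }
}
```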

So why on earth does C# pack so many different meanings into a single operator? Mainly because that's what C++ does. (As does Java.) It's much worse in C++ because there are yet more things that a cast might mean. In fact C++ introduced four new keywords into the language to replace the old-style cast. (And even one of those can mean different things in different contexts! static_cast can be used for disambiguation, unsafe downcasts (without a runtime type check) and numeric conversions.) The fact that these new keywords were deemed necessary, and that the old-style cast operator is discouraged in C++, makes it all the more odd that C# chose to stick with the old C-style cast syntax and overload its meanings so heavily.

*sigh*

But apart from that I really like C#. :-)
