Generics, Latent Types, and Type Safety

Sunday 14 March, 2004, 06:33 PM

I've just read Bruce Eckel's blog on the limitations of generics in Java and C#, and his followup providing more information about what he means by latent typing. The article focuses on Java, but almost everything he says applies to C# too.

While Bruce is correct that C# and Java generics have a specific limitation, I don't think his explanation of the reason for the limitation is quite right.

(Note that Bruce doesn't explicitly talk about the fact that there are two levels on which latent typing can operate in .NET. There's latent typing of variables, which is the main one that generics enable. But we already have another form of latent typing: if a variable of type object is non-null, then the thing it refers to has a latent type. And VB.NET already supports the classic Smalltalk/Objective-C/whatever style of message passing where you invoke a method on an object without knowing its real type. VB.NET implements this using reflection. This distinction is a little confusing because Bruce gives examples which look a lot like this latter, message-passing style of latent typing, although I'm not familiar enough with Python to be absolutely sure. But the main thrust of his articles is to talk about latent typing of the variables themselves, as opposed to the types of the objects to which they are referring. If you want an independent third-party definition of a latent type, here's one from Scheme's point of view, and here's another one on a well-known Wiki.)

The problem Bruce is complaining about is that you cannot write a generic function like this:

public static void Frob<T>(T x)
{
    Console.WriteLine(x.Bar());
}

This is potentially frustrating because the correctness or otherwise of this code is surely determined by the type used at instantiation time. For example, consider this type:

class Foo
{
    public int Bar() { return 42; }
}

It looks like it should be valid to instantiate Frob as Frob<Foo>, because the Foo class provides the Bar method that Frob requires. However, if you try it, you'll find that you never get this far - the C# compiler refuses to compile the generic Frob method. It complains that it doesn't know what the Bar method is.

However, Bruce's explanation for why this is so is incorrect. He says that it is because C# doesn't support latent typing. Latent typing is where a particular expression or variable's type may not be self-evident from the source code. C++ templates support this - the type of a variable in a template function may well be determined only when the template is instantiated. Bruce claims that Java and C# generics don't support this. He says that in the generic Frob function above, the variable x has a fixed type, System.Object in the C# case, which is why the compiler complains when you try to invoke any method other than the standard System.Object methods. He also talks about the constraint mechanisms that let you indicate that some generic code requires the type parameter to be a class that implements a particular interface or derives from a particular base class, and claims that the type of the variable is then effectively one of these types.

But he is wrong. In the generic Frob function above, x does have a latent type which, just like in C++, is determined by the type parameter supplied during instantiation. This can easily be demonstrated with the following code:

public static void Spong<T>(T x)
{
    Type t = typeof(T);
    Console.WriteLine(t.FullName);
}

The typeof keyword returns the type information for the named type. In this case, the name of the type being requested is T, which is the type parameter for this generic function. It is the type of x, and it is a latent type: its type is not knowable from this source code alone, and is determined by the particular instance of the generic function. In fact if you look at the IL, you will see:

ldtoken !!0

This means "return the metadata token for whatever the first type parameter is in this instantiation of this generic code." And in general in IL, you can use !!N wherever a type name would normally be required. This syntax refers to the Nth type parameter of the generic method, counting from zero, which is why the first one is !!0.

This is not the same thing as calling GetType on x. The typeof operator really is picking up the latent type of the x variable, which is not necessarily what would be returned by x.GetType(). For example, this code:

Spong<Foo>(null);
Spong<System.String>(null);
Spong<System.Object>(null);
Spong<System.Object>("Hello, world");

Prints out this:

Foo
System.String
System.Object
System.Object

In each case, calling x.GetType() would have had a different effect. In the first three cases, x is null, so x.GetType() would have thrown an exception. This demonstrates clearly that this is not simply a case of extracting the type of an object at runtime as we can do in today's pre-generics world. The final example is also interesting - here, x.GetType() would have returned the Type object for System.String. However, we've explicitly chosen the Spong<System.Object> instantiation here. Again this illustrates that the typeof operator is retrieving the latent type of the x variable, rather than the runtime type of what x currently refers to.
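To see the two kinds of type side by side, here is a minimal sketch that reports both from inside one generic method. The names LatentTypeDemo and Describe are mine, invented purely for illustration, and the null check is there only so that the null case doesn't throw.

```csharp
using System;

static class LatentTypeDemo
{
    // Returns the latent type of the variable x (typeof(T)) alongside
    // the runtime type of whatever x refers to (x.GetType()).
    public static string Describe<T>(T x)
    {
        string latent = typeof(T).FullName;
        // x.GetType() would throw on a null reference, so guard against it.
        string runtime = (x == null) ? "(null)" : x.GetType().FullName;
        return "static: " + latent + ", runtime: " + runtime;
    }

    static void Main()
    {
        // prints "static: System.Object, runtime: System.String"
        Console.WriteLine(Describe<object>("Hello, world"));
        // prints "static: System.String, runtime: (null)"
        Console.WriteLine(Describe<string>(null));
    }
}
```

The first call is the interesting one: the two answers disagree, because the instantiation was chosen explicitly as object while the argument happens to be a string.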

So why doesn't the original Frob function compile? It's because the C# compiler team took an explicit decision not to support that kind of thing. It was an explicit goal that the C# compiler report at compile time any attempt to invoke missing methods from within generic code.

Even now, I can hear Bruce pointing out that C++ manages to offer this facility and still perform compile time type checking. He makes this point in both his blog entries.

This makes me wonder if the real Bruce Eckel has been abducted.

I can't believe Bruce doesn't understand enough about the differences between the C++ compilation model and the model used by C# and Java to see why this argument doesn't hold water. The reason C++ is able to check such a generic function at build time is that it knows about every place this template is instantiated. C++ uses a monolithic compilation model. At build time, the compiler sees every single instantiation of a given template, and can therefore know if they are all valid. The same is not true with C# or Java. These both use a runtime linking model.

You might think that this shouldn't be a problem. On the face of it, it seems like it should be possible for the compiler to make checks when you write code that uses some generic code. But remember that in a dynamically linked world (and don't confuse dynamic linking with dynamic typing by the way) it's possible for the provider and consumer to evolve independently. This means that generics have problems similar to those of inheritance: you can get a whole class of errors at runtime that would only have been possible at compile time in a monolithically linked world.

As an example, suppose that v1.0 of Frob didn't attempt to call Bar on x. This would then allow some other component to instantiate, say, Frob<int>. But if v1.1 of Frob looks like the earlier example, such an instantiation will obviously fail. With C++, it will fail when you compile, but in a non-monolithic world, there isn't necessarily any compilation step when upgrading from v1.0 to v1.1 - client code can very often end up running against newer versions of components than those they were originally compiled against, at which point a runtime error is the only way of dealing with this.

It seems that avoiding these runtime errors was an explicit goal of the C# generics design. (And presumably also for Java, although I'm not very familiar with Java generics.) The only way to catch these kinds of problems at compile time is to impose much heavier restrictions at compile time. Generic code is required to declare what it expects of its type parameters up front as part of its public interface. This means that these kinds of problems can be identified at compile time even in a runtime-linked world. (Unless you change your explicitly declared constraints of course... But changing the public API of a component is more obviously asking for trouble than changing the implementation is.)
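For comparison, here is a sketch of what the constraint-based version of Frob might look like. The IBar interface is hypothetical: I've invented it for this example, and Foo has to opt in by implementing it. I've also made Frob return the value it prints, purely so the result is easy to check.

```csharp
using System;

// Hypothetical interface, introduced only for this example.
interface IBar
{
    int Bar();
}

class Foo : IBar
{
    public int Bar() { return 42; }
}

static class ConstraintDemo
{
    // The constraint is part of Frob's public signature, so the compiler
    // can verify the call to x.Bar() here, and can verify each caller
    // against the constraint without seeing Frob's body.
    public static int Frob<T>(T x) where T : IBar
    {
        int result = x.Bar();
        Console.WriteLine(result);
        return result;
    }

    static void Main()
    {
        Frob(new Foo());   // T is inferred as Foo; prints 42
        // Frob(123);      // rejected at compile time: int does not implement IBar
    }
}
```

The cost is visible in Foo's declaration: a type only qualifies if its author (or a wrapper) names the interface explicitly, which is exactly the flexibility Bruce is missing.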

Having said all that, I'm still with Bruce on this, in that I wish I had the facility to disable this. While it's often a Good Thing for constraints to be explicit rather than implied, it's sometimes useful to relax this. So I want some kind of anti-constraint syntax, such as:

public static void Frob<T> (T x) where T couldbeanything

This hypothetical construct would disable all compile time checking, letting me opt into potential runtime errors in exchange for greater flexibility. So this leads to the obvious question of whether the absence of such a facility is a C# feature, or an underlying CLR limitation. In other words, if I can't write my original Frob function in C#, can I write it in IL? Let's try:

.method public hidebysig static void  Frob<T>(!!0 x) cil managed
{
    .maxstack  8
    ldarg.0
    box        !!0
    callvirt   instance int32 !!0::Bar()
    call       void [mscorlib]System.Console::WriteLine(int32)
    ret
}

I'd call that the 'obvious' approach. Here I am exploiting the !!N syntax I described earlier, but rather than using it to retrieve a type token, I'm using it to invoke a method. That callvirt method is asking to invoke a method called Bar which takes no parameters and which returns a 32-bit integer, and is defined by whatever class we've been instantiated with.

Not only does this express precisely what I was trying (and failing) to do with the first C# snippet, this actually compiles. Sadly, that's where the good news ends. This fails verification, and even if you run in a security context which permits unverifiable code, it produces a surprising runtime error:

System.MissingMethodException: Method not found: 'Int32 System.Object.Bar()'.

So it seems like Bruce was quite close to the truth after all. Although I've been able to demonstrate that variables can have a latent type, and that this type is available to the typeof operator, it turns out not to be available for method invocations. It really does look as if, for the purposes of calling methods, the latent type effectively vanishes, and it behaves as though the type of x is System.Object, just like Bruce said.

This seems both inconsistent and unnecessary, which makes me wonder if it's a bug. It's inconsistent, because the effective type of the variable appears to depend on how you use it. And it's unnecessary because the error occurred at generic instantiation time, at which point the CLR had all the information it needed. I was trying to instantiate Frob<Foo> at the point at which the error was thrown. This means it knew that the type of x was Foo in this case, and that the required Bar method was therefore present, so I see no good reason for it to have thrown an exception here.

Of course we can always invoke the method reflectively to avoid this problem. This is particularly easy if you use VB.NET. But generics should enable you to avoid the overheads that involves. (And also to discover any problems at least at template instantiation time, rather than at method invocation time.)
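In C#, the reflective version might look something like the following sketch. The names ReflectionDemo and FrobLate are made up for illustration. It looks Bar up on the runtime type of the object, so it assumes x is non-null, and a missing method only shows up when the call is actually made.

```csharp
using System;
using System.Reflection;

class Foo
{
    public int Bar() { return 42; }
}

static class ReflectionDemo
{
    // Late-bound equivalent of the Frob I wanted to write: no constraint
    // on T, and no compile-time check that Bar exists.
    public static object FrobLate<T>(T x)
    {
        // Find a public instance method called Bar that takes no parameters.
        MethodInfo bar = x.GetType().GetMethod("Bar", Type.EmptyTypes);
        if (bar == null)
        {
            throw new MissingMethodException(x.GetType().FullName, "Bar");
        }

        // Invoke returns object, so an int result comes back boxed.
        object result = bar.Invoke(x, null);
        Console.WriteLine(result);
        return result;
    }

    static void Main()
    {
        FrobLate(new Foo());   // prints 42
    }
}
```

Both of the overheads mentioned above are on show here: the method lookup happens on every call, and the int result is boxed on its way back through Invoke.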

By the way, I think Bruce is being overly bleak - even with these constraints, it's not true that generics "only solve the problem of automatically casting in and out of containers." In C# they can eliminate the cast altogether, which can have a significant performance implication: you can avoid boxing entirely for collections of value types. They are also useful for non-collection-based scenarios where casting was previously required. (E.g. methods such as RemotingServices.Connect which currently take a Type object as a parameter, and return an object which must then be cast back to this type.) And as the thing that pained me most about losing templates when I moved to C# was all this casting, I'm actually reasonably happy with the implementation of generics currently proposed for Whidbey. While I would prefer to see the extra flexibility described by Bruce, I don't regard it as a big problem - for me, the most pressing problem caused by the absence of templates will have been solved. So I won't be lobbying Microsoft to slip Whidbey to get this feature in there...
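To make the boxing point concrete, here is a small sketch contrasting the non-generic ArrayList with the generic List<T> collection from System.Collections.Generic. The BoxingDemo class and its method names are mine, for illustration only, and the sketch shows where the boxing and casting occur rather than measuring their cost.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

static class BoxingDemo
{
    // Pre-generics style: elements are stored as object, so each int is
    // boxed on the way in and must be cast (unboxed) on the way out.
    public static int SumUntyped(ArrayList items)
    {
        int total = 0;
        foreach (object o in items)
        {
            total += (int)o;
        }
        return total;
    }

    // Generic style: elements are stored as int, with no boxing and no cast.
    public static int SumTyped(List<int> items)
    {
        int total = 0;
        foreach (int i in items)
        {
            total += i;
        }
        return total;
    }

    static void Main()
    {
        ArrayList untyped = new ArrayList();
        untyped.Add(1);   // boxes the int
        untyped.Add(2);

        List<int> typed = new List<int>();
        typed.Add(1);     // stored directly as an int
        typed.Add(2);

        Console.WriteLine(SumUntyped(untyped));   // prints 3
        Console.WriteLine(SumTyped(typed));       // prints 3
    }
}
```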
