I've just read Bruce Eckel's blog on the limitations of generics in Java and C#, and his followup providing more information about what he means by latent typing. The article focuses on Java, but almost everything he says applies to C# too.
While Bruce is correct that C# and Java generics have a specific limitation, I don't think his explanation of the reason for the limitation is quite right.
(Note that Bruce doesn't explicitly talk about the fact that there are two levels on which latent typing can operate in .NET. There's latent typing of variables, which is the main one that generics enable. But we already have another form of latent typing: if a variable of type object is non-null, then the thing it refers to has a latent type. And VB.NET already supports the classic SmallTalk/Objective C++/whatever style of message passing where you invoke a method on an object without knowing its real type. VB.NET implements this using reflection. This distinction is a little confusing because Bruce gives examples which look a lot like this latter kind of message passing style of latent typing, although I'm not familiar enough with Python to be absolutely sure. But the main thrust of his articles is to talk about latent typing of the variables themselves, as opposed to the types of the objects to which they are referring. If you want an independent 3rd party definition of a latent type, here's one from Scheme's point of view, and here's another one on a well-known Wiki.)
The problem Bruce is complaining about is that you cannot write a generic function like this:
public static void Frob<T>(T x)
{
Console.WriteLine(x.Bar());
}
This is potentially frustrating because the correctness or otherwise of this code is surely determined by the type used at instantiation time. For example, consider this type:
class Foo { public int Bar() { return 42; } }
It looks like it should be valid to instantiate Frob as Frob<Foo>, because the Foo class provides the Bar method that Frob requires. However, if you try it, you'll find that you never get this far - the C# compiler refuses to compile the generic Frob method. It complains that it doesn't know what the Bar method is.
However, Bruce's explanation for why this is so is incorrect. He says that it is because C# doesn't support latent typing. Latent typing is where a particular expression or variable's type may not be self-evident from the source code. C++ templates support this - the type of a variable in a template function may well be determined only when the template is instantiated. Bruce claims that Java and C# generics don't support this. He says that in the generic Frob function above, the variable x has a fixed type, System.Object in the C# case, which is why the compiler complains when you try to invoke any method other than the standard System.Object methods. He also talks about the constraint mechanisms that let you indicate that some generic code requires the type parameter to be a class that implements a particular interface or derives from a particular base class, and claims that the type of the variable is then effectively one of these types.
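For reference, here is a minimal sketch of the constraint mechanism being described. The IBar interface and the ConstraintDemo class are names made up for this illustration, and Foo has been changed to implement the interface. Frob also returns the value it prints, purely so the result is easy to check. With the constraint in place the compiler accepts the call to Bar:

```csharp
using System;

// Hypothetical interface, introduced only for this sketch.
public interface IBar
{
    int Bar();
}

public class Foo : IBar
{
    public int Bar() { return 42; }
}

public static class ConstraintDemo
{
    // The "where" clause declares up front that every T provides Bar,
    // so the compiler can check the call without seeing any instantiation.
    public static int Frob<T>(T x) where T : IBar
    {
        int result = x.Bar();
        Console.WriteLine(result);
        return result;
    }

    public static void Main()
    {
        Frob<Foo>(new Foo());   // prints 42
        // Frob<int>(0);        // rejected at compile time: int does not implement IBar
    }
}
```

The price is that Foo has to opt in by naming the interface. A type that merely happens to have a suitable Bar method is not accepted.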
But he is wrong. In the generic Frob function above, x does have a latent type which, just like in C++, is determined by the type parameter supplied during instantiation. This can easily be demonstrated with the following code:
public static void Spong<T>(T x)
{
    Type t = typeof(T);
    Console.WriteLine(t.FullName);
}
The typeof keyword returns the type information for the named type. In this case, the name of the type being requested is T, which is the type parameter for this generic function. It is the type of x, and it is a latent type: its type is not knowable from this source code alone, and is determined by the particular instance of the generic function. In fact if you look at the IL, you will see:
ldtoken !!0
This means "return the metadata token for whatever the first type parameter is in this instantiation of this generic code." And in general in IL, you can use !!N wherever a type name would normally be required. This syntax refers to the Nth instantiation parameter.
This is not the same thing as calling GetType on x. The typeof operator really is picking up the latent type of the x variable, which is not necessarily what would be returned by x.GetType(). For example, this code:
Spong<Foo>(null);
Spong<System.String>(null);
Spong<System.Object>(null);
Spong<System.Object>("Hello, world");
Prints out this:
Foo
System.String
System.Object
System.Object
In each case, calling x.GetType() would have had a different effect. In the first three cases, x is null, so x.GetType() would have thrown an exception. This demonstrates clearly that this is not simply a case of extracting the type of an object at runtime as we can do in today's pre-generics world. The final example is also interesting - here, x.GetType() would have returned the Type object for System.String. However, we've explicitly chosen the Spong<System.Object> instantiation here. Again this illustrates that the typeof operator is retrieving the latent type of the x variable, rather than the runtime type of what x currently refers to.
So why doesn't the original Frob function compile? It's because the C# compiler team took an explicit decision not to support that kind of thing. It was an explicit goal that the C# compiler report at compile time any attempt to invoke missing methods from within generic code.
Even now, I can hear Bruce pointing out that C++ manages to offer this facility and still perform compile time type checking. He makes this point in both his blog entries.
This makes me wonder if the real Bruce Eckel has been abducted.
I can't believe Bruce doesn't understand enough about the differences between the C++ compilation model and the model used by C# and Java to understand why this argument doesn't hold water. The reason C++ is able to check such a generic function at build time is that it knows about every place this template is instantiated. C++ uses a monolithic compilation model. At build time, the compiler sees every single instantiation of a given template, and can therefore know if they are all valid. The same is not true with C# or Java. These both use a runtime linking model.
You might think that this shouldn't be a problem. On the face of it, it seems like it should be possible for the compiler to make checks when you write code that uses some generic code. But remember that in a dynamically linked world (and don't confuse dynamic linking with dynamic typing by the way) it's possible for the provider and consumer to evolve independently. This means that generics have similar problems to inheritance: you can get a whole class of errors at runtime that would only have been possible at compile time in a monolithically linked world.
As an example, suppose that v1.0 of Frob didn't attempt to call Bar on x. This would then allow some other component to instantiate, say, Frob<int>. But if v1.1 of Frob looks like the earlier example, such an instantiation will obviously fail. With C++, it will fail when you compile, but in a non-monolithic world, there isn't necessarily any compilation step when upgrading from v1.0 to v1.1 - client code can very often end up running against newer versions of components than those they were originally compiled against, at which point a runtime error is the only way of dealing with this.
It seems that avoiding these runtime errors was an explicit goal of the C# generics design. (And presumably also for Java, although I'm not very familiar with Java generics.) The only way to catch these kinds of problems at compile time is to impose much heavier restrictions at compile time. Generic code is required to declare what it expects of its type parameters up front as part of its public interface. This means that these kinds of problems can be identified at compile time even in a runtime-linked world. (Unless you change your explicitly declared constraints of course... But changing the public API of a component is more obviously asking for trouble than changing the implementation is.)
Having said all that, I'm still with Bruce on this, in that I wish I had the facility to disable this. While it's often a Good Thing for constraints to be explicit rather than implied, it's sometimes useful to relax this. So I want some kind of anti-constraint syntax, such as:
public static void Frob<T> (T x) where T couldbeanything
This hypothetical construct would disable all compile time checking, letting me opt into potential runtime errors in exchange for greater flexibility. So this leads to the obvious question of whether the absence of such a facility is a C# feature, or an underlying CLR limitation. In other words, if I can't write my original Frob function in C#, can I write it in IL? Let's try:
.method public hidebysig static void Frob<T>(!!0 x) cil managed
{
    .maxstack 8
    ldarg.0
    box !!0
    callvirt instance int32 !!0::Bar()
    call void [mscorlib]System.Console::WriteLine(int32)
    ret
}
I'd call that the 'obvious' approach. Here I am exploiting the !!N syntax I described earlier, but rather than using it to retrieve a type token, I'm using it to invoke a method. That callvirt instruction is asking to invoke a method called Bar which takes no parameters and which returns a 32-bit integer, and is defined by whatever class we've been instantiated with.
Not only does this express precisely what I was trying (and failing) to do with the first C# snippet, this actually compiles. Sadly, that's where the good news ends. This fails verification, and even if you run in a security context which permits unverifiable code, it produces a surprising runtime error:
System.MissingMethodException: Method not found: 'Int32 System.Object.Bar()'.
So it seems like Bruce was quite close to the truth after all. Although I've been able to demonstrate that variables can have a latent type, and that this type is available to the typeof operator, it turns out not to be available for method invocations. It really does look as though, for the purposes of calling methods, the latent type effectively vanishes, and it behaves as though the type of x is System.Object, just like Bruce said.
This seems both inconsistent and unnecessary, which makes me wonder if it's a bug. It's inconsistent, because the effective type of the variable appears to depend on how you use it. And it's unnecessary because the error occurred at generic instantiation time, at which point the CLR had all the information it needed. I was trying to instantiate Frob<Foo> at the point at which the error was thrown. This means it knew that the type of x was Foo in this case, and that the required Bar method was therefore present, so I see no good reason for it to have thrown an exception here.
Of course we can always invoke the method reflectively to avoid this problem. This is particularly easy if you use VB.NET. But generics should enable you to avoid the overheads that involves. (And also to discover any problems at least at template instantiation time, rather than at method invocation time.)
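As a rough sketch of that reflective workaround in C#: the ReflectionDemo class name is invented for this example, and Frob returns the result only so it can be checked. It looks Bar up on typeof(T), the latent type of the variable, using the standard Type.GetMethod and MethodInfo.Invoke calls:

```csharp
using System;
using System.Reflection;

public class Foo
{
    public int Bar() { return 42; }
}

public static class ReflectionDemo
{
    // Finds a public parameterless method called Bar on the type argument
    // and invokes it on x. Nothing here is checked at compile time.
    public static object Frob<T>(T x)
    {
        MethodInfo bar = typeof(T).GetMethod("Bar", Type.EmptyTypes);
        if (bar == null)
        {
            throw new MissingMethodException(typeof(T).FullName, "Bar");
        }
        object result = bar.Invoke(x, null);   // boxes x if T is a value type
        Console.WriteLine(result);
        return result;
    }

    public static void Main()
    {
        Frob<Foo>(new Foo());      // prints 42
        try
        {
            Frob<int>(0);          // compiles, but fails when it runs
        }
        catch (MissingMethodException e)
        {
            Console.WriteLine(e.Message);
        }
    }
}
```

This accepts any type with a suitable Bar method, but it pays for a method lookup and a boxed return value on every call, and a bad instantiation such as Frob<int> only shows up when the method is actually invoked.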
By the way, I think Bruce is being overly bleak - even with these constraints, it's not true that generics "only solve the problem of automatically casting in and out of containers." In C# they can eliminate the cast altogether, which can have a significant performance implication: you can avoid boxing entirely for collections of value types. It is also useful for non-collection-based scenarios where casting was previously required. (E.g. methods such as RemotingServices.Connect which currently take a Type object as a parameter, and return an object which must then be cast back to this type.) And as the thing that pained me most about losing templates when I moved to C# was all this casting, I'm actually reasonably happy with the implementation of generics currently proposed for Whidbey. While I would prefer to see the extra flexibility described by Bruce, I don't regard it as a big problem - for me, the most pressing problem caused by the absence of templates will have been solved. So I won't be lobbying Microsoft to slip Whidbey to get this feature in there...
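To make the boxing point concrete, here is a small sketch comparing the two styles. It assumes the List<T> class as it exists in System.Collections.Generic, and the BoxingDemo class and its method names are invented for the example:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class BoxingDemo
{
    // Pre-generics style: every int is boxed when added,
    // and needs a cast (an unbox) on the way back out.
    public static int SumArrayList(int count)
    {
        ArrayList list = new ArrayList();
        for (int i = 0; i < count; i++)
        {
            list.Add(i);
        }
        int total = 0;
        foreach (object o in list)
        {
            total += (int)o;
        }
        return total;
    }

    // Generic style: List<int> stores the ints directly,
    // so there is no boxing and no cast in the source.
    public static int SumGenericList(int count)
    {
        List<int> list = new List<int>();
        for (int i = 0; i < count; i++)
        {
            list.Add(i);
        }
        int total = 0;
        foreach (int value in list)
        {
            total += value;
        }
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(SumArrayList(5));    // prints 10
        Console.WriteLine(SumGenericList(5));  // prints 10
    }
}
```

Both methods compute the same sum. The difference is that the ArrayList version allocates a heap object per element and can fail with an InvalidCastException if something other than an int ends up in the list.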