I've recently been in a discussion with Ondra Spilka over on the DOTNET-CLR list. The discussion was about a subtle issue with ArrayList and value types - while you can put values into an ArrayList, you cannot modify them in situ, because the indexer always returns a boxed copy of the value. If you want to change the value, you must replace it entirely. (This is only an issue for values with several fields. For single-field values like integers or doubles, there is no distinction between modifying the value and overwriting it. The distinction only matters if you want to do something like change the width of a rectangle struct without changing anything else in it.)
Initially I thought to myself "this will be better once we have generics though, right?" Wrong!
It's fairly widely understood now that CLR generics are not meant to be equivalent to C++ templates. They are designed to be somewhat simpler. But I was under the impression that one of the most important use cases was support of strongly-typed collection classes. This means that if I want a dynamically resizable array, I don't have to use ArrayList, which treats everything as objects. I can use the generic List<T> class, avoiding all casting, and hence all boxing.
So this solves the problem, right? I said the reason you can't modify a value in an ArrayList in situ is that its indexer returns you a boxed copy of the value. Since we're no longer getting a box, there's no need to unbox before we can do stuff to the value. Except it turns out that the box is a red herring here - the underlying problem is not that boxing occurs, but that copying occurs. And even though generic containers eliminate boxing, they don't eliminate the copying.
Consider this pre-generics example, using the Point value type defined in System.Drawing:
    ArrayList al = new ArrayList();
    Point p = new Point(10, 10);
    al.Add(p);
    al[0].X = 42;
This doesn't work of course - the compiler complains about the last line. The ArrayList indexer's return type is object, so the C# compiler doesn't know what we mean by X here. So we could try casting it:
    ((Point) al[0]).X = 42;
But the compiler still complains. And if you're using C# v2.0, you get a reasonably helpful error:
    error CS0445: Cannot modify the result of an unboxing conversion
(C# v1.1 comes out with something slightly more cryptic, but it's the same basic issue.) What the compiler is pointing out to us is that by casting to Point, we are asking it to perform an unboxing operation. Unboxing always involves copying. So this is more or less equivalent to having done this:
    Point localTempCopy = (Point) al[0];
    localTempCopy.X = 42;
So we're asking to unbox into a local temporary copy, and we're modifying that copy's X property rather than the X property of the boxed version. So this won't affect what's in the ArrayList - we're only going to be modifying our local copy. The compiler no longer complains, by the way - now that we've given it a named variable into which to place the unboxed copy, it thinks we know what we're doing. It only complains in the previous case because it guesses that you probably haven't realised that you'll be modifying a property of a local temporary variable that is about to be discarded.
Of course just because it compiles, that doesn't make it useful... If we print out the value in the ArrayList, it's clear we've not actually modified it - this code:
    Point localTempCopy = (Point) al[0];
    localTempCopy.X = 42;
    Console.WriteLine(al[0]);
prints out this:
    {X=10,Y=10}
If the compiler had let us compile the previous more compact (and less obviously useless) example, the effect would have been the same. So getting a compiler error here is a Good Thing.
Now if we just use an array, we don't get any of these problems:
    Point[] ar = new Point[1];
    ar[0] = p = new Point(10, 10);
    ar[0].X = 42;
    Console.WriteLine(ar[0]);
This prints out:
    {X=42,Y=10}
which is much more like what we were probably hoping for. But the problem is we've now lost the resizable goodness of the ArrayList because we're using the low-level built-in array type.
The generic List<T> class is supposed to solve this kind of problem - it offers us the same freedom from boxing we get with the basic array class, along with all the dynamic resizability that we enjoy in ArrayList! But watch what happens:
    List<Point> pl = new List<Point>();
    pl.Add(new Point(10, 10));
    pl[0].X = 42; // <-- compiler error occurs here!
    Console.WriteLine(pl[0]);
This doesn't compile. We get this error:
    error CS1612: Cannot modify the return value of 'System.Collections.Generic.List<System.Drawing.Point>.this[int]' because it is not a variable
This makes sense because what we're really asking the compiler to do is this:
    Point localTempCopy = pl[0];
    localTempCopy.X = 42;
Look familiar? That's almost exactly the same as the expanded version we used to get rid of the error in the ArrayList example. And that's because it's the exact same problem. So it turns out the problem has nothing to do with boxing after all then...
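For what it's worth, the "replace it entirely" workaround mentioned at the start applies to List<T> just as it does to ArrayList: copy the element out, change the copy, and write the whole value back through the indexer's setter. Here is a minimal sketch of that, assuming a reasonably recent C# compiler; the InPlaceUpdate class and its SetX helper are hypothetical names of my own, not anything in the class libraries:

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;

// Hypothetical helper class, for illustration only.
public static class InPlaceUpdate
{
    // Replaces the element wholesale, because list[index].X = x will not compile.
    public static void SetX(List<Point> list, int index, int x)
    {
        Point copy = list[index]; // the indexer's getter hands back a copy
        copy.X = x;               // modify the named local copy
        list[index] = copy;       // the indexer's setter overwrites the stored value
    }

    public static void Main()
    {
        List<Point> pl = new List<Point>();
        pl.Add(new Point(10, 10));
        SetX(pl, 0, 42);
        Console.WriteLine(pl[0]); // prints {X=42,Y=10}
    }
}
```

This gets the right answer, but it copies the whole value twice just to change one field, which is exactly the overhead the array version avoids.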
The only version of the code that did what we want - to modify properties of an object in an array in situ - was the basic array version. If you examine the IL for that, it looks like this:
    ldloc.0
    ldc.i4.0
    ldelema [System.Drawing]System.Drawing.Point
    ldc.i4.s 42
    call instance void [System.Drawing]System.Drawing.Point::set_X(int32)
The important line is that ldelema. The 'a' on the end is significant - it is short for 'address'. This signifies that ldelema is not retrieving a copy of what's in the array, it's returning a managed pointer to the item in the array.
Compare this to what happens if we force it to make an explicit copy, as was happening with the other examples:
    Point localTempCopy = ar[0];
    localTempCopy.X = 42;
Here's the resulting IL:
    ldloc.0
    ldc.i4.0
    ldelema [System.Drawing]System.Drawing.Point
    ldobj [System.Drawing]System.Drawing.Point
    stloc.1
    ldloca.s localTempCopy
    ldc.i4.s 42
    call instance void [System.Drawing]System.Drawing.Point::set_X(int32)
It uses ldelema as before, because that's the only valid way of retrieving an element from a value type array. But then it uses ldobj - this makes a local copy of the value. And notice that when it comes to set the X property, it uses ldloca.s to get a managed pointer to the local copy on the stack - member functions of values always require their this variable to be a managed pointer.
So that's the secret with arrays. Their equivalent to the indexer, the ldelema instruction, returns a managed pointer to the relevant value, whereas the collection classes return a copy.
This problem isn't unique to collections. Most people who have done any Windows Forms development have already seen this problem in a different guise. Anyone who has tried to do this:
    someControl.Size.Width = 42;
has run into what is essentially the same issue. The Control.Size property is of type Size, which is a value type. And the property accessor returns a copy of the size. This means that the code above is attempting to modify the Width property on a local temporary copy rather than the underlying size. It's the same problem that we're seeing here with collections, so again, the compiler doesn't allow it.
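The usual way round this one is the same as for the collections: build a complete new Size and assign it through the property setter. The sketch below shows the idea under some assumptions. Widget is a hypothetical stand-in for a control, so that the example runs without Windows Forms, and SizeUpdate and SetWidth are likewise names made up for illustration:

```csharp
using System;
using System.Drawing;

// Hypothetical stand-in for a control: a class exposing a value-typed property.
public class Widget
{
    public Size Size { get; set; }
}

// Hypothetical helper class, for illustration only.
public static class SizeUpdate
{
    // w.Size.Width = width would be rejected by the compiler, because the getter
    // returns a copy. Instead, build a whole new Size and assign it via the setter.
    public static void SetWidth(Widget w, int width)
    {
        w.Size = new Size(width, w.Size.Height);
    }

    public static void Main()
    {
        Widget w = new Widget();
        w.Size = new Size(10, 20);
        SetWidth(w, 42);
        Console.WriteLine("{0} x {1}", w.Size.Width, w.Size.Height); // prints 42 x 20
    }
}
```

As with the list, this works only because the property has a setter that accepts a whole replacement value.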
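In short, the C# compiler only lets you modify a value in situ if that value is stored in a variable, such as a local variable, a field, or an array element. The result of a property or indexer is not a variable, so it doesn't qualify.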
This seems like another disappointing moment for CLR generics. I'd already got over the fact that I can't do all the wild and whacky stuff that used to be possible with C++ templates. I'd even come around to the idea that this was probably, on balance, a good thing. But this seems like a pretty serious problem, as it diminishes the usefulness of what is supposed to be one of the important use cases for CLR generics.
The C++ standard library's vector class is the nearest equivalent to .NET v2.0's generic List<T> - they are both essentially dynamically resizable arrays. But if you look at the definition of vector's [] operator, you'll see that it returns a reference to the item, rather than a copy. In other words, the STL's dynamic array behaves in much the same way as the intrinsic .NET array. (In C++ terminology, both the built-in array and the vector's indexing operator return an lvalue.)
Maybe I've missed something - maybe there is some way of returning a managed pointer from an indexer in C# without resorting to unsafe code. But if there were, wouldn't the generic collection classes in the class libraries be doing just that?