CLR Generics Limitation - Modifying Values In Situ In a Container

Thursday 27 January, 2005, 11:40 AM

I've recently been in a discussion with Ondra Spilka over on the DOTNET-CLR list. The discussion was about a subtle issue with ArrayList and value types - while you can put values into an ArrayList you cannot modify them in situ, because the indexer always returns a boxed copy of the value. If you want to change the value, you must replace it entirely. (This is only an issue for values with several fields. For singular values like integers or doubles, there is no distinction between modifying the value and overwriting it. This distinction only matters if you want to do something like change the width of a rectangle struct, but not change anything else in it.)

Initially I thought to myself "this will be better once we have generics though, right?" Wrong!

It's fairly widely understood now that CLR generics are not meant to be equivalent to C++ templates. They are designed to be somewhat simpler. But I was under the impression that one of the most important use cases was support for strongly typed collection classes. This means that if I want a dynamically resizable array, I don't have to use ArrayList, which treats everything as objects. I can use the generic List<T> class, avoiding all casting, and hence all boxing.

So this solves the problem, right? I said the reason you can't modify a value in an ArrayList in situ is that its indexer returns you a boxed copy of the value. Since we're no longer getting a box, there's no need to unbox before we can do stuff to the value. Except it turns out that the box is a red herring here - the underlying problem is not that boxing occurs, but that copying occurs. And even though generic containers eliminate boxing, they don't eliminate the copying.

Consider this pre-generics example, using the Point value type defined in System.Drawing:

ArrayList al = new ArrayList();
Point p = new Point(10, 10);
al.Add(p);

al[0].X = 42;

This doesn't work of course - the compiler complains about the last line. The ArrayList indexer's return type is object, so the C# compiler doesn't know what we mean by X here. So we could try casting it:

((Point) al[0]).X = 42;

But the compiler still complains. And if you're using C# v2.0, you get a reasonably helpful error:

error CS0445: Cannot modify the result of an unboxing conversion

(C# v1.1 comes out with something slightly more cryptic, but it's the same basic issue.) What the compiler is pointing out to us is that by casting to Point, we are asking it to perform an unboxing operation. Unboxing always involves copying. So this is more or less equivalent to having done this:

Point localTempCopy = (Point) al[0];
localTempCopy.X = 42;

So we're asking to unbox into a local temporary copy, and we're modifying that copy's X property rather than the X property of the boxed version. So this won't affect what's in the ArrayList - we're only going to be modifying our local copy. The compiler no longer complains by the way - now we've given it a named variable into which to place the unboxed copy, it thinks we know what we're doing. It only complains in the previous case because it guesses that you probably haven't realised that you'll be modifying a property of a local temporary variable that is about to be discarded.

Of course just because it compiles, that doesn't make it useful... If we print out the value in the ArrayList, it's clear we've not actually modified it - this code:

Point localTempCopy = (Point) al[0];
localTempCopy.X = 42;
Console.WriteLine(al[0]);

prints out this:

{X=10,Y=10}

If the compiler had let us compile the previous more compact (and less obviously useless) example, the effect would have been the same. So getting a compiler error here is a Good Thing.
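As noted at the start, the only way to change what the ArrayList holds is to replace the element entirely. Here is a minimal sketch of that copy, modify, write back pattern. It assumes a hypothetical mutable struct called Pt standing in for System.Drawing.Point (so that the example has no dependency on System.Drawing), and the helper name SetX is likewise made up for illustration:

```csharp
using System;
using System.Collections;

// Hypothetical stand-in for System.Drawing.Point: a mutable value type.
struct Pt
{
    public int X;
    public int Y;
    public Pt(int x, int y) { X = x; Y = y; }
    public override string ToString() { return "{X=" + X + ",Y=" + Y + "}"; }
}

static class ReplaceInArrayList
{
    // Unbox into a local copy, change the copy, then box the modified
    // copy back into the same slot, replacing the old boxed value.
    public static Pt SetX(ArrayList list, int index, int newX)
    {
        Pt copy = (Pt) list[index];
        copy.X = newX;
        list[index] = copy;
        return (Pt) list[index];
    }

    static void Main()
    {
        ArrayList al = new ArrayList();
        al.Add(new Pt(10, 10));
        Console.WriteLine(SetX(al, 0, 42)); // prints {X=42,Y=10}
    }
}
```

This works, but it costs an unbox and a fresh box for every change, and it is easy to forget the final assignment back into the list.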

Now if we just use an array, we don't get any of these problems:

Point[] ar = new Point[1];
ar[0] = new Point(10, 10);
ar[0].X = 42;
Console.WriteLine(ar[0]);

This prints out:

{X=42,Y=10}

which is much more like what we were probably hoping for. But the problem is we've now lost the resizable goodness of the ArrayList because we're using the low-level built-in array type.

Enter Generics

The generic List<T> class is supposed to solve this kind of problem - it offers us the same freedom from boxing we get with the basic array class, along with all the dynamic resizability that we enjoy in ArrayList! But watch what happens:

List<Point> pl = new List<Point>();
pl.Add(new Point(10, 10));
pl[0].X = 42; // <-- compiler error occurs here!
Console.WriteLine(pl[0]);

This doesn't compile. We get this error:

error CS1612: Cannot modify the return value of 'System.Collections.Generic.List<System.Drawing.Point>.this[int]' because it is not a variable

This makes sense because what we're really asking the compiler to do is this:

Point localTempCopy = pl[0];
localTempCopy.X = 42;

Look familiar? That's almost exactly the same as the expanded version we used to get rid of the error in the ArrayList example. And that's because it's the exact same problem. So it turns out the problem has nothing to do with boxing after all then...
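Since the indexer hands back a copy, any in-place style update on a List<T> has to be wrapped up as read, modify, write back. One way to hide that is a small helper that passes the copy by reference to a callback and then stores it again. This is only a sketch under assumptions: Pt is a hypothetical stand-in for System.Drawing.Point, and RefAction<T>, Update and SetXTo42 are names invented for this example, not anything in the class libraries:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for System.Drawing.Point: a mutable value type.
struct Pt
{
    public int X;
    public int Y;
    public Pt(int x, int y) { X = x; Y = y; }
    public override string ToString() { return "{X=" + X + ",Y=" + Y + "}"; }
}

// Hypothetical delegate type: a callback that receives its argument by reference.
delegate void RefAction<T>(ref T item);

static class ListUpdate
{
    // Copy the element out, let the callback modify the copy,
    // then store the modified copy back through the indexer's setter.
    public static void Update<T>(IList<T> list, int index, RefAction<T> action)
    {
        T copy = list[index];
        action(ref copy);
        list[index] = copy;
    }

    public static void SetXTo42(ref Pt p) { p.X = 42; }

    static void Main()
    {
        List<Pt> pl = new List<Pt>();
        pl.Add(new Pt(10, 10));
        Update<Pt>(pl, 0, SetXTo42);
        Console.WriteLine(pl[0]); // prints {X=42,Y=10}
    }
}
```

Note that this still copies the value twice. It tidies up the call site, but it does not give us what the array gives us.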

So How Come It Works With Arrays?

The only version of the code that did what we want - to modify properties of a value in an array in situ - was the basic array version. If you examine the IL for that, it looks like this:

ldloc.0
ldc.i4.0
ldelema    [System.Drawing]System.Drawing.Point
ldc.i4.s   42
call       instance void [System.Drawing]System.Drawing.Point::set_X(int32)

The important line is that ldelema. The 'a' on the end is significant - it is short for 'address'. This signifies that ldelema is not retrieving a copy of what's in the array, it's returning a managed pointer to the item in the array.

Compare this to what happens if we force it to make an explicit copy, as was happening in the other examples:

Point localTempCopy = ar[0];
localTempCopy.X = 42;

Here's the resulting IL:

ldloc.0
ldc.i4.0
ldelema    [System.Drawing]System.Drawing.Point
ldobj      [System.Drawing]System.Drawing.Point
stloc.1
ldloca.s   localTempCopy
ldc.i4.s   42
call       instance void [System.Drawing]System.Drawing.Point::set_X(int32)

It uses ldelema as before, because that is how the compiler gets at an element of an array of a non-primitive value type such as Point. But then it uses ldobj - this makes a local copy of the value. And notice that when it comes to set the X property, it uses ldloca.s to get a managed pointer to the local copy on the stack - member functions of values always require their this variable to be a managed pointer.

So that's the secret with arrays. Their equivalent to the indexer, the ldelema instruction, returns a managed pointer to the relevant value, whereas the collection classes return a copy.
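You can see the same thing from C# without reading IL: an array element can be passed as a ref argument, which means the callee receives a managed pointer to the slot in the array. The sketch below assumes a hypothetical Pt struct in place of System.Drawing.Point, and the method names are invented for illustration:

```csharp
using System;

// Hypothetical stand-in for System.Drawing.Point: a mutable value type.
struct Pt
{
    public int X;
    public int Y;
    public Pt(int x, int y) { X = x; Y = y; }
    public override string ToString() { return "{X=" + X + ",Y=" + Y + "}"; }
}

static class ArrayElementByRef
{
    // Receives a managed pointer to the caller's storage, so the
    // change is made in place rather than to a copy.
    static void SetX(ref Pt p, int newX) { p.X = newX; }

    public static Pt Run()
    {
        Pt[] ar = new Pt[1];
        ar[0] = new Pt(10, 10);

        // An array element is a variable, so it can be passed by reference.
        SetX(ref ar[0], 42);

        // The equivalent call with a List<Pt> element, SetX(ref list[0], 42),
        // is rejected by the compiler: an indexer result is not a variable.
        return ar[0];
    }

    static void Main()
    {
        Console.WriteLine(Run()); // prints {X=42,Y=10}
    }
}
```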

We've Been Here Before

This problem isn't unique to collections. Most people who have done any Windows Forms development have already seen this problem in a different guise. Anyone who has tried to do this:

someControl.Size.Width = 42;

has run into what is essentially the same issue. The Control.Size property is of type Size, which is a value type. And the property accessor returns a copy of the size. This means that the code above is attempting to modify the Width property on a local temporary copy rather than the underlying size. It's the same problem that we're seeing here with collections, so again, the compiler doesn't allow it.

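In short, the C# compiler only lets you modify a value in place if that value lives in a variable - a local variable, a field, or an array element. It won't let you do it to the return value of a method, property, or indexer, because that is always a copy.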
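The distinction that matters is between a storage location and a returned copy. Here is a minimal sketch, assuming a hypothetical Widget class (not a real Windows Forms type) and a hypothetical Sz struct standing in for Size. It shows a field behaving just like a local variable or an array element, while the property wrapping that same field hands back a copy:

```csharp
using System;

// Hypothetical stand-in for System.Drawing.Size: a mutable value type.
struct Sz
{
    public int Width;
    public int Height;
    public Sz(int width, int height) { Width = width; Height = height; }
}

// Hypothetical stand-in for a control; not a real Windows Forms type.
class Widget
{
    // A field is a storage location, so it can be modified in place.
    public Sz SizeField = new Sz(100, 50);

    // A property getter returns a copy of the field's value.
    public Sz Size
    {
        get { return SizeField; }
        set { SizeField = value; }
    }
}

static class FieldVersusProperty
{
    public static int Run()
    {
        Widget w = new Widget();

        w.SizeField.Width = 42;   // allowed: modifies the field in place
        // w.Size.Width = 7;      // rejected by the compiler: error CS1612

        Sz copy = w.Size;
        copy.Width = 7;           // compiles, but changes only the local copy

        return w.Size.Width;      // still 42
    }

    static void Main()
    {
        Console.WriteLine(Run()); // prints 42
    }
}
```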

Dude, Where's My LValue?

This seems like another disappointing moment for CLR generics. I'd already got over the fact that I can't do all the wild and whacky stuff that used to be possible with C++ templates. I'd even come around to the idea that this was probably, on balance, a good thing. But this seems like a pretty serious problem, as it diminishes the usefulness of what is supposed to be one of the important use cases for CLR generics.

The C++ standard library's vector class is the nearest equivalent to .NET v2.0's generic List<T> - they are both essentially dynamically resizable arrays. But if you look at the definition of vector's [] operator, you'll see that it returns a reference to the item, rather than a copy. In other words, the STL's dynamic array behaves in much the same way as the intrinsic .NET array. (In C++ terminology, both the built-in array and the vector's indexing operator return an lvalue.)

Maybe I've missed something - maybe there is some way of returning a managed pointer from an indexer in C# without resorting to unsafe code. But if there were, wouldn't the generic collection classes in the class libraries be doing just that?
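For what it's worth, later versions of the language (C# 7.0 onwards) added ref returns, which do allow an indexer to return a managed pointer without unsafe code. The following is only a sketch under assumptions: RefList<T> is a hypothetical, deliberately minimal container written for this example, not a class library type, and Pt is again a made-up stand-in for System.Drawing.Point. It needs a C# 7.0 or later compiler:

```csharp
using System;

// Hypothetical stand-in for System.Drawing.Point: a mutable value type.
struct Pt
{
    public int X;
    public int Y;
    public Pt(int x, int y) { X = x; Y = y; }
    public override string ToString() { return "{X=" + X + ",Y=" + Y + "}"; }
}

// Hypothetical minimal growable list; not a class library type.
class RefList<T>
{
    private T[] items = new T[4];
    private int count;

    public int Count { get { return count; } }

    public void Add(T item)
    {
        // Grow the backing array when it is full.
        if (count == items.Length) Array.Resize(ref items, items.Length * 2);
        items[count++] = item;
    }

    // The indexer returns a reference to the slot in the backing
    // array rather than a copy of the value stored there.
    public ref T this[int index]
    {
        get
        {
            if (index < 0 || index >= count) throw new ArgumentOutOfRangeException("index");
            return ref items[index];
        }
    }
}

static class RefReturnDemo
{
    static void Main()
    {
        RefList<Pt> pl = new RefList<Pt>();
        pl.Add(new Pt(10, 10));
        pl[0].X = 42;             // compiles: the indexer result is a variable
        Console.WriteLine(pl[0]); // prints {X=42,Y=10}
    }
}
```

One consequence of this design is that a reference obtained from the indexer points into the current backing array. If a later Add causes a resize, that reference is left pointing at the old array, so changes made through it no longer reach the list.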
