Jon Skeet recently observed that range variables in LINQ expressions are not like other variables. A range variable is one like foobar in the following:
var query = from foobar in source where foobar > 20 select foobar;
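The compiler translates a query expression like this into ordinary method calls. Here is a minimal sketch of that translation. It assumes C# 9 or later (for top-level statements) and an illustrative source holding the numbers 1 to 30. Under the C# specification's translation rules, a final select that just hands back the range variable is dropped after a where clause. The whole query therefore becomes a single Where call, and foobar survives only as the name of that lambda's parameter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative data: the numbers 1 to 30.
IEnumerable<int> source = Enumerable.Range(1, 30);

// The query expression from the text.
var query = from foobar in source where foobar > 20 select foobar;

// Roughly what the compiler produces. The trailing "select foobar" only
// returns each element unchanged, so it disappears, and "foobar" remains
// only as the name of the Where lambda's parameter.
var translated = source.Where(foobar => foobar > 20);

Console.WriteLine(string.Join(" ", query));      // 21 22 23 24 25 26 27 28 29 30
Console.WriteLine(string.Join(" ", translated)); // same output
```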
Jon points out that the analogy people often draw – the iteration variable in a foreach loop – isn’t that close an analogy. I thought it would be interesting to come up with a concrete illustration of the differences.
Before I start, let me introduce a type:
List<Func<int>>
That’s a list of functions. Each function in the list takes no arguments, and returns an integer. Here’s a method that walks through such a list, invokes each function in turn, and prints the result:
static void PrintValues(List<Func<int>> funcList)
{
    foreach (Func<int> func in funcList)
    {
        Console.WriteLine(func());
    }
}
I’m going to use this to illustrate some differences between LINQ query expression range variables and foreach iteration variables.
In order to take a look at these variables, I’m going to capture them in lambdas, so we can poke them about a bit and see what happens. This example iterates through the numbers 1 to 10, building a function that captures the iteration variable each time round:
var source = Enumerable.Range(1, 10);
List<Func<int>> capturedForeachIterationVariables = new List<Func<int>>();
foreach (int i in source)
{
    capturedForeachIterationVariables.Add(() => i);
}
That lambda (the “() => i” expression) denotes a function that takes no arguments (hence the empty ()) and returns the value of i. And the i in scope here is the iteration variable of the foreach loop.
We can print out the results using the method I showed earlier:
PrintValues(capturedForeachIterationVariables);
And here’s the outcome:
10 10 10 10 10 10 10 10 10 10
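This illustrates that there is just the one iteration variable. Our list contains 10 functions, all of which captured that one same iteration variable. That’s why they all printed out the final value. (This is how compilers before C# 5 behave. From C# 5 onward, the foreach iteration variable is treated as though it were declared inside the loop body, so each iteration gets a fresh variable and this example prints 1 to 10 instead.)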
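You can see a single shared variable in action with a runnable sketch. This one assumes C# 9 or later (top-level statements). On such a compiler each foreach iteration gets its own variable, so the sketch uses a for loop instead, because a for loop's variable is still shared by every iteration. The loop increments i one last time before its condition fails, so every lambda returns 11 rather than the last element.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var sharedCaptures = new List<Func<int>>();

// A for loop declares one i for the whole loop, so every lambda
// captures that same variable.
for (int i = 1; i <= 10; i++)
{
    sharedCaptures.Add(() => i);
}

// By the time the lambdas run, the loop has finished and i holds 11.
Console.WriteLine(string.Join(" ", sharedCaptures.Select(f => f())));
// prints: 11 11 11 11 11 11 11 11 11 11
```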
With a slight modification, we can give each lambda its own private variable to capture:
List<Func<int>> capturedForeachLocals = new List<Func<int>>();
foreach (int i in source)
{
    int iLocal = i;
    capturedForeachLocals.Add(() => iLocal);
}
While i remains in scope for the entire execution of the loop, iLocal goes out of scope at the end of each iteration. Consequently, each time round the loop, we create a lambda that captures a different iLocal. So when we print the results, we see something different:
1 2 3 4 5 6 7 8 9 10
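Here is a self-contained version of the copying approach. It assumes C# 9 or later and joins the values onto one line instead of calling PrintValues:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var source = Enumerable.Range(1, 10);
var capturedForeachLocals = new List<Func<int>>();

foreach (int i in source)
{
    // Each iteration declares a new iLocal, so each lambda captures
    // its own variable holding that iteration's value.
    int iLocal = i;
    capturedForeachLocals.Add(() => iLocal);
}

Console.WriteLine(string.Join(" ", capturedForeachLocals.Select(f => f())));
// prints: 1 2 3 4 5 6 7 8 9 10
```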
What about range variables? Here’s the rough equivalent:
var query = from i in source select ((Func<int>) (() => i));
List<Func<int>> capturedRangeVariables = query.ToList();
The syntax is a little messy here because C# is unable to infer the lambda type. Remember, lambda expressions don’t have an intrinsic type, so the compiler needs more information. In the previous two examples, the compiler was able to infer what was required from the signature of the List<T>.Add method to which we were passing the lambda. I can’t work out a neat way of giving the compiler what it needs here, so I’ve settled for a rather ugly cast. (I suppose I could also have used a variation on the Funcify function from that recent blog, but it doesn’t look much better to me.)
Anyway, here are the results:
1 2 3 4 5 6 7 8 9 10
So the behaviour is not like the foreach iteration variable – it’s more like the case where we made a copy of the iteration variable. That’s a better analogy, but we can dig a little further.
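The same capture can also be written in the method-call form that the query expression translates to. This sketch assumes C# 9 or later. It avoids the cast by giving Select its type arguments explicitly, which supplies the lambda's target type. That is one possible alternative, not the approach used above. Here i is simply the parameter of the outer lambda, and every invocation of that lambda gets a fresh one for the inner lambda to capture.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var source = Enumerable.Range(1, 10);

// Method-call form of: from i in source select ((Func<int>) (() => i))
// The explicit <int, Func<int>> tells the compiler what the inner lambda
// should convert to, so no cast is needed.
List<Func<int>> capturedRangeVariables =
    source.Select<int, Func<int>>(i => () => i).ToList();

Console.WriteLine(string.Join(" ", capturedRangeVariables.Select(f => f())));
// prints: 1 2 3 4 5 6 7 8 9 10
```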
Expressions in LINQ query expressions can sometimes have side-effects. Whether that’s allowed depends on what flavour of LINQ you’re using. We’re using LINQ to Objects here, and that permits it. Here’s a modified version of the previous query expression:
var query = from i in source select ((Func<int>) (() => ++i));
List<Func<int>> capturedRangeVariables = query.ToList();
This increments the captured range variable. Now that won’t interfere with the overall progress of the iteration, because as we already know, the lambda here seems to be working with a copy of i rather than the ‘real’ i. So if we print it out once, we get what you’d probably expect:
2 3 4 5 6 7 8 9 10 11
And if we print the list out a second time then, as you may or may not have been expecting, everything moves up by one:
3 4 5 6 7 8 9 10 11 12
(Note, I’m only evaluating the query once – that happens when I call ToList. And then I’m printing out the resulting list multiple times, and the numbers go up by one each time. If I were to re-evaluate the query, that’d be different. It would build a brand new set of functions that would start back from 2 again.)
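Both observations can be checked in one sketch. It assumes C# 9 or later and uses the method-call form in place of the query expression. Invoking the same list twice moves every value up by one. Re-evaluating the query with a second ToList call builds new closures that start back at 2.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var source = Enumerable.Range(1, 10);

// Method-call form of: from i in source select ((Func<int>) (() => ++i))
// Nothing runs yet; Select is deferred until the query is enumerated.
IEnumerable<Func<int>> query = source.Select<int, Func<int>>(i => () => ++i);

// Evaluate the query once, then invoke the same functions twice.
List<Func<int>> funcs = query.ToList();
string first = string.Join(" ", funcs.Select(f => f()));
string second = string.Join(" ", funcs.Select(f => f()));

// Evaluating the query again builds new closures with fresh copies of i.
string reevaluated = string.Join(" ", query.ToList().Select(f => f()));

Console.WriteLine(first);       // 2 3 4 5 6 7 8 9 10 11
Console.WriteLine(second);      // 3 4 5 6 7 8 9 10 11 12
Console.WriteLine(reevaluated); // 2 3 4 5 6 7 8 9 10 11
```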
We can represent this as a foreach:
foreach (int i in source)
{
    int iLocal = i;
    capturedForeachLocals.Add(() => ++iLocal);
}
The behaviour is the same. Again, this suggests that LINQ query expression range variables work more like a copy of a foreach iteration variable than the iteration variable itself. (Incidentally, if you were to try to use i++ in this foreach loop, it wouldn’t even compile. C# prevents you from modifying the iteration variable – another way in which iteration variables are different from range variables.)
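To confirm that the two forms really behave the same, this sketch builds them side by side, invokes each list twice, and compares the results. It assumes C# 9 or later and uses a hypothetical InvokeAll helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var source = Enumerable.Range(1, 10);

// Range-variable version, in the method-call form the query translates to.
List<Func<int>> fromQuery = source.Select<int, Func<int>>(i => () => ++i).ToList();

// foreach version: ++i on the iteration variable itself would not compile,
// so each iteration increments its own copy instead.
var fromForeach = new List<Func<int>>();
foreach (int i in source)
{
    int iLocal = i;
    fromForeach.Add(() => ++iLocal);
}

// Hypothetical helper: invokes every function once and joins the results.
static string InvokeAll(List<Func<int>> funcs) =>
    string.Join(" ", funcs.Select(f => f()));

string queryFirst = InvokeAll(fromQuery);
string foreachFirst = InvokeAll(fromForeach);
string querySecond = InvokeAll(fromQuery);
string foreachSecond = InvokeAll(fromForeach);

Console.WriteLine(queryFirst);  // 2 3 4 5 6 7 8 9 10 11
Console.WriteLine(querySecond); // 3 4 5 6 7 8 9 10 11 12
Console.WriteLine(queryFirst == foreachFirst && querySecond == foreachSecond); // True
```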
But we can go further still.
LINQ query expressions can contain many clauses, each of which will often want to access the range variable. This means I can build a query that attempts to modify the range variable in one clause before going on to use it again in a later clause:
var query = from i in source where ++i < 5 select ((Func<int>) (() => i));
List<Func<int>> capturedRangeVariables = query.ToList();
On the face of it, this has a where clause that increments the range variable, and then only lets through items where the result is less than 5. So we’d expect only to see three items in the output. But what do we expect those output values to be? It would be reasonable to expect that the i in the select clause will have the incremented value – after all, it looks like the same variable, i, that got incremented in the where clause. And yet the output is as follows:
1 2 3
That is, the i captured in the select clause does not seem to have been affected by the increment operator applied to the i in the where clause. Clearly the ++ has had an effect – we’ve only got three numbers out, and we would have expected four if that ++ were not present.
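Expanding this query into method-call form shows why. This sketch assumes C# 9 or later and writes out the equivalent calls by hand. The where clause and the select clause each become a separate lambda with its own parameter named i, so the increment inside the Where lambda changes a variable that the Select lambda never sees.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var source = Enumerable.Range(1, 10);

// Method-call form of:
//   from i in source where ++i < 5 select ((Func<int>) (() => i))
List<Func<int>> capturedRangeVariables = source
    .Where(i => ++i < 5)                   // increments the Where lambda's own i
    .Select<int, Func<int>>(i => () => i)  // a different i: the original element
    .ToList();

Console.WriteLine(string.Join(" ", capturedRangeVariables.Select(f => f())));
// prints: 1 2 3
```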
So it’s as though the two clauses are working on different variables, despite being in the same scope and using the same name. The foreach equivalent looks like this:
foreach (int i in source)
{
    int iLocal1 = i;
    if (++iLocal1 < 5)
    {
        int iLocal2 = i;
        capturedForeachLocals.Add(() => iLocal2);
    }
}
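A runnable version of that foreach equivalent, assuming C# 9 or later, produces the same three values:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var source = Enumerable.Range(1, 10);
var capturedForeachLocals = new List<Func<int>>();

foreach (int i in source)
{
    // One copy for the where clause: incrementing it only affects the test.
    int iLocal1 = i;
    if (++iLocal1 < 5)
    {
        // A separate copy for the select clause, taken from the original i.
        int iLocal2 = i;
        capturedForeachLocals.Add(() => iLocal2);
    }
}

Console.WriteLine(string.Join(" ", capturedForeachLocals.Select(f => f())));
// prints: 1 2 3
```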
So query expression range variables clearly are quite different from foreach iteration variables. Each clause in a query expression that refers to the range variable gets its own local variable (in practice, a parameter of the lambda that the clause is turned into), so while it may look like one variable, it’s not. And, as Jon points out, the range variable never really exists in the compiled result – once a query expression is compiled, the only tangible variables in the code are the local copies. And you can see why by manually expanding any of these query expressions out into their equivalent method call forms: you’ll see that the range variable vanishes as part of this transformation.
So the range variable is really just a label that directs the transformation of a query expression into code at compile time. Once it has done this job, it vanishes.