LINQ to LINQ

Sunday 1 August, 2010, 06:27 PM

This is part three in a series of blogs.

Simple but Too Specialized
Moving Away from Anonymous Types
LINQ to LINQ (this entry)
The Missing LINQ: IEnumerable<T> and Scalars
Other Languages
Closing Thoughts

Part 3 – LINQ to LINQ

Last time, I wrote a generalized method that built an n-ary Cartesian Product. It did this by building up a LINQ query—it chained as many SelectMany calls together as were needed. But it turns out that I am reinventing the wheel here—this is an example of a slightly more general process. Our n-ary Cartesian product method works by taking a starting value (a sequence containing the empty set), and repeatedly applying a function (SelectMany) once for each item in an input sequence (the sequence of input sets we’re combining), feeding the output of each step back in as the input to the next.

If the LINQ team had employed Apple’s marketing agency, this might be a good time to say: there’s an operator for that!

This concept of building up something by repeatedly applying the output of a function as the input to the same function is quite a common one. The process of calculating the sum of a list of numbers has exactly the same structure: your initial value is zero, and then for each number in the input list, you apply the ‘+’ (addition) function to the running total and the current number. It’s the same process, but using numbers and addition instead of sequences and SelectMany.

Most list processing systems have some general-purpose implementation of this process, and it goes by various names, including fold and reduce. And in LINQ it’s called Aggregate. Here’s a modified version of the method, using the LINQ Aggregate operator:

public static IEnumerable<object[]> CartesianProduct(params object[][] inputs)
{
    return inputs.Aggregate(
        (IEnumerable<object[]>) new object[][] { new object[0] },
        (soFar, input) => soFar.SelectMany(prevProductItem =>
            from item in input
            select prevProductItem.Concat(new object[] { item }).ToArray()));
}

Earlier, I turned my LINQ query expression into explicit calls to SelectMany, but that was just an illustrative device to make it more clear what’s going on. Having got to this stage, I think it looks slightly neater to go back to the query expression syntax. It’s no less powerful in this example:

public static IEnumerable<object[]> CartesianProduct(params object[][] inputs)
{
    return inputs.Aggregate(
        (IEnumerable<object[]>) new object[][] { new object[0] },
        (soFar, input) =>
            from prevProductItem in soFar
            from item in input
            select prevProductItem.Concat(new object[] { item }).ToArray());
}

So there’s the function that uses LINQ to build a LINQ query. Specifically, it uses the LINQ Aggregate operator to combine a series of LINQ SelectMany queries into one big query.

I can now write this sort of code:

object[] letters = { 'A', 'B', 'C' };
object[] numbers = { 1, 2, 3, 4 };
object[] colours = { "Red", "Blue" };

var cartesianProduct = CartesianProduct(letters, numbers, colours);
foreach (var item in cartesianProduct)
{
    Console.WriteLine(string.Join(",", item));
}

This code produces exactly the same output as the previous output shown, i.e., it is the Cartesian product of the three sets being fed into the method.

Comparing this with the original example in part one of this series, everything’s now an object[], so I’ve lost some compile time type checking, and I’ve lost meaningful names for our items, but the benefit is increased generality—I can pass as many sets as I like into the CartesianProduct method. I can decide on the number of sets at runtime, which was my original requirement.

Here’s a different way to use this CartesianProduct method. This next example generates digits for all numbers in a particular base of a particular length. The digitValues array contains all available digits in the selected base (e.g. {0, 1} for base 2), and then I use the Enumerable.Repeat method to generate a sequence containing that set of digits over and over again for the specified number of digits.

public static IEnumerable<object[]> Numbers(int radix, int digits)
{
    object[] digitValues = Enumerable.Range(0, radix).Cast<object>().ToArray();
    return CartesianProduct(Enumerable.Repeat(digitValues, digits).ToArray());
}

Those calls to Cast and ToArray are bugging me though.

Cleaning it Up

Earlier, I made a rather suspect decision—you may well have raised an eyebrow when I said I was going to represent each tuple as an array of objects. Arrays? Really? In the midst of all this LINQ, I decided to use arrays? Well actually, originally I didn’t—before I wrote this series of blogs, I started out by using enumerations for the tuples as well as for the input sets. The reason I introduced object[] was to make the example slightly easier to follow—as I was working through it, I kept forgetting whether the enumeration I was looking at was the one containing the input sets, or was one of the input sets themselves, or was perhaps one of my output values…

The object[] type had the useful benefit of being visually very distinct from all the other IEnumerable<T> types, making it very easy to know when I was looking at something representing a tuple in the output set. However, now that I’ve got it working, I want to get rid of it, because arrays are an unnecessarily restrictive choice here. Here’s a modified version that uses enumerables:

public static IEnumerable<IEnumerable<object>> CartesianProduct(
    IEnumerable<IEnumerable<object>> inputs)
{
    return inputs.Aggregate(
        (IEnumerable<IEnumerable<object>>) new object[][] { new object[0] },
        (soFar, input) =>
            from prevProductItem in soFar
            from item in input
            select prevProductItem.Concat(new object[] { item }));
}

This allows a slightly neater version of Numbers:

public static IEnumerable<IEnumerable<object>> Numbers(int radix, int digits)
{
    var digitValues = Enumerable.Range(0, radix).Cast<object>();
    return CartesianProduct(Enumerable.Repeat(digitValues, digits));
}

But there’s still that call to Cast<object>(). That’s only in there because CartesianProduct is using object as its representation for items in the input sets. For my first few examples, where the input sets were all of different types (char, int, and string), that made sense, but in this case, all three sets contain members of the same type: int. (And as it happens, in the program I was working on that originally motivated this blog post, all the input sets contained members of type Action.) I’d like to improve this. I can parameterise CartesianProduct by the type it uses to represent elements in input sets:

public static IEnumerable<IEnumerable<T>> CartesianProduct<T>(
    IEnumerable<IEnumerable<T>> inputs)
{
    return inputs.Aggregate(
        (IEnumerable<IEnumerable<T>>) new T[][] { new T[0] },
        (soFar, input) =>
            from prevProductItem in soFar
            from item in input
            select prevProductItem.Concat(new T[] { item }));
}

// Enable variable argument list.
public static IEnumerable<IEnumerable<T>> CartesianProduct<T>(
    params IEnumerable<T>[] inputs)
{
    IEnumerable<IEnumerable<T>> e = inputs;
    return CartesianProduct(e);
}

I can now modify our Numbers method to return a more specifically typed enumeration, and I can remove that ugly call to Cast as well:

public static IEnumerable<IEnumerable<int>> Numbers(int radix, int digits)
{
    var digitValues = Enumerable.Range(0, radix);
    return CartesianProduct(Enumerable.Repeat(digitValues, digits));
}

That Repeat method reminds me of a minor omission from LINQ that has caused our CartesianProduct to be slightly messier than necessary. I’ll clean things up further by adding some helpers in the next blog in this series.

April (2018)	(1 item)
August (2014)	(1 item)
July (2014)	(5 items)
April (2014)	(1 item)
March (2014)	(1 item)
January (2014)	(2 items)
November (2013)	(2 items)
July (2013)	(4 items)
April (2013)	(1 item)
February (2013)	(6 items)
September (2011)	(2 items)
November (2010)	(4 items)
September (2010)	(1 item)
August (2010)	(4 items)
July (2010)	(2 items)
September (2009)	(1 item)
June (2009)	(1 item)
April (2009)	(1 item)
November (2008)	(1 item)
October (2008)	(1 item)
September (2008)	(1 item)
July (2008)	(1 item)
June (2008)	(1 item)
May (2008)	(2 items)
April (2008)	(2 items)
March (2008)	(5 items)
January (2008)	(3 items)
December (2007)	(1 item)
November (2007)	(1 item)
October (2007)	(1 item)
September (2007)	(3 items)
August (2007)	(1 item)
July (2007)	(1 item)
June (2007)	(2 items)
May (2007)	(8 items)
April (2007)	(2 items)
March (2007)	(7 items)
February (2007)	(2 items)
January (2007)	(2 items)
November (2006)	(1 item)
October (2006)	(2 items)
September (2006)	(1 item)
June (2006)	(2 items)
May (2006)	(4 items)
April (2006)	(1 item)
March (2006)	(5 items)
January (2006)	(1 item)
December (2005)	(3 items)
November (2005)	(2 items)
October (2005)	(2 items)
September (2005)	(8 items)
August (2005)	(7 items)
June (2005)	(3 items)
May (2005)	(7 items)
April (2005)	(6 items)
March (2005)	(1 item)
February (2005)	(2 items)
January (2005)	(5 items)
December (2004)	(5 items)
November (2004)	(7 items)
October (2004)	(3 items)
September (2004)	(7 items)
August (2004)	(16 items)
July (2004)	(10 items)
June (2004)	(27 items)
May (2004)	(15 items)
April (2004)	(15 items)
March (2004)	(13 items)
February (2004)	(16 items)
January (2004)	(15 items)

IanG on Tap

Blog Navigation

Writing

Other Sites

LINQ to LINQ

Part 3 – LINQ to LINQ

Cleaning it Up