Other Languages

Thursday 5 August, 2010, 10:33 PM

This is part five in a series of blogs:

Simple but Too Specialized
Moving Away from Anonymous Types
LINQ to LINQ
The Missing LINQ: IEnumerable<T> and Scalars
Other Languages (this entry)
Closing Thoughts

Part 5 – Other Languages

In the previous blogs in this series, I’ve shown how to generate Cartesian products in C#. I started with the simple, elegant, anonymous type solution that works when you know at compile time exactly what your inputs are. But because my original scenario involved varying inputs, I built a more generalized solution, and then honed it to the point where I was reasonably happy with its size and readability.

In this part, I thought it’d be interesting to show solutions in a handful of other languages. In language advocacy fist fights^H^H^H^H^H^H^H^H^H^H^H debates, it’s common to argue that a language’s power is inversely proportional to the amount of code required to express a particular concept. In other words less is more—the language that can solve a particular problem in the smallest space wins. (I don’t completely agree with that—I think readability matters, and terseness is sometimes the enemy of readability. But size has the benefit of being objective.)

Haskell offers the shortest solution I’ve seen to this problem:

sequence [['1', '2'], ['A', 'B']]

This produces:

["1A","1B","2A","2B"]

That initially struck me as a little wierd until I realised that a string in Haskell is just a list of characters. I think it makes slightly more sense to think of that output as this:

[['1','A'],['1','B'],['2', 'A'],['2','B']]

That makes it more obvious that sequence has just produced every possible concatenation. But Haskell prefers to print lists of characters using string literal syntax. In fact if you type in the second representation at a Haskell prompt, it'll print out the first one.

You might think that this is cheating because I appear to be using a built-in function to do the job. But that would be a bad argument on two counts. First, the expressiveness of the available library functions is a critically important aspect of the practical usefulness of any language. Second, this built-in sequence function is not just for Cartesian products—it’s more general-purpose. What you’re looking at in that example is a very general mechanism being applied in this particular case to produce an n-ary Cartesian product.

The sequence function works with any monadic type. For example, I can use Haskell’s Maybe monad, which is roughly analogous to the idea of a nullable type:

sequence [Just 1, Just 2, Just 3]

Here I’m passing a list of maybes where every item does in fact have a value, and it produces:

Just [1,2,3]

But if I include Nothing, which is (sort of) analogous to null, then the usual behaviour of the Maybe monad kicks in, which is to say I get Nothing out. So this:

sequence [Just 1, Just 2, Nothing, Just 3]

produces this:

Nothing

Haskell also uses monads to handle IO, so I could use sequence to execute a series of IO actions:

get2chars = sequence [getChar, getChar]

The getChar function returns an IO Char, so passing a list of these into sequence produces something of type IO [Char], i.e., it produces an IO monad which can feed a list of Char values into a subsequent step:

do { result <- ioseq; putStrLn result }

That reads two characters, and then prints them to the output. Not a particularly useful example perhaps, but it illustrates that the way in which the sequence function combines the elements is defined not by sequence itself, but by the monad you’re using.

If you’re not entirely familiar with monads, and want to understand this in terms more familiar to a C# developer, you could think of Haskell’s sequence function as being the generalized version of the CartesianProduct function I wrote earlier, but rather than being specific to IEnumerable<T>:

public static IEnumerable<IEnumerable<T>> CartesianProduct<T>(
    IEnumerable<IEnumerable<T>> inputs)

it works for any type that provides the LINQ SelectMany operator. (So a hypothetical C# equivalent of sequence would work against LINQ to XML or LINQ to Entities, and not just LINQ to Objects, which is what our current CartesianProduct implementation uses.) SelectMany is just LINQ’s name for one of the standard monadic operators. (It’s sometimes called “bind”, and Haskell denotes it with the >>= operator.) So you could imagine this sort of thing:

public static IEnumerable<IMonad<T>> Sequence<T>(IEnumerable<IMonad<T>> inputs)

That’s not real because there is no IMonad<T> in .NET—I’m just trying to illustrate the broad idea behind Haskell’s sequence function. So the reason we can use sequence to build an n-ary Cartesian product is the same reason we can build them with C# LINQ queries with multiple from clauses—both C# and Haskell represent lists as monadic types whose binding behaviour happens to be a Cartesian product. C# doesn’t explicitly call these things monads, but it’s the same idea in practice.

(Note that sequence has one limitation that my C# code doesn't: Haskell has no direct equivalent of System.Object. The C# method can combine any old mixture of types, whereas Haskell's a bit more picky. As it happens, I don't care for my original problem: I only needed to cope with the number of input sets being unknown at compile time. The set members all happened to be of the same type; the ability to cope with a mixture of types was an accidental bonus, and not even one I wanted to keep. As you will recall, I eventually moved from object to a type argument. But if you really wanted to handle a mixture of types not known until runtime, Haskell's actually at a disadvantage; or perhaps the idiomatic Haskell response would be to wave my hand in the manner of a Jedi and tell you that this is not the type system you're looking for. In practice, I suspect that if you're using Haskell, you'd probably be unlikely to find yourself in a position where you were trying to handle that unknown mixture of types anyway.)

So Haskell wins because it has a built-in function, sequence, which combines a list of monadic types. I had to write my own function to do that in C#. But if Haskell didn’t have sequence, I could build it in much the same way I did with C#. In C# I used the Aggregate operator, and in Haskell I could use the similar foldr method.

mySequence = foldr
  (\ input soFar ->
    soFar >>= (\prevProductItem ->
      input >>= (\item -> [item:prevProductItem])))
  [[]]

Although that works here, it’s a little too specialized—I’m constructing lists with the [] syntax here, and the original sequence function works with any monad. Fortunately, as mentioned earlier, Haskell requires monadic types to define an operation to construct an instance of the monad from a value. (The gap I filled in C# with my EnumerableFrom method in the previous part. And as I noted in a recent update to the previous article, Rx provides something similar through its EnumerableEx.return method.) This is provided by the return operation:

mySequence = foldr
  (\ input soFar ->
    soFar >>= (\prevProductItem ->
      input >>= (\item -> return (item:prevProductItem))))
  (return [])

It might not be obvious if you’re unfamiliar with Haskell, but the structure’s pretty similar to the C# version. The from clauses have turned into >>= operators, the call to Append has been replaced with the colon operator. And in fact that’s a prepend, rather than an append; everything’s turned inside out because foldr works from back to front… (I could have used foldl, but foldr seems to be the more common idiom in Haskell.)

Haskell offers a specialized syntax for chaining monads together: the do notation. The following example is equivalent to the preceding one, but it might be more idiomatically correct:

mySequence = foldr
  (\ input soFar ->
      do prevProductItem <- soFar
         item <- input
         return (item:prevProductItem))
  (return [])

Hopefully this makes the structural similarity between this Haskell example and the C# CartesianProduct slightly more obvious—compare the <- clauses in this do notation with the from clauses in the C#. (Compared to foldr, LINQ’s Aggregate happens to use the reverse order for its arguments, and also for the arguments for the lambda, but otherwise, this is pretty similar.)

This still relies on monadic operators, so just as C# from clauses work with any type that offers the relevant LINQ operators, all the Haskell examples shown so far will work with any monad. And when I feed in a list of lists, it’s the Haskell List monad that’s doing the actual work of producing the cross product here, just as the C# code ultimately relies on the LINQ to Objects code in the Enumerable class. If you prefer your code more explicit, here’s one that generates the products without relying on monads:

explicitCartes = foldr
  (\ input soFar ->
    concatMap (\prevProductItem ->
      concatMap (\item -> [item:prevProductItem]) input)
      soFar)
  [[]]

Here’s a completely different take on the same problem:

compCartes = foldr
  (\ input soFar ->
    [item:prevProductItem | prevProductItem <- soFar, item <- input])
  [[]]

That’s using a list comprehension. Here I’m essentially exploiting the fact that the language will generate a 2-set Cartesian product if I define a list in terms of two input sets, and once again I’m using foldr to generalise that to an n-ary cross product. (This last example only works with lists—unlike the built-in sequence, and the last two mySequence examples above, this won’t work with other monads.)

There are probably other viable approaches. These may not be particularly idiomatically good examples of Haskell by the way—not only do I not know much Haskell, I’ve also attempted to keep some kind of connection between my C# examples and the Haskell code through my choice of variable names, to make it easier to explore the similarities and differences.

Craig Andera helpfully supplied me with a Clojure version:

(defn cross-prod [& colls]
  (->> colls
       (reduce #(for [x %1 y %2] [x y]))
       (map flatten)))

I’m pretty sure that’s doing much the same thing—reduce is Lisp’s name for what Haskell calls foldr, and what LINQ calls Aggregate.

I also took a shot at an F# version. I don’t really know much F#, but F# MVP Richard Minerich (aka rickasaurus) generously looked at my example, and was kind enough to describe it as “quite good”. He also came up with something more clever, but it would actually have required more code, so I'm sticking with my original:

let cartes inputs =
  List.foldBack
    (fun input soFar -> seq { for y in soFar do for x in input do yield x::y })
    inputs
    (Seq.singleton List.Empty)

let result = cartes [[1..5];[42;50];[99;100]] |> Seq.toArray

The List.foldBack operation is similar to Haskell’s foldr, and again, the overall structure is much the same as the C# example. Don’t be thrown by the name seq—this is different from Haskell’s sequence function. In F#, the seq keyword is used to introduce a block of code that will produce a sequence of items as its output. The block that follows here is conceptually pretty similar to the C# example, but with a pair of for … in … do constructs in place of a pair of from clauses.

(Note that as with the Haskell version, this one’s also picky about types. But again it doesn’t matter to me, because in my original problem, the input sets all contained members of the same type. It was only the number of sets I didn't know at compile time.)

The final line of this F# example is just for test purposes, and the reason I’m feeding the result through the Seq.toArray function is that the debugger shows more useful information for arrays than it does for generic sequences. (I guess that’s because a sequence might be infinite, so attempting to display the whole thing might be undesirable. But in converting it to an array, I’ve already chosen to force the sequence to be fully evaluated, so the debugger can show it safely.) Here’s the output from evaluating that last line in the F# Interactive window:

val result : int list array =
  [|[1; 42; 99]; [2; 42; 99]; [3; 42; 99]; [4; 42; 99]; [5; 42; 99];
    [1; 50; 99]; [2; 50; 99]; [3; 50; 99]; [4; 50; 99]; [5; 50; 99];
    [1; 42; 100]; [2; 42; 100]; [3; 42; 100]; [4; 42; 100]; [5; 42; 100];
    [1; 50; 100]; [2; 50; 100]; [3; 50; 100]; [4; 50; 100]; [5; 50; 100]|]

So that seems to work.

Old School

I asked a friend who works mainly in Java for a Java version of this code. I’m not going to offer any commentary beyond observing that a) this is probably roughly what it would have look like in C# before C# 3.0 turned up and b) this is why I like C# 3.0 so much:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;


public class Cartesian {
    public static void main(final String... args) {

         final Object[] letters = { 'A', 'B', 'C' };
         final Object[] numbers = { 1, 2, 3, 4 };
         final Object[] colours = { "Red", "Blue" };

         final List<List<Object>> params = new ArrayList<List<Object>>();
         params.add(Arrays.asList(letters));
         params.add(Arrays.asList(numbers));
         params.add(Arrays.asList(colours));

         for( final List<Object> item : cartesian(params)) {
             System.out.println(item);
        }
    }

     public static <T> List<List<T>> cartesian(final List<List<T>> params) {
         if( !(params.size() > 0) ) return new LinkedList<List<T>>();
         return cartesian(new LinkedList<List<T>>(params),
             new ArrayList<List<T>>());
     }

    private static <T> List<List<T>> cartesian(
      final LinkedList<List<T>> params, final List<List<T>> results) {
         if( !(params.size() > 0) ) return results;
         aggregate(params.removeLast(),results);
         return cartesian(params,results);
     }

    private static <T> void aggregate(final List<T> items,
                                      final List<List<T>> target) {
        if( target.size() == 0 ) {
            for(final T item : items) {
                final List<T> targetItems = new ArrayList<T>();
                targetItems.add(item);
                target.add(targetItems);
            }
        } else {
            final List<List<T>> oldTarget = new LinkedList<List<T>>(target);
            target.clear();
            for(final T item : items) {
                for(final List<T> oldTargetItems : oldTarget) {
                    final List<T> targetItems = new ArrayList<T>();
                    targetItems.add(item);
                    targetItems.addAll(oldTargetItems);
                    target.add(targetItems);
                }
            }
        }
    }
}

Next time, some final thoughts.

April (2018)	(1 item)
August (2014)	(1 item)
July (2014)	(5 items)
April (2014)	(1 item)
March (2014)	(1 item)
January (2014)	(2 items)
November (2013)	(2 items)
July (2013)	(4 items)
April (2013)	(1 item)
February (2013)	(6 items)
September (2011)	(2 items)
November (2010)	(4 items)
September (2010)	(1 item)
August (2010)	(4 items)
July (2010)	(2 items)
September (2009)	(1 item)
June (2009)	(1 item)
April (2009)	(1 item)
November (2008)	(1 item)
October (2008)	(1 item)
September (2008)	(1 item)
July (2008)	(1 item)
June (2008)	(1 item)
May (2008)	(2 items)
April (2008)	(2 items)
March (2008)	(5 items)
January (2008)	(3 items)
December (2007)	(1 item)
November (2007)	(1 item)
October (2007)	(1 item)
September (2007)	(3 items)
August (2007)	(1 item)
July (2007)	(1 item)
June (2007)	(2 items)
May (2007)	(8 items)
April (2007)	(2 items)
March (2007)	(7 items)
February (2007)	(2 items)
January (2007)	(2 items)
November (2006)	(1 item)
October (2006)	(2 items)
September (2006)	(1 item)
June (2006)	(2 items)
May (2006)	(4 items)
April (2006)	(1 item)
March (2006)	(5 items)
January (2006)	(1 item)
December (2005)	(3 items)
November (2005)	(2 items)
October (2005)	(2 items)
September (2005)	(8 items)
August (2005)	(7 items)
June (2005)	(3 items)
May (2005)	(7 items)
April (2005)	(6 items)
March (2005)	(1 item)
February (2005)	(2 items)
January (2005)	(5 items)
December (2004)	(5 items)
November (2004)	(7 items)
October (2004)	(3 items)
September (2004)	(7 items)
August (2004)	(16 items)
July (2004)	(10 items)
June (2004)	(27 items)
May (2004)	(15 items)
April (2004)	(15 items)
March (2004)	(13 items)
February (2004)	(16 items)
January (2004)	(15 items)

IanG on Tap

Blog Navigation

Writing

Other Sites

Other Languages

Part 5 – Other Languages

Old School