(1 item) |
|
(1 item) |
|
(5 items) |
|
(1 item) |
|
(1 item) |
|
(2 items) |
|
(2 items) |
|
(4 items) |
|
(1 item) |
|
(6 items) |
|
(2 items) |
|
(4 items) |
|
(1 item) |
|
(4 items) |
|
(2 items) |
|
(1 item) |
|
(1 item) |
|
(1 item) |
|
(1 item) |
|
(1 item) |
|
(1 item) |
|
(1 item) |
|
(1 item) |
|
(2 items) |
|
(2 items) |
|
(5 items) |
|
(3 items) |
|
(1 item) |
|
(1 item) |
|
(1 item) |
|
(3 items) |
|
(1 item) |
|
(1 item) |
|
(2 items) |
|
(8 items) |
|
(2 items) |
|
(7 items) |
|
(2 items) |
|
(2 items) |
|
(1 item) |
|
(2 items) |
|
(1 item) |
|
(2 items) |
|
(4 items) |
|
(1 item) |
|
(5 items) |
|
(1 item) |
|
(3 items) |
|
(2 items) |
|
(2 items) |
|
(8 items) |
|
(7 items) |
|
(3 items) |
|
(7 items) |
|
(6 items) |
|
(1 item) |
|
(2 items) |
|
(5 items) |
|
(5 items) |
|
(7 items) |
|
(3 items) |
|
(7 items) |
|
(16 items) |
|
(10 items) |
|
(27 items) |
|
(15 items) |
|
(15 items) |
|
(13 items) |
|
(16 items) |
|
(15 items) |
This is part five in a series of blogs:
In the previous blogs in this series, I’ve shown how to generate Cartesian products in C#. I started with the simple, elegant, anonymous type solution that works when you know at compile time exactly what your inputs are. But because my original scenario involved varying inputs, I built a more generalized solution, and then honed it to the point where I was reasonably happy with its size and readability.
In this part, I thought it’d be interesting to show solutions in a handful of other languages. In language advocacy fist fights^H^H^H^H^H^H^H^H^H^H^H debates, it’s common to argue that a language’s power is inversely proportional to the amount of code required to express a particular concept. In other words less is more—the language that can solve a particular problem in the smallest space wins. (I don’t completely agree with that—I think readability matters, and terseness is sometimes the enemy of readability. But size has the benefit of being objective.)
Haskell offers the shortest solution I’ve seen to this problem:
sequence [['1', '2'], ['A', 'B']]
This produces:
["1A","1B","2A","2B"]
That initially struck me as a little wierd until I realised that a string in Haskell is just a list of characters. I think it makes slightly more sense to think of that output as this:
[['1','A'],['1','B'],['2', 'A'],['2','B']]
That makes it more obvious that sequence
has just produced every possible concatenation. But Haskell prefers to print lists of characters using string literal syntax. In fact if you type in the second representation at a Haskell prompt, it'll print out the first one.
You might think that this is cheating because I appear to be using a built-in function to do the job. But that would be a bad argument on two counts. First, the expressiveness of the available library functions is a critically important aspect of the practical usefulness of any language. Second, this built-in sequence
function is not just for Cartesian products—it’s more general-purpose. What you’re looking at in that example is a very general mechanism being applied in this particular case to produce an n-ary Cartesian product.
The sequence
function works with any monadic type. For example, I can use Haskell’s Maybe
monad, which is roughly analogous to the idea of a nullable type:
sequence [Just 1, Just 2, Just 3]
Here I’m passing a list of maybes where every item does in fact have a value, and it produces:
Just [1,2,3]
But if I include Nothing
, which is (sort of) analogous to null, then the usual behaviour of the Maybe
monad kicks in, which is to say I get N
othing
out. So this:
sequence [Just 1, Just 2, Nothing, Just 3]
produces this:
Nothing
Haskell also uses monads to handle IO, so I could use sequence
to execute a series of IO actions:
get2chars = sequence [getChar, getChar]
The getChar function returns an IO Char
, so passing a list of these into sequence
produces something of type IO [Char]
, i.e., it produces an IO monad which can feed a list of Char
values into a subsequent step:
do { result <- ioseq; putStrLn result }
That reads two characters, and then prints them to the output. Not a particularly useful example perhaps, but it illustrates that the way in which the sequence
function combines the elements is defined not by sequence
itself, but by the monad you’re using.
If you’re not entirely familiar with monads, and want to understand this in terms more familiar to a C# developer, you could think of Haskell’s sequence
function as being the generalized version of the CartesianProduct
function I wrote earlier, but rather than being specific to IEnumerable
<T>
:
public static IEnumerable<IEnumerable<T>> CartesianProduct<T>( IEnumerable<IEnumerable<T>> inputs)
it works for any type that provides the LINQ SelectMany
operator. (So a hypothetical C# equivalent of sequence
would work against LINQ to XML or LINQ to Entities, and not just LINQ to Objects, which is what our current CartesianProduct
implementation uses.) SelectMany
is just LINQ’s name for one of the standard monadic operators. (It’s sometimes called “bind”, and Haskell denotes it with the >>=
operator.) So you could imagine this sort of thing:
public static IEnumerable<IMonad<T>> Sequence<T>(IEnumerable<IMonad<T>> inputs)
That’s not real because there is no IMonad
<T>
in .NET—I’m just trying to illustrate the broad idea behind Haskell’s sequence
function. So the reason we can use sequence
to build an n-ary Cartesian product is the same reason we can build them with C# LINQ queries with multiple from
clauses—both C# and Haskell represent lists as monadic types whose binding behaviour happens to be a Cartesian product. C# doesn’t explicitly call these things monads, but it’s the same idea in practice.
(Note that sequence
has one limitation that my C# code doesn't: Haskell has no direct equivalent of System.Object
. The C# method can combine any old mixture of types, whereas Haskell's a bit more picky. As it happens, I don't care for my original problem: I only needed to cope with the number of input sets being unknown at compile time. The set members all happened to be of the same type; the ability to cope with a mixture of types was an accidental bonus, and not even one I wanted to keep. As you will recall, I eventually moved from object
to a type argument. But if you really wanted to handle a mixture of types not known until runtime, Haskell's actually at a disadvantage; or perhaps the idiomatic Haskell response would be to wave my hand in the manner of a Jedi and tell you that this is not the type system you're looking for. In practice, I suspect that if you're using Haskell, you'd probably be unlikely to find yourself in a position where you were trying to handle that unknown mixture of types anyway.)
So Haskell wins because it has a built-in function, sequence
, which combines a list of monadic types. I had to write my own function to do that in C#. But if Haskell didn’t have sequence
, I could build it in much the same way I did with C#. In C# I used the Aggregate
operator, and in Haskell I could use the similar foldr
method.
mySequence = foldr (\ input soFar -> soFar >>= (\prevProductItem -> input >>= (\item -> [item:prevProductItem]))) [[]]
Although that works here, it’s a little too specialized—I’m constructing lists with the [] syntax here, and the original sequence
function works with any monad. Fortunately, as mentioned earlier, Haskell requires monadic types to define an operation to construct an instance of the monad from a value. (The gap I filled in C# with my EnumerableFrom
method in the previous part. And as I noted in a recent update to the previous article, Rx provides something similar through its EnumerableEx.return
method.) This is provided by the return
operation:
mySequence = foldr (\ input soFar -> soFar >>= (\prevProductItem -> input >>= (\item -> return (item:prevProductItem)))) (return [])
It might not be obvious if you’re unfamiliar with Haskell, but the structure’s pretty similar to the C# version. The from
clauses have turned into >>=
operators, the call to Append
has been replaced with the colon operator. And in fact that’s a prepend, rather than an append; everything’s turned inside out because foldr
works from back to front… (I could have used foldl
, but foldr
seems to be the more common idiom in Haskell.)
Haskell offers a specialized syntax for chaining monads together: the do
notation. The following example is equivalent to the preceding one, but it might be more idiomatically correct:
mySequence = foldr (\ input soFar -> do prevProductItem <- soFar item <- input return (item:prevProductItem)) (return [])
Hopefully this makes the structural similarity between this Haskell example and the C# CartesianProduct slightly more obvious—compare the <-
clauses in this do
notation with the from
clauses in the C#. (Compared to foldr
, LINQ’s Aggregate
happens to use the reverse order for its arguments, and also for the arguments for the lambda, but otherwise, this is pretty similar.)
This still relies on monadic operators, so just as C# from
clauses work with any type that offers the relevant LINQ operators, all the Haskell examples shown so far will work with any monad. And when I feed in a list of lists, it’s the Haskell List monad that’s doing the actual work of producing the cross product here, just as the C# code ultimately relies on the LINQ to Objects code in the Enumerable class. If you prefer your code more explicit, here’s one that generates the products without relying on monads:
explicitCartes = foldr (\ input soFar -> concatMap (\prevProductItem -> concatMap (\item -> [item:prevProductItem]) input) soFar) [[]]
Here’s a completely different take on the same problem:
compCartes = foldr (\ input soFar -> [item:prevProductItem | prevProductItem <- soFar, item <- input]) [[]]
That’s using a list comprehension. Here I’m essentially exploiting the fact that the language will generate a 2-set Cartesian product if I define a list in terms of two input sets, and once again I’m using foldr
to generalise that to an n-ary cross product. (This last example only works with lists—unlike the built-in sequence
, and the last two mySequence
examples above, this won’t work with other monads.)
There are probably other viable approaches. These may not be particularly idiomatically good examples of Haskell by the way—not only do I not know much Haskell, I’ve also attempted to keep some kind of connection between my C# examples and the Haskell code through my choice of variable names, to make it easier to explore the similarities and differences.
Craig Andera helpfully supplied me with a Clojure version:
(defn cross-prod [& colls] (->> colls (reduce #(for [x %1 y %2] [x y])) (map flatten)))
I’m pretty sure that’s doing much the same thing—reduce
is Lisp’s name for what Haskell calls foldr
, and what LINQ calls Aggregate
.
I also took a shot at an F# version. I don’t really know much F#, but F# MVP Richard Minerich (aka rickasaurus) generously looked at my example, and was kind enough to describe it as “quite good”. He also came up with something more clever, but it would actually have required more code, so I'm sticking with my original:
let cartes inputs = List.foldBack (fun input soFar -> seq { for y in soFar do for x in input do yield x::y }) inputs (Seq.singleton List.Empty) let result = cartes [[1..5];[42;50];[99;100]] |> Seq.toArray
The List.foldBack
operation is similar to Haskell’s foldr
, and again, the overall structure is much the same as the C# example. Don’t be thrown by the name seq
—this is different from Haskell’s sequence
function. In F#, the seq
keyword is used to introduce a block of code that will produce a sequence of items as its output. The block that follows here is conceptually pretty similar to the C# example, but with a pair of for … in … do
constructs in place of a pair of from
clauses.
(Note that as with the Haskell version, this one’s also picky about types. But again it doesn’t matter to me, because in my original problem, the input sets all contained members of the same type. It was only the number of sets I didn't know at compile time.)
The final line of this F# example is just for test purposes, and the reason I’m feeding the result through the Seq.toArray
function is that the debugger shows more useful information for arrays than it does for generic sequences. (I guess that’s because a sequence might be infinite, so attempting to display the whole thing might be undesirable. But in converting it to an array, I’ve already chosen to force the sequence to be fully evaluated, so the debugger can show it safely.) Here’s the output from evaluating that last line in the F# Interactive window:
val result : int list array = [|[1; 42; 99]; [2; 42; 99]; [3; 42; 99]; [4; 42; 99]; [5; 42; 99]; [1; 50; 99]; [2; 50; 99]; [3; 50; 99]; [4; 50; 99]; [5; 50; 99]; [1; 42; 100]; [2; 42; 100]; [3; 42; 100]; [4; 42; 100]; [5; 42; 100]; [1; 50; 100]; [2; 50; 100]; [3; 50; 100]; [4; 50; 100]; [5; 50; 100]|]
So that seems to work.
I asked a friend who works mainly in Java for a Java version of this code. I’m not going to offer any commentary beyond observing that a) this is probably roughly what it would have look like in C# before C# 3.0 turned up and b) this is why I like C# 3.0 so much:
import java.util.ArrayList; import java.util.Arrays; import java.util.LinkedList; import java.util.List; public class Cartesian { public static void main(final String... args) { final Object[] letters = { 'A', 'B', 'C' }; final Object[] numbers = { 1, 2, 3, 4 }; final Object[] colours = { "Red", "Blue" }; final List<List<Object>> params = new ArrayList<List<Object>>(); params.add(Arrays.asList(letters)); params.add(Arrays.asList(numbers)); params.add(Arrays.asList(colours)); for( final List<Object> item : cartesian(params)) { System.out.println(item); } } public static <T> List<List<T>> cartesian(final List<List<T>> params) { if( !(params.size() > 0) ) return new LinkedList<List<T>>(); return cartesian(new LinkedList<List<T>>(params), new ArrayList<List<T>>()); } private static <T> List<List<T>> cartesian( final LinkedList<List<T>> params, final List<List<T>> results) { if( !(params.size() > 0) ) return results; aggregate(params.removeLast(),results); return cartesian(params,results); } private static <T> void aggregate(final List<T> items, final List<List<T>> target) { if( target.size() == 0 ) { for(final T item : items) { final List<T> targetItems = new ArrayList<T>(); targetItems.add(item); target.add(targetItems); } } else { final List<List<T>> oldTarget = new LinkedList<List<T>>(target); target.clear(); for(final T item : items) { for(final List<T> oldTargetItems : oldTarget) { final List<T> targetItems = new ArrayList<T>(); targetItems.add(item); targetItems.addAll(oldTargetItems); target.add(targetItems); } } } } }
Next time, some final thoughts.