Saturday, July 31, 2010

F# Sequence Lazy Evaluation Gotchas

Today’s post will be old news to experienced F# programmers. Actually, it’s old news to me, relative newcomer that I am. However, it’s one of those things I tend to forget until it jumps up to bite me, as it did today. (As always, all the code here is presented "as-is" and without warranty or implied fitness of any kind; use at your own risk.)

Here’s a perfectly valid snippet of F# code, along with its typical output:

 
let mutable myList:int list = []
 
let sideEffect v = 
  printf "(%i) " v
  myList<-v::myList
  v
 
seq{ for i=0 to 99 do yield i }
  |> Seq.map sideEffect
  |> ignore
printfn ""
printfn "len=%i" myList.Length
 




Nothing from the sequence is added to list myList, and the printf side-effect is never called! What’s going on here?

What’s going on is that the sequence generated by the comprehension is evaluated in a lazy manner. That is, it is evaluated on demand. That part is easy to remember. What’s harder to remember is that neither the Seq.map nor the ignore generates such a demand. No demand means that the sequence is never evaluated and so the side effects of updating the list and printing are never evaluated either.

Now look at this example:

 
let mutable myList:int list = []
 
let sideEffect v = 
  printf "(%i) " v
  myList<-v::myList
  v
 
let mySeq =
  seq{ for i=0 to 99 do yield i }
    |> Seq.map sideEffect
printfn ""
printfn "len=%i mySeq=%A" myList.Length mySeq
 




In the foregoing example, the call to printfn of mySeq at the end causes the evaluation of five elements from the sequence. The side effect printfn’s end up sandwiched inside the original printfn and the second printfn at the end shows that the list has been modified.

And finally, an example which calls Seq.length, a function which will assure that the entire sequence is evaluated. A side-effect printf is called for each element in the sequence comprehension, and the list is updated to contain the entire sequence:

 
let mutable myList:int list = []
 
let sideEffect v = 
  printf "(%i) " v
  myList<-v::myList
  v
 
seq{ for i=0 to 99 do yield i }
  |> Seq.map sideEffect
  |> Seq.length 
  |> ignore 
printfn ""
printfn "len=%i" myList.Length
 




What’s the takeaway here?

First: always think about what things look like from the computer’s point of view. From a human point of view, a Seq.map piped to an ignore looks like it’s saying “map the sequence and ignore the result.” From the computer’s standpoint, however, it says “map the sequence when you need to, but the ignore means you’ll never need to.” Minus the side effects, same result; add the side effects, and the result is very different.

Second: this is why a lot of very smart programmers (e.g. Don Syme, Erik Meijer, Matthew Podwysocki) never tire of making Channel 9 videos reminding us the importance of immutability, the importance of avoiding (or at least taming) side-effects, etc. In the turmoil of the trenches, it can be tough to adhere to pure functional programming. But that same turmoil can be lessened by at least learning the lessons functional programming teaches.

-Neil

p.s. I gave this blog entry a title that I might type into a search engine the next time I forget about this gotcha and go searching for information about it. Good engineering pre-planning at work, lol!

Tuesday, July 27, 2010

Cartesian Product of Lists in F#

A problem came up on stackoverflow: how to compute the Cartesian product of several sequences in F#? The poster wanted a solution involving sequences, presumably to meet external requirements, so my solution below using lists wasn’t quite right. But I decided to post it here anyway, since it illustrates my current F# learning goal:

Leveraging as much of the built-in power of the F# language and libraries as possible when solving a problem.

So without further ado (other than to note that all the code here is presented "as-is" and without warranty or implied fitness of any kind; use at your own risk) here is my Cartesian product of lists code. It’s fairly compact, but there may be even more succinct solutions.

 
// Cartesian product of a list of lists.
let rec cart nll = 
  let f0 n = function
    | [] -> [[n]]
    | nll -> List.map (fun nl->n::nl) nll
  match nll with
  | [] -> []
  | h::t -> List.collect (fun n->f0 n (cart t)) h
 
 
// Test.
 
let s = 
  [
    [0;1;2;];
    [3;4;];
    [5;6;7;];
  ] 
 
cart s |> List.iter (printfn "%A")
 


And here's a transliteration into sequences (but there may be a better way):

 
// Cartesian product of a seq of seqs.
let rec carts (nss:seq<seq<int>>) = 
  let f0 n (x:seq<seq<int>>) =
    match Seq.isEmpty x with
    | true -> seq{yield seq{yield n}}
    | _ -> Seq.map (fun nl->seq{yield n;yield! nl}) x
  match Seq.isEmpty nss with
  | true -> Seq.empty
  | false -> 
    Seq.collect 
      (fun n->f0 n (carts (Seq.skip 1 nss))) 
      (Seq.head nss) 
 

Tuesday, July 20, 2010

F#'s Little-Known (?) and (?<-) Operators

(Note: After posting this, I found a blog entry where Matthew Podwysocki uses a similar Dictionary<> example. I commend you to his more complete discussion of the dynamic operators at Using and Abusing the F# Dynamic Lookup Operator; it's well worth looking at. Vladimir Matveev has also just posted an article on dynamic lookup operators; I highly recommend it as well: Tricky late binding operators. And, since it never rains unless it pours, there is also a recent article by Tomas Petricek at hubFS which uses dymanic operators to great advantage in a real-world problem: Dynamic in F#: Reading data from SQL database. Dynamic is certainly a dynamic topic right now!)

This post is about a pair of F# operators that – unless you interface with dynamic programming languages or use reflection a lot – you’ve probably never encountered. These are the dynamic reflection operators, (?) and (?<-). Their main function is to clean up the syntax of dynamic language interfaces and domain-specific languages.

These two operators work similar to the C/C++ “stringizing” pre-processor operator. That is, they convert non-string code (limited to simple identifiers in the case of F#) into strings according to the following translation patterns:

e0?str =translates to=> e0 “str” and
e0?str<-e1 =translates to=> e0 “str” e1

In typical use, (?) is used create “get” operation, and (?<-) is used to create a “set” operation. Normally, this is done to provide a convenient interface to dynamic code.* In the canonical examples, this method is used to invoke a member by name on an instance loaded or created by reflection. But to simplify things a bit, the example below shows their use on a Dictionary<>. (As always, all the code here is presented "as-is" and without warranty or implied fitness of any kind; use at your own risk.)

 
open System.Collections.Generic
 
// Please note that this is a contrived example
// for illustration only!  There are better
// ways to accomplish what I'm doing here,
// and more appropriate uses of (?) and (?<-).
 
// Define (?) and (?<-) for 
// Dictionary<> access:
 
/// Lookup an item and return it,
/// or throw KeyNotFoundException.
let (?) (d:Dictionary<string,'T>) k =
  d.[k]
 
/// Set or reset an item.
let (?<-) (d:Dictionary<string,'T>) k v =
  d.[k]<-v
 
// Test.
 
let num = new Dictionary<string,int>()
 
// For example, func?plus21 <- (+) 21
// translates to: (?<-) func "plus21" ((+) 21) 
 
num?one <- 1
num?two <- 2
num?twenty <- 20
 
let fortyTwo = (num?one+num?twenty)*num?two
 
//let iWillFail = func?four 
 
printfn "Your breakpoint here."


In Expert F# 2.0, Don Syme (et al) caution against injudicious use of this pair of operators. In particular, there are concerns of type safety and performance in dynamic programming. You lose good stuff like some type checking, autocomplete, etc. Dynamic programming syntax helpers can result in code that looks as fast as compiled code yet masks underlying runtime expense. Also, many users will rarely encounter the (?) and (?<-) operators may be confused on encountering them in code.

On the flip side, these operators can make ugly code a lot more readable. So, if you need to interface with dynamic code, or clean up some aspects of a domain-specific language, there they are.

-Neil

*The generated names of these operators are op_Dynamic and op_DynamicAssignment.

Friday, July 16, 2010

A Two-Instruction Virtual RISC

One of my favorite hobbies is building really tiny virtual computers. The experience has even paid off a time or two when I needed to create a compact domain-specific language (DSL). This post shows a really, really tiny virtual computer that is nevertheless capable of computing square roots using the method in the previous post, and checking the result using multiplication.

How tiny is this computer? So tiny that it has only two instructions:

inc i = Increment the value in register i by one and increment the instruction pointer (ip).

jzd i j = If the value in register i is zero, set the instruction pointer to j. If the value is not zero, decrement the value in register i by one and increment the instruction pointer.

The basic operation is simple: starting at ip=0, retrieve and execute an instruction. Continue until the ip is either less than zero or greater than the index of the last instruction. When complete, return as a result the value of a specified register.

Below is an implementation in F#. Note that the registers are implemented as indices into an array of integers. A single-step function, “step” is provided for debugging. Note that “run” has a basic safety check against infinite loops. The code also shows some basic operations and a test. (As always, all the code here is presented "as-is" and without warranty or implied fitness of any kind; use at your own risk.)

 
/// Increment register i and 
/// advance the instruction pointer.
let inc i (d:int []) ip = 
  d.[i]<-d.[i]+1
  ip+1
 
/// If register i is zero, jump to ipj,
/// else, decrement register i and
/// advance the instruction pointer.
let jzd i j (d:int []) ip =
  match d.[i] with 
  | 0 -> j
  | _ -> 
    d.[i]<-d.[i]-1
    ip+1
 
/// Single-step and instruction.
/// Returns the new ip, or
/// -1 on under/overrun or halt.
let step (d:int [])
         (p:(int []->int->int)[]) 
         ip =
  match (ip<0) || (ip>=p.Length) with 
  | true -> -1
  | _ -> p.[ip] d ip
 
/// Run a program.
/// Halts on an ip of -1 or
/// when safety limit is reached.
/// Returns the value of register rtn.
let run (d:int []) 
        (p:(int []->int->int)[]) 
        rtn = 
  let rec run0 count ip = 
    match count>=System.Int32.MaxValue with
    | true -> failwith "Loop safty exceeded."
    | _ ->
    match step d p ip with 
    | -1 -> d.[rtn]
    | ip0 -> run0 (count+1) ip0 
  run0 0 0
 
 
/// Five register memory.
let d5 = [|0;100;0;0;|]   
 
/// Zero out register 1.
let pZero1 = 
  [|
    (*00*) jzd 1 -1 
    (*01*) jzd 0 0  
  |]
 
/// Set the value in register 1 to 5.
let pSet1To5 = 
  [|
    // Zero out 1.
    (*00*) jzd 1 2 
    (*01*) jzd 0 0 
    // Increment 5 times.
    (*02*) inc 1   
    (*03*) inc 1
    (*04*) inc 1
    (*05*) inc 1
    (*06*) inc 1
  |]
 
/// Copy the value in register 1 to 
/// register 2.
let pCopy1To2 = 
  [|
    (*00*) jzd 2 2  
    (*01*) jzd 0 0  
    (*02*) jzd 3 4  
    (*03*) jzd 0 2  
    (*04*) jzd 1 8  
    (*05*) inc 2     
    (*06*) inc 3     
    (*07*) jzd 0 4  
    (*08*) jzd 3 -1  
    (*09*) inc 1     
    (*10*) jzd 0 8   
  |]
 
// Tests.
let x0 = run d5 pZero1 1
let x1 = run d5 pSet1To5 1
let x2 = run d5 pCopy1To2 2
 


Now here is the integer square root program. (In order to make the program more readable, I define some constants and macros which do not alter the basic operation of the machine.)

/// Six register memory.
let d6 = [|0;0;0;0;0;0;|]   
 
/// Give the registers labels.
let zero = 0         
let n = 1         
let odd = 2         
let sqrt = 3         
let tmp0 = 4         
let tmp1 = 5   
 
// Jumping to -1 will halt.
let halt = -1   
 
/// Define jmp macro.
// Note: this is just a macro
// dependent on there being a 
// register with a constant value
// of zero.  It does not increase
// the size of the instruction set.
let jmp = jzd zero
 
/// Compute the integer square root 
/// of the value in register n, 
/// and return the result in register sqrt.      
let pSqrt = 
  [|
    // Set odd to 1.
    (*00*) jzd odd 2   
    (*01*) jmp 0  
    (*02*) inc odd   
    // Set tmp0 to zero.   
    (*03*) jzd tmp0 5  
    (*04*) jmp 3  
    // Set tmp1 to zero.
    (*05*) jzd tmp1 7  
    (*06*) jmp 5  
    // Copy odd to tmp0 and tmp1.
    (*07*) jzd odd 11
    (*08*) inc tmp0
    (*09*) inc tmp1
    (*10*) jmp 7  
    // Subtract tmp0 from n.
    (*11*) jzd tmp0 14
    (*12*) jzd n halt
    (*13*) jzd zero 11
    // Copy tmp1 to odd
    (*14*) jzd tmp1 17
    (*15*) inc odd
    (*16*) jmp 14
    // Next odd.
    (*17*) inc odd
    (*18*) inc odd
    // Increment Sqrt.
    (*19*) inc sqrt
    // Loop.
    (*20*) jmp 7
  |]
 
// This is cheating a bit,
// but it beats looking at
// 80K copies of "inc n", lol.
// 80K is convenient because its 
// square root is Sqrt(2)*100.
let test = 80000
d6.[n]<-test
 
// Find the square root. 
// Prints are formatted weird to fit in blog.
printf "The largest integer not exceeding "
printf "the square root of %i is " test
printfn "%i." (run d6 pSqrt sqrt)
 


And below is an extension of the above, being a multiplication program which computes the square of the result from above. (A good exercise would be to preserve some remainder from the previous program and add it back in here.)

 
// To check the answer, here is a 
// multiplication routine. 
 
// Alias the older labels.
let mul0 = n
let mul1 = sqrt
let prod = odd
 
/// Copy the value in mul1 (sqrt) to mul0 (n).
let pCopyMul1ToMul0 = 
  [|
    // Zero out mul0.
    (*00*) jzd mul0 2  
    (*01*) jmp 0
    // Zero out tmp0.
    (*02*) jzd tmp0 4
    (*03*) jmp 2
    // Move mul1 to mul0,tmp0
    (*04*) jzd mul1 8
    (*05*) inc mul0    
    (*06*) inc tmp0     
    (*07*) jmp 4
    // Move tmp0 back to mul1.
    (*08*) jzd tmp0 halt  
    (*09*) inc mul1     
    (*10*) jmp 8   
  |]
 
 
/// Mutliply mul0 by mul1.
let pMul0TimesMul1 =
  [|
    // Zero out tmp0.
    (*00*) jzd tmp0 2
    (*01*) jmp 0
    // Zero out prod.
    (*02*) jzd prod 4
    (*03*) jmp 2; 
    // Main loop on mul1.
    (*04*) jzd mul1 halt
    // Move mul0 to tmp0 and add to prod.
    (*05*) jzd mul0 9
    (*06*) inc tmp0
    (*07*) inc prod
    (*08*) jmp 5
    // Move tmp0 to mul0.
    // Loop back when done.
    (*09*) jzd tmp0 4
    (*10*) inc mul0
    (*12*) jmp 9
  |]
 
// Mutlipy the square root by itself. 
// Prints are formatted weird to fit in blog.
 
run d6 pCopyMul1ToMul0 zero |> ignore
 
printf "The largest perfect square not exceeding "
printf "%i is %i." test (run d6 pMul0TimesMul1 prod)
 


So simple that one could probably construct a programmable machine in wood and metal for programs such as the above! It is easy to imagine how it might work: simple counters that can always be incremented, but which will stop when decremented to zero. Something to transfer these stops at zero to program jumps. A program perhaps stored on a wheel that rotates from ip to ip. Etc.

It almost makes me want to head down to the workshop after dinner, lol.

-Neil

p.s. The above virtual machine programs may not be minimal. They were the obvious approaches.

p.p.s. Isn't it neat how F# allows the creation of a domain-specific language using arrays that looks almost like source code for a real assembly language program?

Thursday, July 15, 2010

Integer Square Roots

Today's post is short; it shows how to do integer square roots.

This is an old algorithm, once popular in assembly language programs on small processors. The fact that it uses only integers, addition, subtraction, and less-than branching, mean that its hardware and software requirements are at rock bottom. With enough patience and wire, you could probably get it running on a breadboard full of discrete ICs.

With modern floating-point hardware, this algorithm is rarely used, but it can still be handy from time-to-time. (For example, when trying solve the Project Euler problems without calling on external libraries, lol.)

It makes use of the fact that any perfect square can be expressed as the sum of consecutive odd integers.* Below is the F# code and a test; computing the big-O time is left as an exercise to the reader.

(As always, this code is presented "as-is" and without warranty or implied fitness of any kind; use at your own risk.)

 
// Minimalist square root finder.
let sqrt =
  let rec f i c n =
    match n<i with
    | false -> f (i+2) (c+1) (n-i) 
    | true -> c
  f 1 0
 
// Test.
let x = 
  {0..100} 
  |> Seq.map (fun i->i,sqrt(i)) 
  |> Seq.toList 
 
printfn "Your breakpoint here."


-Neil

*On the outside chance that the reason for this is not immediately obvious, the diagram below should provide a hint:

Tuesday, July 13, 2010

Pure OOP – A Personal Retrospective

(Into which I pack every metaphor I can lay my hands on.)

Although I had been programming for some time, I really started to learn the craft in a serious way during the rise of object-oriented programming. It was a time of turmoil in the programming world. What would be the main language of OOP: C++, Objective-C, Smalltalk, or some variant of another language like FORTH or LISP? What would be its driving idiom: member functions, message-passing? Was OOP really the future, or was it hype? Was the real future something else, like functional programming or logic programming?

And then, we all woke up one day and it was a done deal. The C++ paradigm had taken over, followed by its various brain children in the form of VB, Java, and C#, and its extended family in the form of COM, UML, etc.

And not without reason; OOP has a lot to recommend it. One of those things is the idea of encapsulation. Encapsulation is especially important when building class libraries. Not only does the user of the library not have to know about the details, but now the user cannot even find out about the details. We could keep programmers from accidently shooting off their own feet!

There was also the dream of modularity. Hardware engineers had their standardized components, chips, and modules, and now we could have ours. At last we could be free of the need to constantly handcraft every bit of code and continually reinvent the wheel.

But this safety came at a cost. Not only was there a cost in complexity when writing libraries, but the same encapsulation that circumscribed the safe border also circumscribed the useful border. It harder to misuse things, but it was also harder to tinker with things – the code, in effect, came in tamper-proof packages covered with warnings.

And yet, if history is any guide, tinkering – not safety – is the key to progress and invention. Unsafe environments are part of progress, and big part of becoming an professional is to learn to work safely with unsafe things.

In particular, while calling for re-usability and composability, we failed to realize that there might be an inverse relationship between these goals and the notion of encapsulation. Tinker and pre-fab turned out not to be a match made in heaven. Not that you can’t write composable, encapsulated code, because you certainly can. Rather, that the two goals tend to tug the design in different directions. And if you try to force the code in both directions, as with any increase in two dimensions, the size and complexity of the result end up getting squared. (In the meantime, hardware engineering was flowing in the opposite direction. Tools like VHDL and small, cheap fabrication plants were increasingly allowing them break open their components and customize the insides by composing smaller components.)

But OOP was a really great tool; it took us a long way, and will take us a long way yet. But the very brightness of its flame blinded us to its flaws. As in an apocryphal tale of a golem or Frankenstein, it escaped its natural bounds. But we’ve become so used to it, and are loathe to give up the good things about it, and fearful that a loss of encapsulation may mean a return of primordial chaos.

And so, to help return the genie to its bottle – without destroying both the genie and the bottle – we are now calling on a childhood superhero of computer science: Lambda the Ultimate. And so far, I think, it looks like a good call.

The various functional constructs added to OOP/imperative languages, from functors in C++, to lambdas in C#, to Linq, really are proving to be a way to tame encapsulation bloat without introducing chaos. On the flip side, like most superheroes, Lambda the Ultimate can have trouble adjusting to mortal society. The existing OOP/imperative matrix provides a “secret identity” for Lambda, so it can mix in civilian society without every line of code looking like something from another planet.

And nowhere, I think, is this good combination more apparent than in F#. And that continues to amaze me. Out of habit, nearly every F# project I start begins as a study in pure OOP. At some point, however, I get stuck, do some research online or in a book, and find a case where a really experienced F# programmer has handled a similar problem in a multi-paradigmatic fashion. Almost instantly, my thoughts about the problem undergo a phase change, and I re-write my earlier code to be more succinct, maintainable, and composable, and usually more computationally efficient. And I’m usually able to do this re-write in less time than it would have taken to keep plugging away at the original design.

-Neil

Friday, July 9, 2010

Error Correction for Shunting-Yard Algorithm

Please note that I have found and corrected an error in code for blog entry dated July 3, 2010 and titled "Dijkstra’s Shunting-Yard Algorithm." See that entry for details. My apologies. Perhaps that's what comes of coding on a holiday weekend, lol.

-Neil

Thursday, July 8, 2010

Arity and Unit Arguments in F# 2.0

While working on an internal domain-specific language, I encountered a case where passing a unit into a function made the syntax somewhat more readable. The completely artificial example below illustrates the concept as the function xfer1. I also tried a two-unit version, which is illustrated as xfer2. (Don't get too stuck on the example; I abstracted these artificial illustrations to avoid clutter; it makes more sense in the actual code.)

(As always, this code is presented "as-is" and without warranty or implied fitness of any kind; use at your own risk.)

 
type X () =
 
  let mutable a = [1;2;3;4;]
  let mutable b = [];
 
  let rec xfer1 () =
    match a with 
    | [] -> ()
    | h::t -> 
      xfer1 (a<-t;b<-h::b)  
 
  let rec xfer2 () () =
    match a with 
    | [] -> ()
    | h::t -> xfer2 (a<-t) (b<-h::b)
 
  member this.Xfer1 () = 
    xfer1()
    this
 
  member this.Xfer2 () = 
    xfer2()()
    this
 
 
let x1 = (X()).Xfer1()
let x2 = (X()).Xfer2()
 
 
printfn "Done."
 


However, I was concerned about what the generated code actually contained, so I decided to have a look at the IL. As it turns out, the two cases, xfer1 vs. xfer2, generate very different code. xfer1 results in a function of no arguments. xfer2, on the other hand, results in a function with two unit arguments.

 
// Signatures:
  internal void xfer1()
  internal void xfer2(Unit unitVar0, Unit unitVar1)
 
// As called:
  this.xfer1();
  this.xfer2(null, null);
 


xfer2, I’m sure, represents the canonical case and xfer1 the exception. This exception is very convenient, and it results in more efficient code. But I’m wondering why it wasn’t extended to include functions of any arity as long as all the arguments are unit? I’m sure this behavior is spelled out in the F# spec, and if I find the section, I’ll update this entry.

-Neil

Saturday, July 3, 2010

Dijkstra’s Shunting-Yard Algorithm

NOTE: OOPS! THE CODE IN THE ORIGINAL VERSION OF THIS POST CONTAINED A BUG. Fortunately, it caused an exception to be thrown so if you used that version it will at least fail and not propagate an error. The corrected function has been substituted below. The error was in the switching logic of the "close" function, which handles close parentheses. The exception was "Mismatched parenthesis."

I like domain-specific languages (DSLs). But one perennial problem when parsing input for a DSL is the need to convert from infix notation into some more amenable kind of order or to an abstract syntax tree. The original syntax is often limited enough in scope that a full-blown Lex/Yacc or other sophisticated parser seems like overkill. Sometimes all that’s needed is a simple tokenization, followed by a basic conversion.

Enter Dijkstra’s shunting-yard algorithm.

Although I had earlier implemented this algorithm in C#, to this point in my F# career, I had not implemented it in that language. It’s not a super complex algorithm, but it is tricky to get right in a new programming paradigm. Below is my first attempt; I’m not sure it’s correct, so take it with a grain of salt and proceed with caution. I did test it using worked examples both of my own and from online, but if there are any obvious errors, I’d be happy hear about it and to post corrections (with attribution and thanks).

I hope to extend it soon to include unary operators.

Without further ado, here is my first F# implementation of Dijkstra’s shunting-yard algorithm, including a basic test (based on the example at Wikipedia). This is probably not a minimal or most-efficient implementation, but it is a starting place.

(As always, this code is presented "as-is" and without warranty or implied fitness of any kind; use at your own risk.)

 
/// Simple token class.
type Token =
  /// An argument.
  | Arg of string
  /// A function call.
  | Fun of string
  /// Function argument separator.
  | Sep
  /// Operator of tag,precedence,left-assoc
  | Op of string*int*bool
  /// Open-parenthesis.
  | Open
  /// Close-parenthesis.
  | Close
 
 
/// Dijkstra’s shunting-yard algorithm.
/// Convert an infix stream to postfix.
let infixToPostfix =
 
  /// Handle a function separator.
  let rec sep qs qo = 
    match qs with 
    | Open::t -> qs,qo 
    | h::t -> t,(h::qo)
    | [] -> failwith "Syntax error in function call."
 
  /// Handle an operator.
  let rec op qs qo s1 p1 a1 = 
    match qs with 
    | Op(s2,p2,a2)::t -> 
      // Precendence and associativity.
      match p1<p2 with
        // < precedence, either associativity.
      | true -> op t ((Op(s2,p2,a2))::qo) s1 p1 a1
      | _ ->
        match (p2<p1) || a1 with
        | true -> ((Op(s1,p1,a1))::qs),qo
          // = precedence, right-associative.
        | _ -> op t ((Op(s2,p2,a2))::qo) s1 p1 a1
    | _ -> ((Op(s1,p1,a1))::qs),qo
 
  /// Handle close-parenthesis.
  let close qs qo = 
    let rec f qs qo =
      match qs with
      | Open::t -> t,qo
      | h::t -> f t (h::qo)
      | [] -> failwith "Mismatched parenthesis."
    match qs with 
    | [] -> failwith "Mismatched parenthesis."
    | _ ->
      let (qs0,qo0) = f qs qo
      match qs0 with 
      | h::t -> 
        match h with 
        | Fun(s) -> t,h::qo0
        | _ -> qs0,qo0
      | _ -> qs0,qo0
 
  /// Handle remaining operators.
  let rec outputOps qo = function 
    | [] -> qo
    | Open::t | Close::t -> failwith "Mismatched parenthesis."
    | h::t -> outputOps (h::qo) t
 
  /// Uncurried function.
  let rec f (qs,qo) = function
    | Arg(s)::t -> f (qs,((Arg(s))::qo)) t
    | Fun(s)::t -> f (((Fun(s))::qs),qo) t
    | Sep::t -> f (sep qs qo) t
    | Op(s,p,a)::t -> f (op qs qo s p a) t
    | Open::t -> f ((Open::qs),qo) t
    | Close::t -> f (close qs qo) t
    | [] -> outputOps qo qs
 
  // Curried function call.  
  f ([],[]) 
 
 
/// Simple test print.
let rec printqo = function
  | [] -> ()
  | Arg(s)::t | Fun(s)::t | Op(s,_,_)::t -> 
    printf "%s " s
    printqo t
  | _ -> failwith "Syntax error." 
 
// Basic operators.
let plus  = Op("+",10,false)
let minus = Op("-",10,false)
let times = Op("*",11,false)
let div   = Op("/",11,false)
let pwr   = Op("^",12,true)
 
// Test from Wikipedia entry.
// Infix: 3 + 4 * 2 / ( 1 − 5 ) ^ 2 ^ 3
// Postfix: 3 4 2 * 1 5 - 2 3 ^ ^ / +
let x0 = [
  Arg("3");
  plus;
  Arg("4");
  times;
  Arg("2");
  div;
  Open;
  Arg("1");
  minus;
  Arg("5");
  Close;
  pwr;
  Arg("2");
  pwr;
  Arg("3");
  ]
 
// Convert and print.
printqo (infixToPostfix x0 |> List.rev)
 
printfn "Done."

Proof of Concept - End of Phase 1

Phase 1 of my mysterious proof of concept port from C# to F# is complete. And the good news is: the developer downstream has agreed to install the F# 2008 CTP and integrate my F# into the production version! He’s a good guy and always looking to stay with the cutting edge. So that he can continue to use the older data structures, I still have to write a small piece of glue code to do the conversion. (I wish I could say more, or post some code, but this is work I’m doing for someone else, who controls the business/publicity aspects of it.)

Here then, are some general observations after many hours of work and several attempts:

● One of the things I was looking at was the use of general .NET compatibility within an F# program vs. providing the compatibility via conversion at the interfaces.

● For my first attempt, I used general .NET-compatible data structures, objects, properties, and avoided things more specific to F#. That is, I used List<> instead of .NET lists, avoided tuples, used traditional class/file organization instead of a “module/let” organization, etc. I did use F# constructs like tail recursion, but I kept the overall class structure more traditional OO/imperative than functional.

● This “transliteration” approach worked OK. I was able to port the old stuff quickly and it resulted in fairly clean, easy-to-understand code. In particular, the number of lines of code was greatly reduced in comparison with the earlier C#/C++ versions. Where it did break down a bit was in maintenance and enhancement. As I went along, I was becoming more and more comfortable with F# and functional programming, and mixing those new skills and ideas with the OO/imperative stuff was creating tension in the code.

● So I re-wrote it, from scratch, into a more native F# version. I used lists instead of List<>, things like tuples where appropriate, and organized my code into a more F#-like namespace/module/let structure. I didn’t avoid OO/imperative constructs, but rather than allowing them to drive the design, I treated them as tools in the larger F# toolbox.

● Among other things, using F# constructs internally made me re-think the use of various data structures internally vs. at interfaces. I realized that I had spent a lot of time and effort designing hybrid structures that were usable in both capacities. However, these hybrids were ideal for neither. Thinking about an F# specific internal version helped me see those points where decoupling and conversion were superior to a “one-size-fits-all” approach.

● The result of all this “F#-ing” is code that is much cleaner and easier to maintain. There are even fewer lines of code than the already reduced number in the transliteration version. One key breakthrough was the use of computation expressions and other domain-specific language techniques. This really helped keep the code readable, composable, and extensible. The multi-paradigm nature of F# allowed me to mix declarative and imperative styles in a way that created simpler code than either style alone. Like oil and vinegar, the blend was more palatable than either alone.

● And if I ever do need to take it back to C#, the ideas I got from the F# port will make leave the code simpler and more extensible than the original. F#, I feel, is an excellent language for exploring ideas and designs, even if the final version ends up in some other language.

I’m so impressed, that I plan to port other portions of the project to F#, at least on a trial basis.

-Neil

Friday, July 2, 2010

Project Euler, Problem 2 Generalized

Project Euler, problem 2 involves the Fibonacci sequence. Specifically: find the sum of the even-valued Fibonacci numbers which do not exceed four million.

One way to tackle this is to generate all the Fibonacci numbers and check for divisibility while accumulating. However a look at the Fibonacci sequence might reveal a better way:*



There is obviously a pattern: every third number (beginning with 0) is divisible by 2. Furthermore, every fourth number is divisible by 3, every fifth number is divisible by 5, every sixth number is divisible by 8 etc. In fact, this pattern of divisibility continues for all the Fibonacci numbers. (The proof is left as an exercise to the reader.)

This suggests that we can create formulae which generate sequences of these divisible numbers. It is easiest to do this by generating pairs of numbers; e.g. 2,3;8;13;34,55 for the case of 2, etc. If we label the divisible Fibonacci number E, and the next Fibonacci N (“E” for “even” in the original problem and “N” for “next”), we get the following formulae. It is easy to see how the Fibonacci sequence has been smuggled in as the coefficients of the functions.

2: E+2N, 2E+3N
3: 2E+3N, 3E+5N
5: 3E+5N, 5E+8N
Etc.

So we can use these coefficients to construct a generator for a particular subset of the Fibonacci numbers. For example here is a generator for the sequence in the original problem. (As always, this code is presented "as-is" and without warranty or implied fitness of any kind; use at your own risk.)

 
let euler2 n = 
  let rec fib e o =
    seq {
      yield e
      yield! (fib (e+o*2) (e*2+o*3)) }
  fib 0 1
  |> Seq.takeWhile ((>) n)
  |> Seq.sum
 
let x = euler2 4000000       
 


This is obviously easy to generalize, based on the coefficients:

 
let euler2WithCoefficients (c0,c1) = 
  let rec fib e o =
    seq {
      yield e
      yield! (fib (c0*e+c1*o) (c1*e+(c0+c1)*o)) }
  fib 0 1
 


And it can be further generalized, based on an internal generalization of the coefficients (e.g. the Fibonacci sequence – I used a simple, stack-based fib to make the code simpler). Here is a solution to the original problem using this last generalization.

 
let euler2General n = 
  let rec fib = function
    | 0 -> (0,1)
    | n -> 
      let f,s = fib (n-1)
      s,(f+s)
  if n<=0 then failwith "n must be greater than zero."
  let c0,c1 = (fib (n-1))
  let rec fibseq e o =
    seq {
      yield e
      yield! (fibseq (c0*e+c1*o) (c1*e+(c0+c1)*o)) }
  fibseq 0 1
 
 
let euler2Solution = 
  euler2General 3 // Remember, fib #3, not div by 3!
  |> Seq.takeWhile ((>) 4000000) 
  |> Seq.sum 
 


And of course, this is a generalization based on offsets in the Fibonacci sequence; it can only be used for divisibility by numbers actually in the Fibonacci sequence.

So how well does this meet with my ideal of “fluent” code, or “code so obviously like the problem and solution that domain expert can understand it”? Well, it is more complicated and difficult than Project Euler problem 1, but a lot of the complexity above comes from the formula which yields Fibonacci-divisible numbers. A straightforward solution to the specifics of problem 2, testing for divisibility, is simpler. But when you think about it, analysis like that above is part of the knowledge base of most domains, so experts may already be more familiar with those analyses than with naïve solutions.

For more on the Euler problems in F# (and other interesting stuff), I commend you to the ongoing series and comments at Mark Pearl's blog.

-Neil

*I chose to begin the Fibonacci sequence at 0,1 and to begin the indexing at 0. I just find this simpler Plus, it has the appealing set of coincidences that Fibonacci 1 is 1, 5 is 5, 10 is 55, and 12 is 144!. The ideas delineated above also work with other foundations and indexing sequences if the appropriate adjustments are made.

Continuation-Passing Mnemonics

Learning to construct continuations can be tough. It’s easy to state how they work: they facilitate tail recursion by recursively passing the remainder of a computation. But it can be a lot harder to actually get one working from scratch. I won’t discuss continuations further here; there are lots of good discussions online and in books. What I want to show here is a simple mnemonic block of code I use to remember how to write a continuation. It can help prevent that “deer in the headlights” look when asked for a continuation example by a friend, at a job interview, or when staring into the monitor at 1 a.m., lol.

Most of the examples one sees involve using a continuation to provide tail recursion to a function requiring some operation on the results from two recursive calls. It’s probably because this covers the many of the simplest real life cases for which continuation-passing is needed. For example, here is the commonly-seen continuation-passing version of a function which computes a Fibonacci number. (As always, presented "as-is" and without warranty or implied fitness of any kind; use at your own risk.)

 
let fib n =
  let rec f n cont =
    match n with
    | 0 -> cont 0
    | 1 -> cont 1
    | n -> 
      f (n-2) (fun n2-> 
        f (n-1) (fun n1->
          cont(n1+n2)))
  f n (fun x->x)
 


But even this simple example can be a little intimidating, and a little difficult to remember, given that writing continuation-passing functions is not typically a frequent task. It is worthwhile committing the Fibonacci example to memory, but I’ve got an even simpler example that I can even use as a mnemonic to recall the Fibonacci (or tree, etc.) examples.

Here is my simpler example: computing the sum of the numbers from 1 to n. Of course, this a task for which continuation-passing, and even recursion, is not actually required; I use it as an example only because it is so simple to remember!

For reference, here is a naïve recursive implementation:

 
let rec sum0 = function
  | 1 -> 1
  | n -> n+(sum0 (n-1))
 


This one shows a typically-constructed continuation-passing version:

 
let sum1 n =
  let rec f n cont =
    match n with
    | 1 -> cont 1
    | n -> f (n-1) (fun n1->cont(n+n1))
  f n (fun x->x)   
 


And this one, the one I memorize, shows a continuation-passing version with the parameters in reverse order from that usually seen. For some reason I find it easier to understand this way; probably some psychological thing related to the recursive and continuation functions being side-by-side. (That, and the currying and use of the “function” construct make it look cleaner.)

 
let sum2 =
  let rec f cont = function
    | 1 -> cont 1
    | n -> f (fun n1->cont(n+n1)) (n-1) 
  f (fun x->x)   
 


Tests will show that the naïve version will overflow the stack past a certain value of n, while the other two do not. (As always when testing tail recursion, remember that tail call optimization is turned off in debug mode, so perform the tests in release mode.)

Remembering this simple example can make it easier to construct the more complex cases where continuation-passing is actually needed. In my ongoing parlance, this version, being simpler, appears more “fluent.” And it’s often easier when coding to first write a simple function or program, and then change it into a complex one incrementally, even if the initial, simple example bears little resemblance to the end result.

-Neil