Monday, May 31, 2010

List<>.collect Implementations with Benchmarks

I had a situation come up in my proof-of-concept where I needed to do a collect on a .NET List<>. So I did some experiments with how it might best be accomplished. A key factor was that this is something that may be done literally millions of times a day if the application makes it to deployment, so I was concerned with performance. I came up with several versions that I benchmarked. All assume an “open System.Collections.Generic”.

As always, presented "as-is" and without warranty or implied fitness of any kind; use at your own risk.

This is the most straightforward; it’s how you would immediately think of doing it in an imperative style:

let listCollect0<'a,'b> (f:'a->List<'b>)
                        (la:List<'a>) : List<'b> =
  let lb = new List<'b>()
  for a in la do
    for b in (f a) do
      lb.Add b
  lb

These two versions are probably the most straightforward functional style:

let listCollect1<'a,'b> (f:'a->List<'b>)
                        (la:List<'a>) : List<'b> =
  let lb = new List<'b>()
  Seq.iter (fun a->Seq.iter lb.Add (f a)) la
  lb
 
 
let listCollect2<'a,'b> (f:'a->List<'b>)
                        (la:List<'a>) : List<'b> =
  let lb = new List<'b>()
  Seq.iter lb.Add (Seq.collect f la)
  lb

This version is the same as one of the above functional styles, but with some “hints” to help with optimization:

let listCollect3<'a,'b> (f:'a->List<'b>)
                        (la:List<'a>) : List<'b> =
  let fAdd (l:List<'b>) (a:'a) = Seq.iter l.Add (f a)
  let lb = new List<'b>()
  let fAddb = fAdd lb
  Seq.iter fAddb la
  lb

This version uses the fold function to manage the returned List<'b>:

let listCollect4<'a,'b> (f:'a->List<'b>)
                        (la:List<'a>) : List<'b> =
  Seq.fold 
    (fun (l:List<'b>) (b:'b) -> l.Add b; l) 
    (new List<'b>()) 
    (Seq.collect f la) 

These versions mirror 1 and 2, but use sequences until the last step.

let listCollect5<'a,'b> (f:'a->seq<'b>)
                        (la:List<'a>) : List<'b> =
  let lb = new List<'b>()
  Seq.iter (fun a->Seq.iter lb.Add (f a)) la
  lb
 
 
let listCollect6<'a,'b> (f:'a->seq<'b>)
                        (la:seq<'a>) : List<'b> =
  let lb = new List<'b>()
  Seq.iter lb.Add (Seq.collect f la)
  lb

And here are the benchmarks in %. (I left out the test code; it’s not particularly relevant. Suffice it to say all were compiled and run in F# 2, release mode, using tests cleverly designed to foil any optimization.)

listCollect0 = 100%
listCollect1 = 210%
listCollect2 = 169%
listCollect3 = 214%
listCollect4 = 205%
listCollect5 = 434%
listCollect6 = 546%
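The harness itself was omitted above; for the curious, a timing loop along these lines gives the flavor of the comparison. This is a hypothetical sketch, not the original test code: the list size, repetition count, and workload function f are all made up for illustration.

```fsharp
open System.Collections.Generic
open System.Diagnostics

// The naive version and one functional version, as above.
let listCollect0<'a,'b> (f:'a->List<'b>) (la:List<'a>) : List<'b> =
  let lb = List<'b>()
  for a in la do
    for b in (f a) do
      lb.Add b
  lb

let listCollect2<'a,'b> (f:'a->List<'b>) (la:List<'a>) : List<'b> =
  let lb = List<'b>()
  Seq.iter lb.Add (Seq.collect f la)
  lb

// Time one candidate over many repetitions.
let time reps (collect:(int->List<int>)->List<int>->List<int>) =
  let la = List<int>(seq { 1..1000 })
  let f x = List<int>([x; x+1])   // a cheap, fixed workload
  let sw = Stopwatch.StartNew()
  for _ in 1..reps do collect f la |> ignore
  sw.ElapsedMilliseconds

let t0 = time 1000 listCollect0
let t2 = time 1000 listCollect2
printfn "listCollect2 vs. listCollect0: %d%%" (t2 * 100L / (max 1L t0))
```

Absolute numbers will vary by machine and runtime; only the ratios are meaningful.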


It turns out that the naïve version worked the best by a pretty big margin. The straightforward functional versions (including the one with “hints”) were second. The ones which mixed lists and sequences were the slowest. What does this imply?

First: it says that if performance is an issue, you simply have to run benchmarks. Furthermore, these benchmarks should include at least one version which is designed to be as simple and direct as possible.

Second: although the straightforward functional versions were slower than the naïve version, they were still plenty fast enough for applications where iteration counts may be in the thousands rather than the millions. They might be preferable in the former instances because they provide a familiar idiom for anyone maintaining the code.

Third: attempts to make the code overly general may impair performance. Stick to the level of abstraction you need, neither more nor less. This is where F# has a real advantage, since it is a language that rewards quick experimentation and the local customization of code snippets.

Bottom line: I’ll probably use the naïve version (listCollect0) in my application. If I come across a similar situation which requires lower performance, I’ll probably use something more like listCollect1 or listCollect2.

-Neil

Wednesday, May 26, 2010

Computation Expressions with .NET Data Types

I’m still working on my proof-of-concept and it’s going well. One thing that’s been on my mind is the desire to use computation expressions in it somewhere. So I’ve been trying to make sure I understood computation expressions well enough to use them to good purpose.

To smooth the integration, the code will have to play nice with existing .NET components written in C#, etc. To this end, I am sometimes using .NET data structures – like List<> – in places where a pure F# implementation might use list, etc. So I decided to practice with some computation expressions that mingle in these other .NET data structures.

Here is a simple example. It splits a string into a tree using various separator characters and String.Split. It actually does its job through side-effects on the original tree node (Neil dodges rotten tomatoes thrown by the functional programming purists). For example, splitting the string “a.b:c.d” by colon first and then by dot would result in this tree:

a.b:c.d
  a.b
    a
    b
  c.d
    c
    d

And here is the code. (As always, presented "as-is" and without warranty or implied fitness of any kind; use at your own risk.)

open System.Collections.Generic
open System.Text.RegularExpressions
 
 
// This is a simple tree node
// which uses List<>.
// Think of it as a proxy for
// some real-world problem.
type Tree = 
  { Text : string
    mutable Children : List<Tree> option }
  // Syntactic sugar.
  static member Make (s) = 
    { Text=s; Children=None }
 
 
// Splits the text in the node
// using string.split and
// uses it to create child nodes.
let splitter (c:char) =
  let a = [|c|]
  let f (t:Tree) =
    t.Children <- Some(new List<Tree>())
    Seq.iter 
      (fun s->t.Children.Value.Add(Tree.Make(s))) 
      (t.Text.Split(a,System.StringSplitOptions.None))
    t
  f
 
 
// This is almost like a collect,
// except it works through the 
// side-effects in the splitter function.
type TreeCollector () =
 
  member this.Bind ((lt:Tree),
                    (f:Tree->Tree)) =
    Seq.iter (fun t->(f t)|>ignore) lt.Children.Value
    lt
 
  member this.Return v = v
 
 
// Some tests.
 
let splitBar   = splitter '|'
let splitColon = splitter ':'
let splitDot   = splitter '.'
 
let treeCollector = TreeCollector()
 
// Build a tree by bar, colon, then dot.
let f0 t = treeCollector {
  let! x = splitBar t
  let! x = splitColon x
  let! x = splitDot x
  return x
}
 
// Build a tree by colon, bar, then dot.
let f1 t = treeCollector {
  let! x = splitColon t
  let! x = splitBar x
  let! x = splitDot x
  return x
}
 
 
// Simple test print.
let printo (t:Tree) =
  let rec f s (t:Tree) =
    printfn "%s %s" s t.Text 
    if t.Children.IsSome then
      Seq.iter (f (s+"  ")) t.Children.Value
  f "" t
 
 
let t0 = Tree.Make("a0.a1:b0.b1|c0.c1:d0.d1")
let t1 = Tree.Make(t0.Text)
 
f0 t0 |> ignore
f1 t1 |> ignore
 
printo t0
printfn ""
printo t1
 
printfn "Done."

Monday, May 17, 2010

More Active Pattern Activity

Here are some more samples I came up with while practicing with active patterns. Again, I'm not sure I would ever implement any of these in particular; there may be better ways. But still, I could see where a couple of them would be handy, and my main purpose was to become comfortable with using active patterns.

Forgive my unexciting examples, it was late on a Sunday night, lol.

(As always, presented "as-is" and without warranty or implied fitness of any kind; use at your own risk.)

// Explicit empty; just for fun.
let inline (|Empty|NonEmpty|) (s:string) =
  match System.String.IsNullOrEmpty(s) with
  | true -> Empty
  | _ -> NonEmpty
 
 
// Return Some() if s starts with p, else None.
let inline (|StartsWith|_|) (p:string) (s:string) =
  match s with
  | Empty -> None
  | _ ->
    match s.StartsWith(p) with
    | true -> Some()
    | _ -> None
 
 
// Return Some(p,s') if s starts with p, else None,
// where s' is the remainder of s.
let inline (|BeginsWith|_|) (p:string) (s:string) =
  match s with
  | Empty -> None
  | _ ->
    match s.StartsWith(p) with
    | true -> Some((p,s.Substring(p.Length)))
    | _ -> None
 
 
// Return (p,s') if s starts with p, else ("",s),
// where s' is the remainder of s.
let inline (|SplitStart|) (p:string) (s:string) =
  match s with
  | Empty -> ("",s)
  | _ ->
    match s.StartsWith(p) with
    | true -> (p,s.Substring(p.Length))
    | _ -> ("",s)
 
 
// Return Some(index) if s contains p, else None.
let inline (|IndexOf|_|) (p:string) (s:string) =
  match s with
  | Empty -> None
  | _ ->
    match s.IndexOf(p) with
    | -1 -> None
    | i -> Some(i)
 
 
let test0 (s:string) =
  match s with
  | StartsWith "zero" -> "zero"
  | StartsWith "one" -> "one"
  | _ -> ""
 
let test1 (s:string) =
  match s with
  | BeginsWith "zero" (h,t) -> t+h
  | BeginsWith "five" (h,t) -> t+h
  | _ -> s
 
let test2 (s:string) =
  match s with
  | SplitStart "zero" (h,t) -> t+h
 
let test3 (s:string) =
  match s with
  | IndexOf "zero" (i) -> s.Substring(i)
  | IndexOf "seven" (i) -> s.Substring(i)
  | _ -> ""
 
 
let tests = [
  test0;
  test1;
  test2;
  test3 ]
 
let strings = [
  "zero two";
  "one two seven four";
  "five two";
  "three two" ]
 
let result =
  List.map 
    (fun t -> List.map t strings)
    tests 
 
printf "Done."

Sunday, May 16, 2010

Active Pattern Activity

Wow! It's been a while. But I haven't been idle with respect to F#. Far from it; I’m still keeping busy with my proof of concept port and some real-world maintenance/enhancement to the original C#.

It turns out that some of the requested enhancements in the C# version just happened to coincide with my next porting task for my proof-of-concept. So I got to review those parts of the C# version “for free.” What timing!

I have to say the F# proof of concept is still going a lot better than I had ever hoped. Not only is it easy to port the basic algorithms from C# where applicable, but the kind of straightforward design that “functional-thinking” encourages is making for a better program with much less source code.

But for now, here’s a small thing I did to help learn more about active patterns. It uses some of the ideas I got from Ashley Feniello’s Scheme tokenizer, while working with List<> types rather than F# lists.

Let me say up front that this is not how I would solve most pattern-matching problems involving List<>. There are better solutions. However, this does show how active patterns could be used to provide analogs of various built-in pattern-matching design patterns. So look at how the idea works with List<> vis-à-vis F# lists below, but imagine it in situations where it might be more useful.

(And, as always, presented "as-is" and without warranty or implied fitness of any kind; use at your own risk.)

open System.Collections.Generic
 
 
// Explicit empty.
let inline (|Empty|NonEmpty|) ((l:List<'a>),i) =
  match (l.Count-i)<=0 with
  | true -> Empty
  | _ -> NonEmpty
 
// Extract the first item, or None on empty.
// (Avoids creating unused tuple on h::_.)
let inline (|Head|_|) ((l:List<'a>),i) =
  match (l.Count-i)<1 with
  | true -> None
  | _ -> Some(l.[i])
 
// Extract the first item and the tail, 
// or None on empty.
let inline (|HeadTail|_|) ((l:List<'a>),i) =
  match (l.Count-i)<1 with
  | true -> None
  | _ -> Some(l.[i],(l,i+1))
 
// Extract the first two items and the tail, 
// or None on fewer than two items.
let inline (|HeadHeadTail|_|) ((l:List<'a>),i) =
  match (l.Count-i)<2 with
  | true -> None
  | _ -> Some(l.[i],l.[i+1],(l,i+2))
 
 
type Token =
  | Open
  | Close
  | Symbol of string
 
let tokenize (listIn:List<char>) =
  let rec symbol s listIn = 
    match listIn with
    | HeadTail(' ',t)
    | HeadTail('(',t) 
    | HeadTail(')',t) -> Symbol(s),listIn
    | HeadTail(c,t) -> symbol (s+(c.ToString())) t
    | _ -> Symbol(s),listIn
  let rec tokenize' (tokens:Token list) = function
    | HeadTail(' ',t) -> tokenize' tokens t
    | HeadTail('(',t) -> tokenize' (Open::tokens) t
    | HeadTail(')',t) -> tokenize' (Close::tokens) t
    | HeadTail(c,t) -> 
      let s,t' = symbol (c.ToString()) t
      tokenize' (s::tokens) t'
    | _ -> tokens
  List.rev (tokenize' [] (listIn,0))
 
 
let l0 = List<char>()
 
for c in ("(this (is a) test)") do
  l0.Add(c)
 
let tokens = tokenize l0
 
printf "Done."
 

Saturday, May 8, 2010

Tiny Expert System using Monads

“I suppose it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail.”
– Abraham Maslow

“If your only problem is a nail, every tool tends to look like a hammer.”
– Rainer L. Price

I’ve been wanting to learn more about monads and their F# implementation as computation expressions. Inspired by the contagious enthusiasm in this video, last night I started reading chapter 12 in Real World Functional Programming by Tomas Petricek and Jon Skeet.

About halfway through, it occurred to me: “You could write a simple expert system using monads!” One of my stated goals for this blog is to use various “toy” A.I. and computer science problems to learn F#. I’ve strayed from that a bit, so this is a bit of a return.

Below is a tiny expert system implemented using computation expressions. I managed to find a way to use the following computation expression methods: Bind, Delay, and Return. I don’t claim that this is a good way to build an expert system; it’s cumbersome, lacks features like memoization, and probably isn’t even the best way to do it using computation expressions. But it was a nice baby step towards overcoming my unfamiliarity with monads.

(And, as always, presented "as-is" and without warranty or implied fitness of any kind; use at your own risk.)

Here are the computation expression classes. There is one for conjunctions (and-type logic) and one for disjunctions (or-type logic). The most important features of each are the ability to short-circuit when a threshold (th) in the certainty is reached, and to delay computation. Conjunction short-circuits when certainty falls below the threshold, while Disjunction short-circuits when certainty exceeds the threshold. Coupled with Delay for lazy evaluation, this permits operation based on something more rational than exhaustive search. (For convenience, the classes also encapsulate the respective probability operators on the range 0-100 as static members.)

type Conjoin (th) =
  static member And a b =
    (a*b+49)/100
  member this.Bind(a,f) =
    if a<th 
      then a
      else f a
  member this.Delay f =
    fun () -> f()
  member this.Return v = v
 
 
type Disjoin (th) =
  static member Or a b =
    (10000-((100-a)*(100-b))+49)/100
  member this.Bind(a,f) =
    if a>th 
      then a
      else f a
  member this.Delay f =
    fun () -> f()
  member this.Return v = v

And here is a simple knowledge base. In case there is still any confusion about the characteristics of certain animals, I have again chosen that as my domain. The conclusions are reported by side effect.

let conjoin = Conjoin(20)
let disjoin = Disjoin(80)
 
 
let white = 80
let orange = 40
let black = 100
let striped = 35
let spotted = 100
 
let whiteAndBlack = 
  conjoin {
    let! x0 = white
    let! x1 = Conjoin.And x0 black
    return x1 }
 
let orangeAndBlack = 
  conjoin {
    let! x0 = orange
    let! x1 = Conjoin.And x0 black
    return x1 }
 
let tiger = 
  conjoin {
    let! x0 = orangeAndBlack()
    let! x1 = Conjoin.And x0 striped
    printfn "tiger %i" x1
    return x1 }
 
let zebra = 
  conjoin {
    let! x0 = whiteAndBlack()
    let! x1 = Conjoin.And x0 striped
    printfn "zebra %i" x1
    return x1 }
 
let leopard = 
  conjoin {
    let! x0 = orangeAndBlack()
    let! x1 = Conjoin.And x0 spotted
    printfn "leopard %i" x1
    return x1 }
 
let animal = 
  disjoin {
    let! x0 = tiger()
    let! x1 = Disjoin.Or x0 (zebra())
    let! x2 = Disjoin.Or x1 (leopard())
    return x2 }
 
 
animal() |> ignore
 
System.Console.ReadLine() |> ignore

When the code is run, the result is as follows:
zebra 28
leopard 40
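As an aside, tracing the arithmetic explains why no “tiger” line appears: tiger’s certainty works out to 14, below the conjunction threshold of 20, so Bind short-circuits before its printfn runs. The operator below just restates Conjoin.And for the check:

```fsharp
// The integer "and" operator on the range 0-100 (same as Conjoin.And).
let andOp a b = (a*b+49)/100

// tiger: (orange AND black) AND striped = (40 AND 100) AND 35
let tiger   = andOp (andOp 40 100) 35    // 14 -> short-circuited, no print
// zebra: (white AND black) AND striped = (80 AND 100) AND 35
let zebra   = andOp (andOp 80 100) 35    // 28
// leopard: (orange AND black) AND spotted = (40 AND 100) AND 100
let leopard = andOp (andOp 40 100) 100   // 40

printfn "%d %d %d" tiger zebra leopard
```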

It’s not exactly fifth-generation A.I., but it was a fun way to reinforce my reading about computation expressions. And it shows Rainer was right: if your only problem is a nail, even a monad can look like a hammer.

-Neil

p.s. I’m trying a new source-code formatter which preserves syntax highlighting. If you cut and paste the code into Visual Studio, you’ll see that the line breaks have been removed. This can usually be solved by an intermediate paste and cut into a text editor such as Word or WordPad.

Thursday, May 6, 2010

Is F# the new COBOL?

Mark Pearl asks an interesting question in his blog: “Does F# kill C++?” It’s well worth pondering the questions he has posted.

A similar question came up in a phone conversation I had yesterday. (In part inspired by Erik Meijer’s wearing a t-shirt that reads “got COBOL?” in this video.)

“Is F# the new COBOL?”

(At this point, I imagine a collective F# groan of “No, please don’t make that comparison!” Lol. But when you read “COBOL,” think in terms of what the acronym stands for: COmmon Business-Oriented Language.)

I don’t do a lot of business programming, though I did learn COBOL in grad school (for my foreign language requirement, of all things, lol). But from the buzz on the web, F# seems to be attracting a lot of financial and business types. I would imagine that a lot of business programming has splintered off into a welter of languages and systems, but I guess a lot of places probably still actually use COBOL.

So what factors might help F# become the new COBOL – the new grand unified business language? Even as something of an outsider to business programming, I can think of a few:

1) Business programmers, and even business people who program as an adjunct, are much more computer literate than were their counterparts in the heyday of COBOL. Most of them have had experience of multiple computer languages, they have home computers, they work day-to-day with sophisticated software, and they probably learned and daily use more quantitative techniques than did their business ancestors.

2) Most of the mundane chores of business computing are now well-established and handled by turn-key systems. However, with Xml, SQL, etc., the ability to interoperate with these systems has never been greater.

3) Data access and manipulation technologies like LINQ make possible the kind of “natural” (to the domain) syntax COBOL was originally designed to provide.

4) F# more naturally interfaces with modern technologies such as networks and GUIs than do older languages.

5) F# simplifies the creation of new domain specific languages (DSLs). So now business programmers can craft novel and more task-specific DSLs, rather than relying on the single DSL that comprises much of COBOL or on DSLs provided in vendor products.

6) But most important, as I mentioned above, is that financial and other business programmers actually do seem to be among the most enthusiastic early adopters of F#. And that, in the end, may be the only factor that really matters.

So, is F# the new COBOL? Time will tell; “got F#?”

-Neil

Wednesday, May 5, 2010

A Half-Baked Scheme

First, I have to apologize for a few inefficiencies in the previous post’s code. They appear to be artifacts of the iterative design process. F# experts will already have noted them and marked me down as a tyro. I’ll try to correct them and flag them.

I found them because today’s code continues the theme of stack-based processing of S-expressions. One problem with showing data-only examples, as in the last post, is that it’s tough to make up a good sample. Real-world uses tend to be too complex and would obscure the interesting S-expression stuff.

So the example below uses a dictionary to apply rudimentary function binding. The S-expression is processed as before, right to left, using a stack, but the evaluation works by attempting to find and apply a function bound to each sub-expression.

I’ll show the code in several blocks, which may be copied and pasted together to run. First, here is the basic tokenizer, modified so that it only recognizes delimiters and symbols (i.e. strings of non-whitespace, non-delimiter characters).

(And, as always, presented "as-is" and without warranty or implied fitness of any kind; use at your own risk.)
open System
open System.Collections.Generic


// Tokenizer based on the work of Ashley Feniello.
// See post of 2010.01.15 at:
// http://blogs.msdn.com/ashleyf/default.aspx

// This tokenizer recognizes only delimiters "()"
// and symbols. One could add strings, numbers, etc.
type Token =
  | Open
  | Close
  | Symbol of string


let tokenize source =
  let rec symbol (a:string) l =
    match l with
    | (')'::_) as t -> a, t
    | w::t when Char.IsWhiteSpace(w) -> a, t
    | [] -> a, []
    | h::t -> symbol (a+(h.ToString())) t
  let rec tokenize' a = function
    | w::t when Char.IsWhiteSpace(w) -> tokenize' a t
    | '('::t -> tokenize' (Open::a) t
    | ')'::t -> tokenize' (Close::a) t
    | h::t ->
      let n,t' = symbol (h.ToString()) t
      tokenize' (Symbol(n)::a) t'
    | [] -> a
  tokenize' [] source

And here is the evaluator, consisting of two sub-functions. One of these functions, “eval’,” is a straightforward recursive evaluation of the input token stack, similar to that of earlier posts. The other function, “apply,” is triggered by the “Open” token. It tries to resolve the binding for the first symbol in each S-expression, and applies it to the working token stack. The bound function may manipulate the stack further, and eventually returns the unused portion of the stack along with anything it has pushed onto the stack.
// Stack-based evaluator.
let eval (find:string->bool*(Token list->Token list))
         tokens =
  // Find and apply a function bound to the symbol
  // on the top of the stack.
  let apply = function
    | [] -> failwith "Stack underflow."
    | Close::t -> t
    | Symbol(s)::t ->
      (match find s with
       | (true,f) -> f t
       | _ -> failwith "Unrecognized function.")
    | _ -> failwith "Syntax error."
  // Recursive evaluator. Runs until the input token
  // stack (e.g. list) is empty, returning the evaluated
  // stack (e.g. list) as the result.
  let rec eval' stack = function
    | [] -> stack
    | Open::t -> eval' (apply stack) t
    | Close::t -> eval' (Close::stack) t
    | Symbol(sym)::t -> eval' (Symbol(sym)::stack) t
  // Some applications may need to pass in an initial stack.
  // Here it is [] for convenience.
  eval' [] tokens


The examples show two function bindings. One, “+,” is a simple symbol concatenation function. The other, “countThis,” is a function that counts the tail elements of an S-expression. The first example shows a concatenation, while the second example uses concatenation to produce a symbol which should bind to “countThis.”
// This will bind symbols to functions.
// Note that the function takes a stack as a list
// and returns a stack as a list.
let functions = new Dictionary<string,Token list->Token list>()

// This simple symbol concatenation function
// shows the template of a function.
// 1) It should fail on an empty stack.
// 2) On Close, it should return the unused
// portion of the stack along with any
// computed Tokens. Returned values
// must be wrapped in tokens.
// The Close must be consumed and no
// portion of the stack below the Close
// should be used.
// 3) Multiple symbols may be consumed
// recursively.
// 4) Anything else fails.
let rec symbolConcat a = function
  | [] -> failwith "Stack underflow."
  | Close::t -> Symbol(a)::t
  | Symbol(sym)::t -> symbolConcat (a+sym) t
  | _ -> failwith "Syntax error."

functions.Add("+",(symbolConcat ""))


// This stack counting function exists just
// to show how returned tokens can be used
// as function indices.
let rec countThis (a:int) = function
  | [] -> Symbol(a.ToString())::[]
  | Close::t -> (Symbol(a.ToString()))::t
  | Symbol(sym)::t -> countThis (a+1) t
  | _ -> failwith "Syntax error."

functions.Add("countThis",(countThis 0))


// String concatenation.
let tokenList0 =
  tokenize
    (List.ofSeq "(+ if you can (+ read this (+ you are)) (+ too close))")

// Should be: Symbol("ifyoucanreadthisyouaretooclose")
let evalStack0 = eval (functions.TryGetValue) tokenList0

// Concatenated string used as a function index.
let tokenList1 = tokenize (List.ofSeq "((+ count This) a b c d)")
// Should be: Symbol("4")
let evalStack1 = eval (functions.TryGetValue) tokenList1

// This fails with "Unrecognized function."
//let tokenList2 = tokenize (List.ofSeq "((+ count That) a b c d)")
//let evalStack2 = eval functions tokenList2

printfn "Your breakpoint here."

So when to use and not use this approach?

Use this approach if you need to do straightforward processing of S-expressions, and generally if that processing occurs only once. One example might be the processing of S-expressions into data structures that are not well represented by expression trees. Another example might be as a domain specific language (DSL) for application configuration or serialization (in those rare cases where Xml or some other standard method is not appropriate).

Do not favor this approach for more complex uses of S-expressions, for example where there are lots of arbitrary value or function bindings. Also note that lazy evaluation of the kind used by conditionals is very difficult with this method. In those cases, the code could quickly get messier than simply starting off with a richer, more traditional S-expression system such as Ashley Feniello’s Scheme in F#.

-Neil

Tuesday, May 4, 2010

S-Expressions for Semantic Networks

Continuing from the last post, this post shows how to use a simple tokenizer and evaluator to parse an S-expression (e.g. Lisp- or Scheme-like syntax) into a pared-down semantic network. This also builds on some of my earlier posts regarding using F# itself as a DSL for specifying semantic networks.

I apologize for the rudimentary nature of the semantic network code; I wanted to get something just complex enough to demonstrate the principles. It does at least provide a single, simple interface function: “Node.Link,” which could also front a much more sophisticated set of data structures.

Why do it this way instead of using F# itself as a DSL as I did earlier? Nothing definitive, it’s a judgment call really. If the F# compiler is readily available and the use-cases support the use of F# source, that’s still a good option. However, sometimes one needs data to be data, and compiled code to be compiled code, and in that case, some kind of processing of the data is required.

Why do it this way instead of say, Xml? Mostly readability. Some things will be more readable and more easily expressed using Xml. Other things will be more readable and more easily expressed using S-expressions, others by using custom infix operators, etc. Again, it’s a judgment call.

As before, please refer to Ashley Feniello’s excellent “Code Monkey Have Fun” Scheme REPL code for a more complete example of S-expressions in F#.

I wrote this fairly quickly between more mundane matters like work and life (lol); beware of bugs and typos. As always, presented "as-is" and without warranty or implied fitness of any kind; use at your own risk.

open System
open System.Collections.Generic

// This is a super-basic semantic network
// node designed only for demonstration.
// I apologize for making the node dictionary
// a static; I wanted to keep this really simple.
// In real life there would be separate graph container,
// probably passed into the evaluator below.
[<System.Diagnostics.DebuggerDisplay("{tag}")>]
type Node (tag:string) =

  static let nodes = new Dictionary<string,Node>()

  static let tryAddNode tag =
    match nodes.TryGetValue tag with
    | true,n -> n
    | _ ->
      let n = Node(tag)
      nodes.Add(tag,n)
      n

  let mutable linksTo:(Node*string) list = []
  let mutable linksFrom:(string*Node) list = []

  static member Nodes with get() = nodes

  static member Link tagFrom tagLink tagTo =
    let nodeFrom = tryAddNode tagFrom
    let nodeTo = tryAddNode tagTo
    nodeFrom.AddLinkFrom tagLink nodeTo
    nodeTo.AddLinkTo tagLink nodeFrom

  member this.Tag with get() = tag

  member this.AddLinkTo l n =
    linksTo<-(n,l)::linksTo

  member this.AddLinkFrom l n =
    linksFrom<-(l,n)::linksFrom


// This little tokenizer is a gross simplification of
// Ashley Feniello's Scheme tokenizer posted 2010.01.15 at
// http://blogs.msdn.com/ashleyf/default.aspx
// See earlier posts for more about Ashley's excellent blog

type Token =
  | Open
  | Close
  | Symbol of string


let tokenize source =
  let rec symbol (a:string) l =
    match l with
    | (')'::_) as t -> a, t
    | w::t when Char.IsWhiteSpace(w) -> a, t
    | [] -> a, []
    | h::t -> symbol (a+(h.ToString())) t
  let rec tokenize' a = function
    | w::t when Char.IsWhiteSpace(w) -> tokenize' a t
    | '('::t -> tokenize' (Open::a) t
    | ')'::t -> tokenize' (Close::a) t
    | h::t ->
      let n,t' = symbol (h.ToString()) t
      tokenize' (Symbol(n)::a) t'
    | [] -> a
  tokenize' [] source


// The evaluator generates the network into
// the Node static dictionary.
let eval l =
  let rec fLink nFrom link = function
    | Symbol(nTo)::t ->
      Node.Link nFrom link nTo
      fLink nFrom link t
    | Close::t -> Symbol(nFrom)::t
    | _ -> failwith "Syntax error."
  let fFrom = function
    | Symbol(nFrom)::Symbol(link)::t -> fLink nFrom link t
    | _ -> failwith "Syntax error."
  let rec eval' a = function
    | [] -> a
    | Open::t ->
      let a' = fFrom a
      eval' a' t
    | Close::t -> eval' (Close::a) t
    | Symbol(s)::t -> eval' (Symbol(s)::a) t
  eval' [] l


// Breakpoint and examine these.
let tokenList =
  tokenize
    (List.ofSeq "((Man isA Mammal Primate) hasA Name)")

let evalList = eval tokenList // [Symbol("Man")]

let nodes = Node.Nodes

printfn "All done."

System.Console.ReadLine() |> ignore

Monday, May 3, 2010

A Simple DSL: Tokenization to Evaluation

The purpose of this post is twofold: first, to present a bit of code. But second, and perhaps more important, it’s to give a shout out to one of my favorite F# blogs, Ashley Feniello’s “Code Monkey Have Fun”.

In particular, his series on Scheme in F#. It’s full of good code and wisdom, and well worth reading even if you’re not interested in the Scheme language per se.

Not being glib with compiler terminology, I’m not sure how to describe today’s post, but here goes:

A lot of times, I find myself writing domain specific languages (DSLs) and file formats that are too complex for simple lines of data, but too simple to require full-blown parsers, parse trees, and compilers. So I usually end up with something that can go directly from tokenization to evaluation. The evaluation usually happens once, creating some data structure that I’m actually going to use at runtime. Another feature I’ve found common to such situations is that the DSL is conveniently specified using a delimited prefix syntax (e.g. like LISP or Scheme), but is easily evaluated using a stack-based interpreter (e.g. like FORTH).

This post presents a small example of the above process. It implements a small interpreter which analyzes and evaluates LISP-like prefix expressions involving integers, addition, and multiplication. For example: (+ (* 1 2) (* 3 4)) evaluates to a data element containing the number 14. To make clear what is happening, be sure to examine the value of tokenList before it is evaluated.
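To make that concrete, here is the token list the tokenizer produces for the example expression, written out by hand (the Token type matches the one in the code that follows):

```fsharp
type Token =
  | Plus
  | Times
  | N of int
  | Stop

// "(+ (* 1 2) (* 3 4))" tokenizes to this: '(' is dropped,
// ')' becomes Stop, and tokens are prepended as they are read,
// so the list comes back reversed (last source token first).
let tokenList =
  [ Stop; Stop; N 4; N 3; Times; Stop; N 2; N 1; Times; Plus ]

// The evaluator consumes this front to back: it pushes the Stops
// and numbers, reduces at each Times (3*4 -> N 12, then 1*2 -> N 2),
// and finally at Plus (2+12 -> N 14), leaving [ N 14 ].
```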

This simple example is not particularly useful; in actual use, both the expression content and the result would usually be something more complex. For example, the expression might represent some behavior of an NPC in a video game, and the evaluated content would be the data structures necessary to implement that behavior as the game is running.

In any event, I started with Ashley Feniello’s Scheme tokenizer, modifying it to produce a postfix list of tokens representing the prefix expression, and then a stack-based evaluator to do the calculation. I simplified and modified Ashley’s tokenizer extensively, and – given that the original is very good – most likely not for the better; I take full responsibility for any bugs or bad idioms that may have crept in.

I never cease to be amazed at how much one can accomplish with so little in F#.

This weekend my basement participated, in its own small way, in the flooding in the east-central United States. The basement is now dry! (or at least merely damp), but I fear my concentration may have suffered from a near-sleepless night spent helping bail water and baby-sit the sump pump. So…

As always, presented "as-is" and without warranty or implied fitness of any kind; use at your own risk.
open System

// This demonstrates how to combine a simple tokenizer
// with a simple stack-based evaluator. That is,
// it goes directly from tokenization to evaluation
// without building a parse tree.
// This technique is suitable for some
// domain-specific languages.

// This little tokenizer is a gross simplification of
// Ashley Feniello's Scheme tokenizer posted 2010.01.15 at
// http://blogs.msdn.com/ashleyf/default.aspx
//
// My version recognizes only:
// whitespace, '(', ')', '+', '*', and integers.
//
// Unlike Ashley's tokenizer, mine returns the token list
// reversed, since that is what my evaluator expects.
//
// Please read Ashley Feniello's entire series on
// Scheme in F#! Trust me, you'll be glad you did.
// It demonstrates a more complete Scheme
// implementation in F#, that could
// be used either as an embedded Scheme evaluator
// or modified to implement other
// domain-specific languages.

type Token =
  | Plus       // Add.
  | Times      // Multiply.
  | N of int   // A number.
  | Stop       // Stack segment delimiter.
               // Note: delimiter is not needed if arity is fixed.


let tokenize source =
  let rec number a l =
    match l with
    | (')'::_) as t -> a, t
    | w::t when Char.IsWhiteSpace(w) -> a, t
    | [] -> a, []
    | h::t when Char.IsDigit(h) ->
      number ((a*10)+(Int32.Parse(h.ToString()))) t
    | _ -> failwith "Unexpected character."
  let rec tokenize' a = function
    | w::t when Char.IsWhiteSpace(w) -> tokenize' a t
    | '('::t -> tokenize' a t
    | ')'::t -> tokenize' (Stop::a) t
    | '+'::t -> tokenize' (Plus::a) t
    | '*'::t -> tokenize' (Times::a) t
    | h::t when Char.IsDigit(h) ->
      let n,t' = number (Int32.Parse(h.ToString())) t
      tokenize' (N(n)::a) t'
    | [] -> a
    | _ -> failwith "Unexpected character."
  tokenize' [] source


// A simple stack-based evaluator.
// In a real application this might evaluate
// a domain-specific language to compute some
// value or generate data.

let eval l =
  let rec func (f:int->int->int) a = function
    | [] -> failwith "Unexpected end."
    | h::t ->
      match h with
      | N(n) -> func f (f a n) t
      | Stop -> N(a)::t
      | _ -> failwith "Unexpected token."
  let rec eval' a = function
    | [] -> a
    | h::t ->
      match h with
      | Plus -> eval' (func (+) 0 a) t
      | Times -> eval' (func (*) 1 a) t
      | _ -> eval' (h::a) t
  eval' [] l

let tokenList = tokenize (List.ofSeq "(+ (* 1 2) (* 3 4))")
let evalList = eval tokenList // Should be [ N(14) ].

printfn "All done."

System.Console.ReadLine() |> ignore