Wednesday, May 5, 2010

A Half-Baked Scheme

First, I have to apologize for a few inefficiencies in the previous post’s code. They appear to be artifacts of the iterative design process. F# experts will already have noted them and marked me down as a tyro. I’ll try to correct them and flag them.

I found them because today’s code continues the theme of stack-based processing of S-expressions. One problem with showing data only examples, as in the last post, is that it’s tough to make up a good sample. Real world uses tend to be too complex and would obscure the interesting S-expression stuff.

So the example below uses a dictionary to do apply rudimentary function binding. The S-expression is processed as before, right to left, using a stack, but the evaluation works by attempting to find an apply a function bound to each sub-expression.

I’ll show the code in several blocks, which may be copied and pasted together to run. First, here is the basic tokenizer, modified so that it only recognizes delimiters and symbols (i.e. string of non-whitespace, non-delimiter characters).

(And, as always, presented "as-is" and without warranty or implied fitness of any kind; use at your own risk.)
open System
open System.Collections.Generic


// Tokenizer based on the work of Ashley Feniello.
// See post of 2010.01.15 at:
// http://blogs.msdn.com/ashleyf/default.aspx

// This tokenizer recognizes only delimiters "()"
// and symbols. One could add strings, numbers, etc.
type Token =
| Open
| Close
| Symbol of string


let tokenize source =
let rec symbol (a:string) l =
match l with
| (')'::_) as t -> a, t
| w::t when Char.IsWhiteSpace(w) -> a, t
| [] -> a, []
| h::t -> symbol (a+(h.ToString())) t
let rec tokenize' a = function
| w::t when Char.IsWhiteSpace(w) -> tokenize' a t
| '('::t -> tokenize' (Open::a) t
| ')'::t -> tokenize' (Close::a) t
| h::t ->
let n,t' = symbol (h.ToString()) t
tokenize' (Symbol(n)::a) t'
| [] -> a
tokenize' [] source

And here is the evaluator, consisting of two sub-functions. One of these functions, “eval’,” is a straightforward recursive evaluation of the input token stack, similar to that of earlier posts. The other function, “apply,” is triggered by the “Open” token. It tries to resolve the binding for the first symbol in each S-expression, and applies it to the working token stack. The bound function may manipulate the stack further, and eventually returns the unused portion of the stack along with anything it has pushed onto the stack.
// Stack-based evaluator.
let eval (find:string->bool*(Token list->Token list))
tokens =
// Find and apply a function bound to the symbol
// on the top of the stack.
let apply = function
| [] -> failwith "Stack underflow."
| Close::t -> t
| Symbol(s)::t ->
match find s with
| (true,f) -> f t
| _ -> failwith "Unrecognized function."
| _ -> failwith "Syntax error."
// Recursive evaluator. Runs until the input token
// stack (e.g. list) is empty, returning the evaluated
// stack (e.g list) as the result.
let rec eval' stack = function
| [] -> stack
| Open::t -> eval' (apply stack) t
| Close::t -> eval' (Close::stack) t
| Symbol(sym)::t -> eval' (Symbol(sym)::stack) t
// Some applications may need to pass in an initial stack.
// Here it is [] for convenience.
eval' [] tokens


The examples show two function bindings. One, “+,” is a simple symbol concatenation function. The other, “countThis,” is a function that counts the tail elements of an S-expression. The first example shows a concatenation, while the second example uses concatenation to produce a symbol which should bind to “countThis.”
// This will bind symbols to functions.
// Note that the function takes a stack as a list
// and returns a stack as a list.
let functions = new Dictionary<string,Token list->Token list>()

// This simple symbol concatenation function
// shows the template of a function.
// 1) It should fail on an empty stack.
// 2) On Close, it should return the unused
// portion of the stack along with any
// computed Tokens. Returned values
// must be wrapped in tokens.
// The Close must be consumed and no
// portion of the stack below the Close
// should be used.
// 3) Multiple symbols may be consumed
// recursively.
// 4) Anything else fails.
let rec symbolConcat a = function
| [] -> failwith "Stack underflow."
| Close::t -> Symbol(a)::t
| Symbol(sym)::t -> symbolConcat (a+sym) t
| _ -> failwith "Syntax error."

functions.Add("+",(symbolConcat ""))


// This stack counting function exists just
// to show how returned tokens can be used
// as function indices.
let rec countThis (a:int) = function
| [] -> Symbol(a.ToString())::[]
| Close::t -> (Symbol(a.ToString()))::t
| Symbol(sym)::t -> countThis (a+1) t
| _ -> failwith "Syntax error."

functions.Add("countThis",(countThis 0))


// String concatenation.
let tokenList0 =
tokenize
(List.ofSeq "(+ if you can (+ read this (+ you are)) (+ too close))")

// Should be: Symbol("ifyoucanreadthisyouaretooclose")
let evalStack0 = eval (functions.TryGetValue) tokenList0

// Concatenated string used as a function index.
let tokenList1 = tokenize (List.ofSeq "((+ count This) a b c d)")
// Should be: Symbol("4")
let evalStack1 = eval (functions.TryGetValue) tokenList1

// This fails with "Unrecognized function."
//let tokenList2 = tokenize (List.ofSeq "((+ count That) a b c d)")
//let evalStack2 = eval functions tokenList2

printfn "Your breakpoint here."

So when to use and not use this approach?

Use this approach if you need to do straightforward processing of S-expressions, and generally if that processing occurs only once. One example might be the processing of S-expressions into data structures that are not well represented by expression trees. Another example might be as a domain specific language (DSL) for application configuration or serialization (in those rare cases where Xml or some other standard method is not appropriate).

Do not favor this approach for more complex uses of S-expressions. For example, where there are lots of arbitrary value or function bindings. Also, note that lazy evaluation of the type used by conditionals is also very difficult using this method. In those cases, the code could quickly get messier than simply starting off with a richer, more tradition S-expression system such as Ashley Feniello’s Scheme in F#.

-Neil

No comments: