Saturday, February 26, 2011

Significant Whitespace, DSLs, and You (in F#)

Today’s post shows some simple code for teasing out the structure of a text file based on its indentation.

A trend in computer science is for computer languages to rely more on significant whitespace. Of course, that is true of F#, but Python has introduced the concept to an even wider audience. Significant whitespace and structure are also important in the design of domain-specific languages, since these constructs help reduce visual clutter.

But it pays not to be too strict. For example, forcing indentation and other structures to be at specific tab stops is likely to be a source of irritation as well as readability. What is needed is an adaptable system where – if the structure naively looks right, it gets interpreted right by the parser.

The code below presents on part of a toolkit for doing that. It takes input a line at a time, and decides whether each new line is a child at new level of indentation, a sibling of the current level, or represents a “pop” upwards to a parent level. The exact indentation at each level is unimportant, and is set by the first sibling. Pops back upwards must be to a previously encountered level.

Of course, a lot of things are missing. For example, in some languages, individual line indentation is not sufficient; some knowledge of context is required. And the code below is not really packaged very well; in real use, some sort of encapsulation in a class might be more appropriate. And as with any code, it could be refactored and perhaps simplified in a number of ways.

But it shows one basic technique. (As always, the code and information here are presented "as-is" and without warranty or implied fitness of any kind; use at your own risk.)

(FWIW, the test sample is some actual Python code from a “toy” expert system I wrote to help me learn Python.)

open System
/// Rudimentary indentation processor.
module IndentOMatic =
  // These three functions stand in for 
  // real-world functionality.
  /// Add a sibling item.
  let add s i =       
    printfn ""
    printf "Add : %i %s%s " i (String('-',i)) s
  /// Pop a level.
  let pop () = printf "^"
  /// Push a child item.
  let push s i =
    printfn ""
    printf "Push: %i %s%s " i (String('-',i)) s
  /// Convert a string to a string*int,
  /// where the string is the string minus
  /// leading spaces and the int is the 
  /// number of spaces.
  let private stringToTag (s:string) =
    let rec leading l i (s:string) =
      match i>=l with
      | true -> i
      | _ ->
        match s.Chars(i) with
        | ' ' -> leading l (i+1) s
        | _ -> i
    let i = leading s.Length 0 s
  /// Process a string. 
  let rec private proc0 (nl:int list) s i = 
    /// Pop the stack, looking for a parent
    /// or sibling.
    let rec scanUp = function
      | [] -> failwith "Stack underflow."
      | h::t ->
      match i=h with
      | true -> h::t
      | _ ->
      match i>h with
      | true -> failwith "No matching unindent."
      | _ -> 
        scanUp t 
    match nl with
    | [] -> failwith "Stack underflow."
    | h::t ->
    match i=h with
    | true -> 
      add s i
    | _ -> 
      match i>h with
      | true -> 
        push s i
      | false -> proc0 (scanUp nl) s i
  /// Call to process a string. 
  let proc nl s =
    let t,i = stringToTag s
    proc0 nl t i
// Test it.
let test = 
    // Some Python code from a "toy" 
    // expert system.
    "class Not():";
    "    def __init__(self,rule):";
    "        self.rule = rule";
    "    def prove(self,tryProve):";
    "        p = self.rule.prove(tryProve)";
    "        if p==None:";
    "            return p";
    "        else:";
    "            return not p";
let x = Seq.fold IndentOMatic.proc [-1] test
printfn ""
printfn ""
printfn "Your breakpoint here."

The output looks something like this:

Push: 0 class Not():
Push: 4 ----def __init__(self,rule):
Push: 8 --------self.rule = rule ^
Add : 4 ----def prove(self,tryProve):
Push: 8 --------p = self.rule.prove(tryProve)
Add : 8 --------if p==None:
Push: 12 ------------return p ^
Add : 8 --------else:
Push: 12 ------------return not p
Your breakpoint here.

Friday, February 18, 2011

Oops! Correction to Post Dated August 8, 2010

Oops! I found a couple of bugs in the code listed in the blog post titled: Segment Tree in F#. The bugs are in the "findCollect" and "findIter" functions, and they're really rather glaringly obvious (essentially different versions of the same bug). It looks like a “my-brain-was-out-to-lunch” error as I was transcribing the algorithm either from my head or from pseudo-code. In any case, it can be caught by a simple unit test or even code review, so I’m embarrassed that it got into the blog. My humblest apologies.

In order to avoid repeating the error, I’m thoroughly testing my correction, and I’ll post it when that testing is done. I’ll also post a better test example (that would have revealed this error).

In the meantime, the fix is quite simple, so readers may want to try figuring it out on their own. (Hint: try searching for values completely outside the range of the tree. This will force the bug to occur.)


Monday, February 7, 2011

My Dinner with Haskell

It’s impossible to go very far into either the F# world or the world of domain-specific languages (DSLs) without encountering Haskell. It seems like all the F# gurus are also Haskell gurus, or at least Haskell aficionados, and a lot of people use Haskell for prototyping new languages (including DSLs). Any search for materials on F# or DSLs likewise leads to information on Haskell.

So I decided to learn me some Haskell.

The first thing I did was to get a working build of the GHC (Glasgow Haskell Compiler) by following the appropriate links at the official Haskell site. Next, I found some learning materials, including Real-World Haskell, Learn You a Haskell for Great Good, Haskell Wikibook, Planet Haskell, and Erik Meijer's excellent Channel 9 series. (The latter is worth watching for the shirts alone!)

And then, a couple of days ago, I got to work learning. And so far…

…so good. This is a really fun language, and fairly easy to learn if you already have some experience in F#. At least, knowing F# gets you past the initial strangeness of concepts like currying, pattern matching, etc. In fact, knowing F# is a tiny bit of an impediment because the syntax is so similar, the differences are difficult to remember. But on balance, knowing F# has helped a bunch. The two languages aren’t exactly siblings, but they are cousins, and I think learning the more formal environment of Haskell will help me with the more pragmatic environment of F#.

In order to illustrate the two together, below I present “A tiny corner of Lisp…” in both F# and in Haskell. This being a little fragment of Lisp functionality coded as identically as possible in the two languages. As always, the code and information here are presented "as-is" and without warranty or implied fitness of any kind; use at your own risk. (And since I don’t yet have anything that can syntax highlight Haskell, I deliberately did not syntax highlight the F#.)

// A tiny corner of Lisp in F#.

type Node =
| Atom of string
| Cons of Node*Node
| Nil

let car = function
| Cons(h,_) -> h
| _ -> failwith "Invalid 'car' target."

let cdr = function
| Cons(_,t) -> t
| _ -> failwith "Invalid 'cdr' target."

let rec append nCar nCdr =
match nCar with
| Cons(h,t) -> Cons(h, append t nCdr)
| Nil -> nCdr
| _ -> failwith "Invalid 'append' target(s)."

// Examples.

let a0 = Atom "a" // a
let c0 = Cons(a0,(Cons((Atom "b"),Nil))) // (a b)
let c1 = append c0 c0 // (a b a b)
let c2 = append c1 a0 // (a b a b . a)

-- A tiny corner of Lisp in Haskell.

data Node = Atom String
| Cons Node Node
| Nil
deriving (Show)

car :: Node -> Node
car (Cons h _) = h
car _ = error "Invalid 'car' target."

cdr :: Node -> Node
cdr (Cons _ t) = t
cdr _ = error "Invalid 'cdr' target."

append :: Node -> Node -> Node
append (Cons h t) nCdr = (Cons h (append t nCdr))
append Nil nCdr = nCdr
append _ _ = error "Invalid 'append' target(s)."

-- Examples.

a0 = Atom "a" -- a
c0 = Cons a0 (Cons (Atom "b") Nil) -- (a b)
c1 = append c0 c0 -- (a b a b)
c2 = append c1 a0 -- (a b a b . a)

So get out there and learn you a Haskell for great good!