Saturday, February 26, 2011

Significant Whitespace, DSLs, and You (in F#)

Today’s post shows some simple code for teasing out the structure of a text file based on its indentation.

A trend in computer science is for computer languages to rely more on significant whitespace. Of course, that is true of F#, but Python has introduced the concept to an even wider audience. Significant whitespace and structure are also important in the design of domain-specific languages, since these constructs help reduce visual clutter.

But it pays not to be too strict. For example, forcing indentation and other structures to be at specific tab stops is likely to be a source of irritation as well as readability. What is needed is an adaptable system where – if the structure naively looks right, it gets interpreted right by the parser.

The code below presents on part of a toolkit for doing that. It takes input a line at a time, and decides whether each new line is a child at new level of indentation, a sibling of the current level, or represents a “pop” upwards to a parent level. The exact indentation at each level is unimportant, and is set by the first sibling. Pops back upwards must be to a previously encountered level.

Of course, a lot of things are missing. For example, in some languages, individual line indentation is not sufficient; some knowledge of context is required. And the code below is not really packaged very well; in real use, some sort of encapsulation in a class might be more appropriate. And as with any code, it could be refactored and perhaps simplified in a number of ways.

But it shows one basic technique. (As always, the code and information here are presented "as-is" and without warranty or implied fitness of any kind; use at your own risk.)

(FWIW, the test sample is some actual Python code from a “toy” expert system I wrote to help me learn Python.)

open System
/// Rudimentary indentation processor.
module IndentOMatic =
  // These three functions stand in for 
  // real-world functionality.
  /// Add a sibling item.
  let add s i =       
    printfn ""
    printf "Add : %i %s%s " i (String('-',i)) s
  /// Pop a level.
  let pop () = printf "^"
  /// Push a child item.
  let push s i =
    printfn ""
    printf "Push: %i %s%s " i (String('-',i)) s
  /// Convert a string to a string*int,
  /// where the string is the string minus
  /// leading spaces and the int is the 
  /// number of spaces.
  let private stringToTag (s:string) =
    let rec leading l i (s:string) =
      match i>=l with
      | true -> i
      | _ ->
        match s.Chars(i) with
        | ' ' -> leading l (i+1) s
        | _ -> i
    let i = leading s.Length 0 s
  /// Process a string. 
  let rec private proc0 (nl:int list) s i = 
    /// Pop the stack, looking for a parent
    /// or sibling.
    let rec scanUp = function
      | [] -> failwith "Stack underflow."
      | h::t ->
      match i=h with
      | true -> h::t
      | _ ->
      match i>h with
      | true -> failwith "No matching unindent."
      | _ -> 
        scanUp t 
    match nl with
    | [] -> failwith "Stack underflow."
    | h::t ->
    match i=h with
    | true -> 
      add s i
    | _ -> 
      match i>h with
      | true -> 
        push s i
      | false -> proc0 (scanUp nl) s i
  /// Call to process a string. 
  let proc nl s =
    let t,i = stringToTag s
    proc0 nl t i
// Test it.
let test = 
    // Some Python code from a "toy" 
    // expert system.
    "class Not():";
    "    def __init__(self,rule):";
    "        self.rule = rule";
    "    def prove(self,tryProve):";
    "        p = self.rule.prove(tryProve)";
    "        if p==None:";
    "            return p";
    "        else:";
    "            return not p";
let x = Seq.fold IndentOMatic.proc [-1] test
printfn ""
printfn ""
printfn "Your breakpoint here."

The output looks something like this:

Push: 0 class Not():
Push: 4 ----def __init__(self,rule):
Push: 8 --------self.rule = rule ^
Add : 4 ----def prove(self,tryProve):
Push: 8 --------p = self.rule.prove(tryProve)
Add : 8 --------if p==None:
Push: 12 ------------return p ^
Add : 8 --------else:
Push: 12 ------------return not p
Your breakpoint here.


Dan Finch said...

Awesome! I did something similar a while back to parse an indentation-based DSL. (code)

TechNeilogy said...

Thanks for sharing that, Dan! The secret to DSLs, I'm learning, is a robust bag of simple tools and snippets.

Dan Finch said...

It's amazing how much easier to understand your functional parser is than mine (ported from C#).

TechNeilogy said...

It's a result of fear. What I fear more than anything is maintaining my own code, so I go to great lengths to make it understandable to myself, lol.