Friday, June 11, 2010

Easy, No-Compile DFAs

The methods for compiling a set of finite automata DFAs or NFAs into a composite DFA are found in most introductory compiler texts. But how easy is it to construct a simple engine in F# which recognizes a series of patterns? As it turns out, pretty simple. The code bellow is an experiment I did in this regard. (It’s not intended to be the be-all, end-all of state machine engines – it’s mostly just a bit of practice in F#.)

The code below demonstrates a machine with can recognize strings of objects and generate tags when various strings are recognized. It basically creates lists of “mini-machines” capable of recognizing individual strings. These are then run in parallel using the List functions. The test code shows it working with strings of digits, but it is parameterized to work with any class which supports equality comparison.

For example, one might add machines capable of recognizing 123,345, and 567. In that case the string of digits 123456123 would match 123 two times, 345 once, and 567 not at all.

I went overboard in using various List functions, because one of the purposes of this exercise was to become more used to doing things “the F# way.” If I were doing this for production, I might use tail recursive functions in some places. I also pass around more data than is strictly required, in the interests of playing around with some ideas about composability.

(As an exercise to the reader, it should be noted that it is possible to do away with the Dictionary<> entirely and substitute a List function, albeit with a loss of efficiency.)

As always, presented "as-is" and without warranty or implied fitness of any kind; use at your own risk. This is something I did really fast, for fun, so there may be any number of bugs and inefficiencies.

open System.Collections.Generic 
// A pattern is a tag and a list of data.
type Pattern<'t,'d> = 't*('d list)
// Maps two functions onto a 2-tuple.
let internal mapTuples f g (a,b) = (f a),(g b)
// Test a datum and update the tag list and the
// list of running machines.
let test (datum:'d) 
         (l0:Pattern<'t,'d> list) = 
  let headMatches (datum:'d) 
                  ((_,data):Pattern<'t,'d>) =
  let emptyDataTail ((_,data):Pattern<'t,'d>) =
  let tailPattern ((tag,data):Pattern<'t,'d>) =
  l0 |> (List.filter (headMatches datum)) 
     |> (List.partition emptyDataTail)
     |> mapTuples ( fst) 
                  ( tailPattern) 
// Start a machine if required.
// Machines with strings of length 1 are also
// recognized here.
let start (patterns:Dictionary<'d,Pattern<'t,'d> list>) 
          ((tags,running):'t list*Pattern<'t,'d> list) =
  let dataIsEmpty ((_,data):Pattern<'t,'d>) =
  let addTag (tags:'t list) ((tag,_):Pattern<'t,'d>) =
  let addPattern (patterns:Pattern<'t,'d> list) (p:Pattern<'t,'d>) =
  match patterns.TryGetValue datum with
    | (false,_) -> (tags,running)
    | (_,pl) -> 
      pl |> List.partition dataIsEmpty
         |> mapTuples (List.fold addTag tags) 
                      (List.fold addPattern running) 
// Run one cycle of the machine.
let cycle (patterns:Dictionary<'d,Pattern<'t,'d> list>) 
          ((tags,running):'t list*Pattern<'t,'d> list)
          (datum:'d) =
  running |> test datum 
          |> start patterns datum 
// Helper function to add a machine.
let add (patterns:Dictionary<'d,Pattern<'t,'d> list>) 
        (data:'d list) =
  match data with 
  | [] -> () 
  | h::t ->
    patterns.[h] <-
        match patterns.TryGetValue h with
        | (false,_) -> []
        | (true,pl) -> pl)
// Test functions.
// Test.
let testDatum (patterns:Dictionary<'d,Pattern<'t,'d> list>) 
              (rtn:'t list*Pattern<'t,'d> list)
              (datum:'d) = 
  let rtn0 = cycle patterns rtn datum
  List.iter (printf "%A ") (fst rtn0)
  printfn "" 
let patterns = new Dictionary<int,Pattern<string,int> list>()
add patterns "0(0)" [0]
add patterns "0(1)" [0]
add patterns "0124" [0;1;2;4]
add patterns "01"   [0;1]
add patterns "013"  [0;1;3]
add patterns "132"  [1;3;2]
let rtn = Seq.fold (testDatum patterns) ([],[]) [0;1;0;1;3;2;4]
printfn "Done."

No comments: