I got round to writing my first Factor module tonight: a CSV parser.

Actually there's already a CSV parser written in Factor by Daniel Ehrenberg, but it's been removed from the latest Factor releases. I found a copy of that code here, but I don't know how long it'll stay there.

Unfortunately I had two problems with this existing module. The first was that it ran pretty slowly (1MB of CSV took ~5 seconds to parse on my laptop), mainly because my copy of Factor wouldn't compile the state-parser module it depends on, so it ran unoptimized. The second was that I needed a parser that could parse a row at a time, for reading huge CSV files in chunks. I took that as an opportunity to write my own.

The code performs OK (~500ms for 1MB of CSV) and parses all the examples on the Wikipedia CSV page, but I can't help feeling that I've written it in much the same style I would have used in Scheme. If anybody has any hints on ways to make the code smaller, faster or more elegant, I'd be delighted.


USING: kernel sequences io namespaces combinators ;
IN: csvparser

DEFER: quoted-field

: not-quoted-field ( -- endchar )
  ",\"\n\s\t" read-until   #! "
  dup
  { { CHAR: \s  [ drop % not-quoted-field ] } ! skip whitespace
    { CHAR: \t  [ drop % not-quoted-field ] } 
    { CHAR: ,   [ swap % ] } 
    { CHAR: "   [ drop drop quoted-field ] }  ! " 
    { CHAR: \n  [ swap % ] }    
    { f         [ swap % ] }       ! eof
  } case ;

: maybe-escaped-quote ( -- endchar )
  read1 
  dup
  { { CHAR: "   [ , quoted-field ] }     ! " is an escaped quote
    { CHAR: \s  [ drop not-quoted-field ] } 
    { CHAR: \t  [ drop not-quoted-field ] } 
    [ drop ]
  } case ;

: quoted-field ( -- endchar )
  "\"" read-until                                 ! "
  drop % maybe-escaped-quote ;

: field ( -- string sep )
  [ not-quoted-field ] "" make swap ;

: (row) ( -- sep )
  field swap , 
  dup CHAR: , = [ drop (row) ] when ;

: row ( -- array[string] sep )   ! sep is the newline char, or f at eof
  [ (row) ] { } make swap ;

: (csv) ( -- )
  row swap , [ (csv) ] when ;

: csv-row ( stream -- row )
  [ row drop ] with-stream ;

: csv ( stream -- rows )
  [ [ (csv) ] { } make ] with-stream ;

If anybody's interested, the module (including tests and docs) is here.