Baby steps with Factor: a CSV parser
I got round to writing my first Factor module tonight: a CSV parser.
Actually there's already a CSV parser written in Factor by Daniel Ehrenberg, but it's been removed from the latest Factor releases. I found a copy of that code here, but I don't know how long it'll stay there.
Unfortunately I had two problems with the existing module. The first was that it ran pretty slowly (1M of CSV took ~5 seconds to parse on my laptop), mainly because my copy of Factor wouldn't compile the state-parser module it depends on, so it ran unoptimized. The second was that I needed a parser that could read one row at a time, so that huge CSV files can be processed in chunks. I took that as an opportunity to write my own.
The code performs OK (~500ms for 1M of CSV) and parses all the examples on the Wikipedia CSV page, but I can't help feeling that I've written it in much the same style I would have used in Scheme. If anybody has any hints on how to make the code smaller, faster or more elegant, I'd be delighted to hear them.

    USING: kernel sequences io namespaces combinators ;
    IN: csvparser

    DEFER: quoted-field

    ! Read an unquoted field up to the next separator, appending its
    ! characters to the string being built by the enclosing make.
    : not-quoted-field ( -- endchar )
        ",\"\n\s\t" read-until #! "
        dup
        { { CHAR: \s [ drop % not-quoted-field ] } ! skip whitespace
          { CHAR: \t [ drop % not-quoted-field ] }
          { CHAR: , [ swap % ] }
          { CHAR: " [ drop drop quoted-field ] } ! "
          { CHAR: \n [ swap % ] }
          { f [ swap % ] } ! eof
        } case ;

    ! Called just after a quote inside a quoted field: a second quote
    ! is an escaped quote, anything else ends the quoted part.
    : maybe-escaped-quote ( -- endchar )
        read1
        dup
        { { CHAR: " [ , quoted-field ] } ! " is an escaped quote
          { CHAR: \s [ drop not-quoted-field ] }
          { CHAR: \t [ drop not-quoted-field ] }
          [ drop ]
        } case ;

    : quoted-field ( -- endchar )
        "\"" read-until ! "
        drop % maybe-escaped-quote ;

    : field ( -- string sep )
        [ not-quoted-field ] "" make swap ;

    : (row) ( -- sep )
        field swap ,
        dup CHAR: , = [ drop (row) ] when ;

    ! The second value is the character that ended the row,
    ! or f at end of input.
    : row ( -- array[string] eof? )
        [ (row) ] { } make swap ;

    : (csv) ( -- )
        row swap , [ (csv) ] when ;

    ! Read a single row from the stream.
    : csv-row ( stream -- row )
        [ row drop ] with-stream ;

    ! Read every row from the stream.
    : csv ( stream -- rows )
        [ [ (csv) ] { } make ] with-stream ;

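To show how the two entry points might be called, here is a minimal usage sketch. It assumes the csvparser vocabulary above is loaded, and that your Factor version provides `<string-reader>` in `io.streams.string` and `.` in `prettyprint`. Both vocabulary names are assumptions about the release you are running, and the sample strings are made up for illustration.

```factor
! Usage sketch only; vocabulary names below are assumptions
! about the Factor release in use.
USING: csvparser io.streams.string prettyprint ;

! Parse a whole document into an array of rows. The second field
! of the second row is quoted and contains an escaped quote ("")
! and a comma.
"name,quote\nfred,\"said \"\"hi\"\", then left\"" <string-reader> csv .

! Parse only the first row of the input.
"a,b,c\n1,2,3" <string-reader> csv-row .
```

Both words wrap their work in with-stream, so each call takes a fresh stream. To pull rows one at a time from a single large file, calling `row` repeatedly inside one with-stream quotation is probably the more natural fit.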
If anybody's interested, the module (including tests and docs) is here.