I've been playing with Factor for a couple of weeks. I'm finding that it takes me quite a bit longer to write things in Factor than in other languages, but the process is enjoyable and I get the feeling that I'm learning something useful each time. The question is: will my programming speed improve with experience? And more to the point, will I eventually be able to write code faster in Factor than in Gambit Scheme?

Here's a task I've been trying to accomplish over the past couple of evenings: converting tabular data into triples. I've written this functionality before in both Python and Scheme, so it's a good exercise for building Factor experience.

First, here's the code in Python:

def row_to_triples(i,cols,row):
    return [ (i,col,cell) for col,cell in zip(cols,row) ]

def tabular_to_triples(startid,cols,rows):
    triples = []
    for i,row in zip(range(startid,startid+len(rows)),rows):
        triples += row_to_triples(i, cols, row)
    return triples

Which works like this:

>> print tabular_to_triples(0,
                            ['col1', 'col2', 'col3'],
                            [('a', 'b', 'c'), ('e', 'f', 'g')])
[(0, 'col1', 'a'), (0, 'col2', 'b'), (0, 'col3', 'c'), (1, 'col1', 'e'), (1, 'col2', 'f'), (1, 'col3', 'g')]
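As an aside, the same conversion collapses into a single nested comprehension. This is a hypothetical variant (the name `tabular_to_triples_flat` is mine, not from the original code), shown just to make the shape of the computation explicit:

```python
# A sketch of a flattened variant: enumerate supplies the row id
# (starting at startid), zip pairs each column name with its cell.

def tabular_to_triples_flat(startid, cols, rows):
    return [(i, col, cell)
            for i, row in enumerate(rows, startid)
            for col, cell in zip(cols, row)]
```

The two loops of the original become two `for` clauses, and the explicit `triples` accumulator disappears.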

Here’s the equivalent in Gambit Scheme, using srfi-42 eager comprehensions:

(include "srfi-42.scm")

(define (tabular->triples startid rows cols) 
  (define (row->triples id cols row)
    (list-ec (:parallel (:list i cols) (:list j row))
             (list id i j)))

  (append-ec (:list row (index i) rows)
             (row->triples (+ i startid) cols row)))

Which yields:

>> (tabular->triples 0 
                     '((a b c)(d e f))
                     '(col1 col2 col3))

((0 col1 a) (0 col2 b) (0 col3 c) (1 col1 d) (1 col2 e) (1 col3 f))
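If srfi-42 is unfamiliar, the generator forms map fairly directly onto Python constructs: `(index i)` is roughly `enumerate`, `:parallel` over two `:list` generators is roughly `zip`, and `append-ec` flattens the per-row results. A rough Python rendering of the same shape (the function name is mine; note it keeps the Scheme version's rows-before-cols argument order):

```python
def triples_scheme_style(startid, rows, cols):
    triples = []
    # (:list row (index i) rows)              -> enumerate over rows
    for i, row in enumerate(rows):
        # (:parallel (:list i cols) (:list j row)) -> zip over cols and row
        for col, cell in zip(cols, row):
            # append-ec                       -> one flat result list
            triples.append((i + startid, col, cell))
    return triples
```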

ASIDE: If you’re evaluating Gambit Scheme (or Scheme in general) be sure to check out srfi-42, and also Alex Shinn’s ML-style pattern matching module from http://synthcode.com/scheme/ . Gambit Scheme out-of-the-box is like assembly language for abstractions - you need tools layered over the top for real-world use.

OK, so on to Factor. My first attempt was:

: row>triples ( cols row rowid -- triples )
  [ -rot 3array ] curry 2map ;

: process-row ( rowid cols row -- rowid cols triples )
  rot 1 + swapd                           ! inc rownumber
  [ swapd row>triples ] 2keep swap rot ;

: tabular>triples ( start-rowid cols rows -- triples )
  [ process-row ] map concat 2nip ;

Which works like this:

>> { "c1" "c2" "c3" } { { "a" "b" "c" } { "d" "e" "f" } } 0 tabular>triples
>> .
{ { 1 "c1" "a" } { 1 "c2" "b" } { 1 "c3" "c" } { 2 "c1" "d" } { 2 "c2" "e" } { 2 "c3" "f" } }

This is reasonably compact, but it looks horrible to me and is pretty difficult to follow because of all the stack shuffling. I’m starting to learn that the way to eliminate stack shuffling is to use combinators and to factor out the common code.

I did a bit of functional decomposition in the hope that creating words for the pieces would yield clearer code. The pieces of functionality needed were:

  • creating a triple from 3 elements on the stack. Handled by 3array
  • enumerating a variable whilst mapping through a sequence
  • holding a variable whilst mapping through a sequence. Handled by ‘curry map’
  • mapping two sequences in parallel (columns and rows). Handled by 2map
  • concatenating sequences (rows) of triples together. Handled by concat
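For comparison, here is the same decomposition sketched in Python (my own stand-ins: a plain function for the triple, `zip` for the parallel map, `enumerate` for the counter, and `itertools.chain` for the concatenation):

```python
from itertools import chain

cols = ['c1', 'c2', 'c3']
rows = [['a', 'b', 'c'], ['d', 'e', 'f']]

# a triple from three values                       -> 3array
def make_triple(rowid, col, cell):
    return (rowid, col, cell)

# hold the row id while mapping cols and a row
# in parallel                                      -> 'curry map' + 2map
def row_triples(rowid, row):
    return [make_triple(rowid, col, cell) for col, cell in zip(cols, row)]

# enumerate while mapping over rows                -> 'map-with-counter'
# concatenate the per-row results                  -> concat
triples = list(chain.from_iterable(
    row_triples(i, row) for i, row in enumerate(rows, 1)))
```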

Actually, once I’d worked out which words I wanted, I found that most of them already existed. In fact, all of them did except ‘enumerating a variable whilst mapping through a sequence’, so I wrote ‘map-with-counter’ to provide this. It prepends counter-incrementing code to the map quotation before calling map, then cleans up the counter at the end:

: map-with-counter ( start seq quot -- newseq )
  [ [ dup 1+ swap ] dip ] swap compose map nip ;
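In Python terms, map-with-counter is roughly `enumerate` with a start value: map a function over a sequence while also feeding it a running counter, keeping only the mapped results. A sketch (the exact off-by-one behaviour of the counter in the Factor word depends on the quotation, so this is an approximation):

```python
def map_with_counter(start, seq, fn):
    # fn receives (counter, element); the counter begins at `start`
    # and is dropped afterwards, like the trailing 'nip'
    return [fn(i, x) for i, x in enumerate(seq, start)]
```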

And so, a few attempts later, I've got:

: row>triples ( rowid cols row -- triples )
  [ 3array ] curry** 2map ;

: tabular>triples ( start-rowid cols rows -- triples )
  [ row>triples ] curry* map-with-counter concat ;

Which I'm quite pleased with. However, in addition to map-with-counter I've had to create a new partial-application combinator, curry**, which is general-purpose but doesn't exist in the standard library. A sure sign that I'm doing something wrong, or at least doing it differently to other Factor developers. And while I was writing curry** I ended up creating some new stack shufflers:

: rotd ( a b c d -- b c a d )  >r rot r> ;
: -rotd ( a b c d -- c a b d ) >r -rot r> ;
: dupdd ( a b c -- a a b c ) >r dupd r> ;

! partial application of quot based on 3rd item of stack
! see curry, curry* 
: curry** ( param obj obj quot -- obj obj curry )
  rotd [ -rotd call ] 2curry ; inline
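Stack gymnastics aside, what curry** does is ordinary partial application: fix the argument buried deepest and let the other two arrive later. In an applicative language that's a one-liner; Python's `functools.partial` expresses the same idea (names here are mine, for illustration):

```python
from functools import partial

def triple(rowid, col, cell):
    return (rowid, col, cell)

# 'curry**' the row id into the function; cols and cells come later
row1 = partial(triple, 1)
triples = [row1(col, cell) for col, cell in zip(['c1', 'c2'], ['a', 'b'])]
```

In Factor the same fix-an-argument step has to be spelled out as explicit stack rearrangement, which is where rotd, -rotd and friends come from.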

The underlying theme of these extra words is dealing with the 3rd element of the stack and below. So does that mean that seasoned Factor developers tend to switch to some other mechanism when there are more than 3 stack items in play? Sounds like a question for the mailing list...