And another new programming language

Last year I embarked on somwhat of a journey to find a better language for my home projects after getting a bit frustrated by python's lack of blocks and general cruftyness. After a couple of months of trying various different things I settled on Gambit Scheme for my spare-time data indexing project. A minimal core language with uniform syntax and macros. Lots of potentual for adapting and building language features that make sense to me. Sorted.

Then last week I got round to reading Richard Jones's minimal forth code. A language compiler and runtime in a couple of pages of well documented X86 assembler; small enough to read and understand the whole system in a bus journey.

Like Scheme, Forth has the ability to construct new language features, allowing user code to get between the parser and the evaluator to modify the language itself. Also, like Scheme, it has a uniform syntax - words separated by spaces, which makes re-writing code on the fly practical. Both these features mean that a fully fledged programming environment can be bootstrapped up from a minimal core. And the hook is: that minimal core is so much smaller for Forth than it is for Scheme.

One thing lead to another and I got interested in stack languages. Now I'm looking at Factor and wondering...

Factor has ticks in all the right boxes: minimal core, machine code compiler, macros, continuations, lightweight threads and message passing concurrency. It's also the tersest language I've seen - code to do something always seems a fraction of the size I expect it to be. On the other hand it's so radically different to anything else I've programmed in and it makes my head hurt. Could it be a contender to knock Scheme off the top spot?

A poor man’s scheme profiler

Gambit scheme lacks a profiler that can profile scheme with embedded C code. (There's statprof, but unfortunately it doesn't profile embeded C). I needed to do this pretty desperately for my triple indexing stuff so I've written a simple macro which takes wall clock timings of functions and accumulates them.

You replace 'define' with 'define-timed' in the functions you want profiled, and then the time spent in each function is accumulated in a global hash table. It's not pretty, but it's simple.

The macro needs some supporting code (which I keep in a seperate module so that it can be compiled in order to minimise overhead). :


(define *timerhash* (make-table))

;;; call this before running the code
(define (reset-timer) (set! *timerhash* (make-table)))


;;; adds the time spent in the thunk to an entry in the hashtable
(define (accum-time name thunk)
  (let* ((timebefore (real-time))
         (res (thunk)))
    (table-set! *timerhash* name 
                (let ((current (table-ref *timerhash* name '(0 0))))
                  (list (+ (car current) (- (real-time) timebefore))
                        (+ (cadr current) 1))))
    res))

;;; call this afterwards to get the times
(define (get-times)
  (map (lambda (e) (list (first e) 
                    (* 1000 (second e))
                    (third e)))

       (sort-list (table->list *timerhash*)
                  (lambda (a b) (> (second a)
                              (second b))))))

And here's the macro. It basically just wraps the function body in a call to 'accum-time':


(define-syntax define-timed
  (syntax-rules ()
    ((define-timed (name . args) body ...)
     (define (name . args)
       (accum-time 'name (lambda ()
                           body ...))))))

Here's some example output. The first number is the accumulated milliseconds spent in the function, and the second is the number of times it was called.


((handle-query-clause 2182.3294162750244 8)
 (do-substring-search-query 1678.9898872375488 1)
 (run-query 1678.9379119873047 1)
 (main 1678.929090499878 1)
 (handle-substr-search 705.5349349975586 2)
 (convert-ids-to-symbols 400.3608226776123 1)
 (handle-s_o-clause 192.81506538391113 2)
 (lookup-o->s 192.52514839172363 2)
 (lookup-2-lvl-index 192.48294830322266 2)
 (join 163.8491153717041 8)
 (hash-join 163.6500358581543 5)
 (fetch-int32-results 118.37220191955566 655)
 (project 100.7072925567627 3)
 (concat 77.42834091186523 11793)
 (handle-p->so-clause 55.348873138427734 2)
 (bounds-matching-1intersect 51.54275894165039 2)
 (lookup-p->2-lvl-index 50.98700523376465 2)
 (search-32-32-index 42.65856742858887 94)
 (cells-at-positions 35.561561584472656 11697)
 (fetch-32-32-results 1.171112060546875 94)
 (extract-column .2582073211669922 4)
 (get-join-columns .05888938903808594 5)
 (text-search-to-structured-query .032901763916015625 1)
 (columns .0324249267578125 10)
 (column-position .031232833862304688 8)
 (rows .011920928955078125 4))

Hmmm.. I really need to get something to html pretty print the scheme code.

A simple scheme unittest DSL

One of the first things I wrote when I was in the 'nesting'* phase of learning gambit scheme was a unittest DSL. Part of this was that I wanted an excuse to use r5rs syntax-rules macros, but the real motivation was that I'd been seduced by the idea of using tests for documentation ala Nat Pryce's 'Protest'. Here's what I came up with: (example):


(define-tests bus-tests

  (drive-to-next-stop        ; name of fn/class/symbol being tested
    ("takes bus to next stop"
      (drive-to-next-stop bus)
      (assert-equal 'next-stop (bus 'position)))
    ("doesn't stop off at chipshop on the way"
      ; test code to detect chipshop hasn't been stopped at
      ))

  (pick-up-passengers
    ("picks up passengers from the bus stop"
      test code )
    ("doesn't leave passengers at the stop"
      test code )
    ("waits for old lady running to the stop"
      test code )))

The point is that it's easy to write some lisp to traverse this code and generate documentation from it.

I found when using protest in python that the documentation angle reinforced some healthy habits: When you write tests you naturally think 'how would this look to another person?' 'how can I document the behaviour of this?' which encourages more complete testing. Also when you look at generated documentation it's easy to see which bits you aren't testing because the documentation is missing (which then encourages you to write more tests).

The implementation is a bit clunky and makes use of gambit exceptions as a way of terminating tests early because of assert failures (which is a bit rubbish). What probably should be happening is that the outer macro should be re-writing each assert as an 'if' or something to conditionally execute the rest of the test. (which would portable to other scheme implementations) To be honest I knocked this up as fast as I could so that I could move onto writing other things (I'm developing a data aggregation and indexing tool), but the point of this post is more to convey the idea than the implementation.

That said, the crufty (currently gambit specific) implementation is here - hope this is interesting/useful to somebody.

* nesting as in 'building a nest'

Gambit-C namespaces

One of the first issues I had when evaluating scheme as a possible replacement for Python as my hobby language was its apparent lack of module/library/namespace system. How do people possibly build big programs? I wondered.

Now it turns out most (all?) scheme implementations have features to deal with modules and libraries. Gambit's is particularly nebulous in that it doesn't appear to be documented anywhere. Anyway, here's how it appears to work. I'm sure somebody will correct me if I've got something wrong:

Gambit has the 'namespace' primitive, with which you can declare that certain definitions belong in certain namespaces. Here's an example:


> (namespace ("f#" foo) ("b#" bar baz))

This means (AFAICS): "any further use/definition of the term 'foo' will reference the f# namespace and any use of bah/baz will reference the b# namespace".

e.g.


> (namespace ("f#" foo) ("b#" bar baz))

> (define (foo) "I am foo")  ; defines f#foo

> (foo)
"I am foo"

> (f#foo)
"I am foo"

> (define (bar) "I am bar")

> (b#bar)
"I am bar"

This is cool because it allows you to retroactively fit namespaces to scheme code loaded from other files. E.g. If mod1.scm and mod2.scm both defined a procedure 'foo', you could use namespaces to allow both to be used in the same environment thus:


> (namespace ("m1#" foo))
> (load "mod1")    ; contains: (define (foo) "I am mod1's foo")

> (namespace ("m2#" foo))
> (load "mod2")    ; contains: (define (foo) "I am mod2's foo")

> (m1#foo)
"I am mod1's foo"

> (m2#foo)
"I am mod2's foo"

Job done. Now I haven't really used gambit namespaces much, so I not in a position to provide a good comparison with other approaches, however the feature does seem in keeping with the rest of the scheme language. By that I mean rather than a large set of fully blown rich language features you get a small bunch of simple but very extensible primitives with which to build your own language.

An good example of building a big system over these small primitives is Christian Jaeger's chjmodule library where he has used namespaces along with 'load' and 'compile-file' (and of course macros) to build a more industrial strength module system. This includes an 'import' keyword that loads from a search path and a procedure to recursively compile and import modules. Some example code from the README:



$ gsc -i -e '(load "chjmodule")' -

> (import 'srfi-1)
> fold
#
> (fold + 0 '(1 2 3))
6
> (build-recursively/import 'regex-case)
            ; recompiles regex.scm (a dependency) if necessary,
            ; then (re)compiles regex-case.scm if necessary and imports it.
> (regex-case "http://www.xxx.yy" ("http://(.+)" (_ url) url) (else url))
"www.xxx.yy"
> *module-search-path* 
("."
 "mod"
 "gambit"
 "~/gambit"
 "~~"
 "..")

Sweet. I'm guessing it'll also be possible to build the r6rs library syntax in gambit scheme the same way.

Some hardcore Gambit-C features

Somebody asked me about gambit-c the other day, and why I was using that as opposed to some other language or runtime for my own-time coding stuff. Despite the scheme language being all cool, the thing that really made his eyes light up was the C features in gambit (it is called gambit-c for a reason). Here's some cool stuff you can do with gambit:

  1. Scheme compiles to native machine code
  2. The gambit scheme compiler compiles to C, which gcc then compiles to shared libraries or executables. The gsc command wraps this whole process, so you do a 'gsc < myschemefile>' which drops a shared library out of the other end. The (load) procedure in gambit will import either interpreted scheme code or compiled object files into the process, so you're good to go. In addition, the gsc compiler can also be run as an interactive interpreter (gsc -i) which acts just like the normal gambit scheme repl interpreter except you also have access to the compiler from your code. E.g. As well as dropping interpreted scheme code into the repl I can also compile a file and load it into the running repl process without dropping to the command line - cool!.
  3. You can embed C code directly into gambit scheme files.
  4. (c-include) lets you paste C code into your scheme, and (c-lambda) lets you define lambdas in C. This is really sweet. I thought the python C api was good, but because you have to write your c stuff in seperate files it always requires some sort of make/build system, and that's always been just too much of a barrier for me to use it day-to-day. With gambit you can just switch in a few lines of C into your performance hotspot and you're good to go. This also gives you trivial access to C libraries and low-level stuff - e.g. I use it for mmapped files. Having the C in the same file as scheme means the GCC compiler can optimize and inline C code into compiled scheme and vice-versa. The other advantage of this approach is that the gambit-c environment pretty much requires you to have a C compiler in the mix, so as a developer I can rely on it being there when distributing source to other developers, pasting code into emails etc..
  5. You can compile and load C into a running scheme process
  6. Actually this is just a mix of (1) and (2), but it's really cool when you think about what's going on. Make an update to your C code and dynamically re-load it into your running process. I have an emacs keybinding which executes a 'cload' function in the repl:
    
    (define (cload f)
      (compile-file f)
      (load f))
    
    
    I.e. edit the C code, whack the button and it's in the repl process. This keeps the dev loop really tight even when writing C.
  7. You can compile the whole thing into a native binary.
  8. This is especially cool and important when you consider that gambit-c isn't currently a popular runtime. It means you can distribute native binaries of your app for windows and mac users so that they can try your app without worrying about dependencies.

(*) N.B. although uncommon outside the lisp world, these compilation features are actually pretty common in lisp/scheme implementations. E.g. I think chicken, bigloo and SBCL provide simililar things.

Scheme development environment

I recently had my work laptop nicked while I was in paris, so I've had to reconstruct my linux development environment on another laptop. That reminded me that I intended to document this stuff since I had to dig around a bit for it when I first picked up scheme a few months ago.

Things I use:

  • Gambit scheme
  • The main reason I've stuck with this is termite, but I've also found the mailing list friendly and helpful (and full of much bigger brains than mine). Gambit can also compile static C binaries that run on windows - that's important if you're going to write code in an esoteric environment like scheme. It also has a decent FFI which allows you to embed C code in with the scheme - tres cool, especially when you're writing performance critical code.
  • Emacs
  • You have to use this if you're doing lisp development. Personally I use emacs for development in any language, including Java. Steve Yegge wrote: "Emacs is to Eclipse as a light saber is to a blaster - but a blaster is a lot easier for anyone to pick up and use.". Nuff said. I tend to have two frames open - one with the gambit repl in it and the other to do the actually coding. For people that aren't in the know, the lisp development experience is slightly different to most languages: you basically have a process running (called the REPL) and inject blocks of code into it. This makes the development cycle turnaround super-fast and it becomes a bit frustrating when you go back to waiting for compile cycles in other languages. The various emacs scheme modes provide key presses for sending various regions of code to the repl, the most useful being the current definition (i.e. function) and the last expression.
  • Quack.el
  • Quack has lots of features that make editing scheme much easier. My favourite is automatically converting the word 'lambda' into a single greek lambda character (a-la DR scheme). In addition to making the code smaller, having greek letters all over the place makes me feel pretty hard-core (which is obviously very important).
  • E-tags
  • Emacs tags isn't a patch on what bicyclerepairman provides when I'm writing python, but it does enough to make navigating code relatively hassle free, plus it works with every language you can think of. Basically it parses files and creates an index of all the definitions so that you can jump to the definition of a symbol and back in a single key chord.
  • GNU Info
  • It's handy to have all the documentation in info format because then you can use emacs to jump to the apropriate docs when you need them without having to switch to browser etc.. E.g. this function maps [f1] to jump to the r5rs docs for whatever the cursor is currently pointing at. (with thanks to Bill Clementson for this).
    (add-hook 'scheme-mode-hook
          (lambda ()
            (define-key scheme-mode-map [f1]
              '(lambda ()
             (interactive)
             (ignore-errors
               (let ((symbol (thing-at-point 'symbol)))
                 (info "(r5rs)")
                 (Info-index symbol)))))))
    
    (N.B. quack has a feature to auto-open web based docs into emacs while you're coding, but I work offline so much that I don't use this much)
  • A unit testing framework.
  • I didn't really get on with any of the ones I tried, so I wrote a simple DSL myself (took about half a day after I'd figured out how syntax-case macros work). Ideally I want to end up using something like Nat's Protest system, which chucks out documentation as a side effect of testing. A Scheme DSL should be a good fit for this style, since you can name tests using strings rather than having to document with function names. For the time being though it provides just enough to get me testing (and also served to teach me a few things about macros, which is good.)

Is there anything missing?

Refactoring and the Repl

I'm still perservering with Gambit scheme, and progressing pretty slowly it has to be said. The first thing I've been missing is the lack of refactoring tools for scheme.

I wrote the basic python refactoring functionality in bicyclerepairman a long while ago, and having it as part of my daily toolset has strongly influenced the way I program. For example, I tend to follow the 'bash out some code and then clean it up' style of development. In particular, I have a habit of naming variables and functions badly and then renaming them later as I code.

So my initial thought is: no problem - I'll just knock up a bicyclerepairman for scheme! The problem is that I'm not quite sure how to do automated refactoring with a repl. You see Python has no real repl culture (sure it has a repl, but nobody uses it except for trying out simple expressions). People tend to run their program/unittests from scratch each iteration, which means the entire environment gets re-evaluated on each run.

The challenge with running a repl while you develop is keeping it in sync with your refactored code: E.g. if I rename a function that's used in multiple places, that results in lots of code that needs re-evaluating. Can this be done automatically (e.g. could it be made to work by just re-eval'ing files?). Hmm.. I think I need to talk to somebody with a lot more scheme experience than I have. Unfortunately I don't actually know any experienced schemers, especially not in London or Birmingham; maybe somebody from lshift can help?

Scheme is love

I've been battling again with Scheme recently. Having spent the last couple of months playing with various languages, I've come to the conclusion that scheme is the only one that has any real possibility of becoming my next 'general purpose language'. Python held that crown for many years, but its lack of blocks and concurrency caused me to start looking elsewhere and now I'm spoilt.

So, to Scheme. I've not found another language that can offer:

  • functional programming
  • message-passing concurrency (see termite)
  • macros
  • continuations
  • terse syntax
  • hardly any language cludges

...and as somebody who programs for fun in his spare time, these things really do matter to me. The biggest obstacle to full enlightment is the s-expression aesthetic: To my algol-shaped brain that lisp syntax just looks so damn ugly!

Anyway, I'm finding that the most enjoyable and self-affirming way to develop some scheme skills is (ironically) to re-read Peter Seibel's 'Practical Common Lisp' book with scheme glasses on. Now if there's anyone going to convince me that lisp syntax isn't just a grotty heap of parentheses, it's going to be Peter. His book just radiates lisp-love, and you can't help but be hooked. It says 'Look! You fools! Just look what you're missing!'. I've been translating various examples into scheme, just to test the water.