A poor man’s scheme profiler

Gambit scheme lacks a profiler that can profile scheme with embedded C code. (There's statprof, but unfortunately it doesn't profile embeded C). I needed to do this pretty desperately for my triple indexing stuff so I've written a simple macro which takes wall clock timings of functions and accumulates them.

You replace 'define' with 'define-timed' in the functions you want profiled, and then the time spent in each function is accumulated in a global hash table. It's not pretty, but it's simple.

The macro needs some supporting code (which I keep in a seperate module so that it can be compiled in order to minimise overhead). :


(define *timerhash* (make-table))

;;; call this before running the code
(define (reset-timer) (set! *timerhash* (make-table)))


;;; adds the time spent in the thunk to an entry in the hashtable
(define (accum-time name thunk)
  (let* ((timebefore (real-time))
         (res (thunk)))
    (table-set! *timerhash* name 
                (let ((current (table-ref *timerhash* name '(0 0))))
                  (list (+ (car current) (- (real-time) timebefore))
                        (+ (cadr current) 1))))
    res))

;;; call this afterwards to get the times
(define (get-times)
  (map (lambda (e) (list (first e) 
                    (* 1000 (second e))
                    (third e)))

       (sort-list (table->list *timerhash*)
                  (lambda (a b) (> (second a)
                              (second b))))))

And here's the macro. It basically just wraps the function body in a call to 'accum-time':


(define-syntax define-timed
  (syntax-rules ()
    ((define-timed (name . args) body ...)
     (define (name . args)
       (accum-time 'name (lambda ()
                           body ...))))))

Here's some example output. The first number is the accumulated milliseconds spent in the function, and the second is the number of times it was called.


((handle-query-clause 2182.3294162750244 8)
 (do-substring-search-query 1678.9898872375488 1)
 (run-query 1678.9379119873047 1)
 (main 1678.929090499878 1)
 (handle-substr-search 705.5349349975586 2)
 (convert-ids-to-symbols 400.3608226776123 1)
 (handle-s_o-clause 192.81506538391113 2)
 (lookup-o->s 192.52514839172363 2)
 (lookup-2-lvl-index 192.48294830322266 2)
 (join 163.8491153717041 8)
 (hash-join 163.6500358581543 5)
 (fetch-int32-results 118.37220191955566 655)
 (project 100.7072925567627 3)
 (concat 77.42834091186523 11793)
 (handle-p->so-clause 55.348873138427734 2)
 (bounds-matching-1intersect 51.54275894165039 2)
 (lookup-p->2-lvl-index 50.98700523376465 2)
 (search-32-32-index 42.65856742858887 94)
 (cells-at-positions 35.561561584472656 11697)
 (fetch-32-32-results 1.171112060546875 94)
 (extract-column .2582073211669922 4)
 (get-join-columns .05888938903808594 5)
 (text-search-to-structured-query .032901763916015625 1)
 (columns .0324249267578125 10)
 (column-position .031232833862304688 8)
 (rows .011920928955078125 4))

Hmmm.. I really need to get something to html pretty print the scheme code.

A simple scheme unittest DSL

One of the first things I wrote when I was in the 'nesting'* phase of learning gambit scheme was a unittest DSL. Part of this was that I wanted an excuse to use r5rs syntax-rules macros, but the real motivation was that I'd been seduced by the idea of using tests for documentation ala Nat Pryce's 'Protest'. Here's what I came up with: (example):


(define-tests bus-tests

  (drive-to-next-stop        ; name of fn/class/symbol being tested
    ("takes bus to next stop"
      (drive-to-next-stop bus)
      (assert-equal 'next-stop (bus 'position)))
    ("doesn't stop off at chipshop on the way"
      ; test code to detect chipshop hasn't been stopped at
      ))

  (pick-up-passengers
    ("picks up passengers from the bus stop"
      test code )
    ("doesn't leave passengers at the stop"
      test code )
    ("waits for old lady running to the stop"
      test code )))

The point is that it's easy to write some lisp to traverse this code and generate documentation from it.

I found when using protest in python that the documentation angle reinforced some healthy habits: When you write tests you naturally think 'how would this look to another person?' 'how can I document the behaviour of this?' which encourages more complete testing. Also when you look at generated documentation it's easy to see which bits you aren't testing because the documentation is missing (which then encourages you to write more tests).

The implementation is a bit clunky and makes use of gambit exceptions as a way of terminating tests early because of assert failures (which is a bit rubbish). What probably should be happening is that the outer macro should be re-writing each assert as an 'if' or something to conditionally execute the rest of the test. (which would portable to other scheme implementations) To be honest I knocked this up as fast as I could so that I could move onto writing other things (I'm developing a data aggregation and indexing tool), but the point of this post is more to convey the idea than the implementation.

That said, the crufty (currently gambit specific) implementation is here - hope this is interesting/useful to somebody.

* nesting as in 'building a nest'

Gambit-C namespaces

One of the first issues I had when evaluating scheme as a possible replacement for Python as my hobby language was its apparent lack of module/library/namespace system. How do people possibly build big programs? I wondered.

Now it turns out most (all?) scheme implementations have features to deal with modules and libraries. Gambit's is particularly nebulous in that it doesn't appear to be documented anywhere. Anyway, here's how it appears to work. I'm sure somebody will correct me if I've got something wrong:

Gambit has the 'namespace' primitive, with which you can declare that certain definitions belong in certain namespaces. Here's an example:


> (namespace ("f#" foo) ("b#" bar baz))

This means (AFAICS): "any further use/definition of the term 'foo' will reference the f# namespace and any use of bah/baz will reference the b# namespace".

e.g.


> (namespace ("f#" foo) ("b#" bar baz))

> (define (foo) "I am foo")  ; defines f#foo

> (foo)
"I am foo"

> (f#foo)
"I am foo"

> (define (bar) "I am bar")

> (b#bar)
"I am bar"

This is cool because it allows you to retroactively fit namespaces to scheme code loaded from other files. E.g. If mod1.scm and mod2.scm both defined a procedure 'foo', you could use namespaces to allow both to be used in the same environment thus:


> (namespace ("m1#" foo))
> (load "mod1")    ; contains: (define (foo) "I am mod1's foo")

> (namespace ("m2#" foo))
> (load "mod2")    ; contains: (define (foo) "I am mod2's foo")

> (m1#foo)
"I am mod1's foo"

> (m2#foo)
"I am mod2's foo"

Job done. Now I haven't really used gambit namespaces much, so I not in a position to provide a good comparison with other approaches, however the feature does seem in keeping with the rest of the scheme language. By that I mean rather than a large set of fully blown rich language features you get a small bunch of simple but very extensible primitives with which to build your own language.

An good example of building a big system over these small primitives is Christian Jaeger's chjmodule library where he has used namespaces along with 'load' and 'compile-file' (and of course macros) to build a more industrial strength module system. This includes an 'import' keyword that loads from a search path and a procedure to recursively compile and import modules. Some example code from the README:



$ gsc -i -e '(load "chjmodule")' -

> (import 'srfi-1)
> fold
#
> (fold + 0 '(1 2 3))
6
> (build-recursively/import 'regex-case)
            ; recompiles regex.scm (a dependency) if necessary,
            ; then (re)compiles regex-case.scm if necessary and imports it.
> (regex-case "http://www.xxx.yy" ("http://(.+)" (_ url) url) (else url))
"www.xxx.yy"
> *module-search-path* 
("."
 "mod"
 "gambit"
 "~/gambit"
 "~~"
 "..")

Sweet. I'm guessing it'll also be possible to build the r6rs library syntax in gambit scheme the same way.

Some hardcore Gambit-C features

Somebody asked me about gambit-c the other day, and why I was using that as opposed to some other language or runtime for my own-time coding stuff. Despite the scheme language being all cool, the thing that really made his eyes light up was the C features in gambit (it is called gambit-c for a reason). Here's some cool stuff you can do with gambit:

  1. Scheme compiles to native machine code
  2. The gambit scheme compiler compiles to C, which gcc then compiles to shared libraries or executables. The gsc command wraps this whole process, so you do a 'gsc < myschemefile>' which drops a shared library out of the other end. The (load) procedure in gambit will import either interpreted scheme code or compiled object files into the process, so you're good to go. In addition, the gsc compiler can also be run as an interactive interpreter (gsc -i) which acts just like the normal gambit scheme repl interpreter except you also have access to the compiler from your code. E.g. As well as dropping interpreted scheme code into the repl I can also compile a file and load it into the running repl process without dropping to the command line - cool!.
  3. You can embed C code directly into gambit scheme files.
  4. (c-include) lets you paste C code into your scheme, and (c-lambda) lets you define lambdas in C. This is really sweet. I thought the python C api was good, but because you have to write your c stuff in seperate files it always requires some sort of make/build system, and that's always been just too much of a barrier for me to use it day-to-day. With gambit you can just switch in a few lines of C into your performance hotspot and you're good to go. This also gives you trivial access to C libraries and low-level stuff - e.g. I use it for mmapped files. Having the C in the same file as scheme means the GCC compiler can optimize and inline C code into compiled scheme and vice-versa. The other advantage of this approach is that the gambit-c environment pretty much requires you to have a C compiler in the mix, so as a developer I can rely on it being there when distributing source to other developers, pasting code into emails etc..
  5. You can compile and load C into a running scheme process
  6. Actually this is just a mix of (1) and (2), but it's really cool when you think about what's going on. Make an update to your C code and dynamically re-load it into your running process. I have an emacs keybinding which executes a 'cload' function in the repl:
    
    (define (cload f)
      (compile-file f)
      (load f))
    
    
    I.e. edit the C code, whack the button and it's in the repl process. This keeps the dev loop really tight even when writing C.
  7. You can compile the whole thing into a native binary.
  8. This is especially cool and important when you consider that gambit-c isn't currently a popular runtime. It means you can distribute native binaries of your app for windows and mac users so that they can try your app without worrying about dependencies.

(*) N.B. although uncommon outside the lisp world, these compilation features are actually pretty common in lisp/scheme implementations. E.g. I think chicken, bigloo and SBCL provide simililar things.