importing statements at speed
Spent the evening working on bulk assert speed for my IFP smushing store. Managed to get a 120,000-statement import down from 410 seconds to 109 seconds - that's about 1,100 triples per second. Pretty good considering the rdflib RDF/XML parsing takes 59 seconds on its own - I wonder how fast this would be with raptor...
The trick was to lock the database, generate all data into files (allocating ids in memory), then bulk import using 'LOAD DATA INFILE'.
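A minimal sketch of that approach in Python, not the store's actual code. The table names (`terms`, `statements`), column names, file names and helper functions are all assumptions made up for illustration, and the SQL is only built as strings here rather than sent to a database.

```python
import os
import tempfile


def escape(value):
    """Escape a field for LOAD DATA's default tab-separated format."""
    return (str(value).replace("\\", "\\\\")
                      .replace("\t", "\\t")
                      .replace("\n", "\\n"))


class IdAllocator:
    """Hands out integer ids in memory, avoiding a database round trip per term."""

    def __init__(self, start_id):
        # start_id would normally be MAX(id) + 1, read while the tables are locked.
        self.next_id = start_id
        self.ids = {}

    def get(self, term):
        if term not in self.ids:
            self.ids[term] = self.next_id
            self.next_id += 1
        return self.ids[term]


def write_bulk_files(triples, directory, start_id=1):
    """Write one file of id triples and one file of (id, term) rows."""
    alloc = IdAllocator(start_id)
    terms_path = os.path.join(directory, "terms.tsv")
    statements_path = os.path.join(directory, "statements.tsv")
    with open(statements_path, "w", encoding="utf-8", newline="\n") as out:
        for s, p, o in triples:
            out.write("%d\t%d\t%d\n" % (alloc.get(s), alloc.get(p), alloc.get(o)))
    with open(terms_path, "w", encoding="utf-8", newline="\n") as out:
        for term, term_id in alloc.ids.items():
            out.write("%d\t%s\n" % (term_id, escape(term)))
    return terms_path, statements_path


def bulk_import_sql(terms_path, statements_path):
    """Statements to run in order. Hypothetical schema; the lock should be
    taken before ids are allocated so that start_id stays valid."""
    return [
        "LOCK TABLES terms WRITE, statements WRITE",
        "LOAD DATA INFILE '%s' INTO TABLE terms (id, value)" % terms_path,
        "LOAD DATA INFILE '%s' INTO TABLE statements "
        "(subject, predicate, object)" % statements_path,
        "UNLOCK TABLES",
    ]


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()
    paths = write_bulk_files([("ex:a", "foaf:knows", "ex:b")], demo_dir)
    for statement in bulk_import_sql(*paths):
        print(statement)
```

The speed-up in a scheme like this comes from replacing one INSERT (plus an id lookup) per statement with two file loads, at the cost of holding a write lock for the whole import.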
Actually this has led me to wonder whether there's an RDF store profiling toolkit I should be using to test this - the performance comparison documents I've read use proprietary data, which means I can't use them to compare my store. There was a post a while back with some example (copyright-free) large data files, but I can't find it now. If anybody remembers and knows where they are, please leave a comment.