Had an implementation thought on the train this morning:

Instead of implementing a graph as a single id in the 4th column of the quads table, how about implementing it as a range of triple ids. i.e. Each triple is given a sequential id in the 4th column when asserted, and the graphs are stored as (startid, endid).

Advantages:

  • subgraphs can also be captured (useful for 'quoting')
  • the order of the graph is preserved (useful for retaining graph order for e.g. signing, and handling ordered collections without requiring an ordered collection type)
  • the number of statements in a graph is implicitly stored (endid - startid)

Disadvantages:

  • Cant use a UNIQUE(s,p,o,g) index to assert that the same statement doesnt occur twice in a graph. (Is this a problem? not really - can put up with duplicates)
  • Graphs can only be asserted one at a time. (or could lease a block of ids or segregate the id space)
  • Store size limited by length of id field. (actually I suspect other factors will limit scalability before this, and it could be made 64 bit)
  • Queries involving SOURCE become more tricky to implement internally. (WHERE triple.graph < g.start AND triple.graph >g.end)</>
  • Can't add triples to an existing graph (well, not without re-allocating ids or maybe using a cluster field in the graph table or something)

Hmmm... Subgraphs and preserved ordering are compelling advantages - this is sounding like a good tradeoff to me...