Implementing graphs as triple ranges
Had an implementation thought on the train this morning:
Instead of implementing a graph as a single id in the 4th column of the quads table, how about implementing it as a range of triple ids. i.e. Each triple is given a sequential id in the 4th column when asserted, and the graphs are stored as (startid, endid).
Advantages:
- subgraphs can also be captured (useful for 'quoting')
- the order of the graph is preserved (useful for retaining graph order for e.g. signing, and handling ordered collections without requiring an ordered collection type)
- the number of statements in a graph is implicitly stored (endid - startid)
Disadvantages:
- Cant use a UNIQUE(s,p,o,g) index to assert that the same statement doesnt occur twice in a graph. (Is this a problem? not really - can put up with duplicates)
- Graphs can only be asserted one at a time. (or could lease a block of ids or segregate the id space)
- Store size limited by length of id field. (actually I suspect other factors will limit scalability before this, and it could be made 64 bit)
- Queries involving SOURCE become more tricky to implement internally. (WHERE triple.graph < g.start AND triple.graph >g.end)</>
- Can't add triples to an existing graph (well, not without re-allocating ids or maybe using a cluster field in the graph table or something)
Hmmm... Subgraphs and preserved ordering are compelling advantages - this is sounding like a good tradeoff to me...