Tagtriples + identity precision

Ian Davies has been discussing the complexity of RDF and considering the possibility of an RDF Lite. Danny Ayers also picked it up here.

Readers of this blog will already know that I struggled with teaching RDF's complexity when attempting to promote and deploy it at work. That struggle prompted me to research and build a simpler model, tagtriples, at home in my spare time.

Tagtriples is an attempt to create a structured metadata format and model with similar properties to RDF (e.g. encoding graphs, trivially mergeable/aggregatable), but made much simpler by allowing any symbol to be an identifier - not just URIs.

This 'freedom' of identifier symbols raises the question - without the precise framework of URIs or namespaces to handle identity*, how do you describe or refer to something with any degree of precision?

Well, tagtriples enables precise identification by enforcing a simple rule: if you use a symbol in a 'graph' (e.g. in a document), all other occurrences of the symbol in the graph are assumed to denote the same thing. This allows you to describe things, and thus facilitates 'identity by description', which I think is the key to solving the identity problem on the web in a scalable, multi-context-compatible manner.

So to give the Dublin Core example, a document creator could be described in tagtriples using just his/her name:

http://www.w3.org/Home/Lassila creator "Ora Lassila"

But you could also elaborate on what you mean by "Ora Lassila" by adding more statements:

http://www.w3.org/Home/Lassila creator "Ora Lassila" 
"Ora Lassila" tag Person
"Ora Lassila" name "Ora Lassila"[2]
"Ora Lassila" mbox ora@example.com

(I've suffixed the name 'Ora Lassila' with [2], because we've already used the symbol once in the graph to denote the person 'Ora Lassila', and now we're using it to denote the name. This is how tagtriples allows multiple things to share the same symbol.)
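
To make the suffix rule concrete, here is a minimal Python sketch of a reader for statements like the ones above. It is not the real tagtriples parser: the syntax it accepts (three terms per line, each either a bare token or a double-quoted string, optionally followed by [n]) is inferred only from the examples in this post, and treating an unsuffixed symbol as index 1 is an assumption. The names parse_statement and parse_graph are hypothetical.

```python
import re

# Assumed syntax, inferred from the examples in this post: a term is a
# double-quoted string or a bare token, optionally followed by an [n] suffix.
TERM = re.compile(r'\s*(?:"([^"]*)"|([^\s"\[\]]+))(?:\[(\d+)\])?')


def parse_statement(line):
    """Parse one 'subject property value' line into three (symbol, index) pairs.

    An unsuffixed symbol gets index 1 (an assumption of this sketch), so
    "Ora Lassila" and "Ora Lassila"[2] come out as two distinct things.
    """
    terms = []
    pos = 0
    for _ in range(3):
        m = TERM.match(line, pos)
        if m is None:
            raise ValueError(f"cannot parse term at column {pos}: {line!r}")
        symbol = m.group(1) if m.group(1) is not None else m.group(2)
        index = int(m.group(3)) if m.group(3) else 1
        terms.append((symbol, index))
        pos = m.end()
    if line[pos:].strip():
        raise ValueError(f"unexpected trailing text: {line!r}")
    return tuple(terms)


def parse_graph(text):
    """Parse every non-blank line of a graph into a statement."""
    return [parse_statement(line) for line in text.splitlines() if line.strip()]


EXAMPLE = '''\
http://www.w3.org/Home/Lassila creator "Ora Lassila"
"Ora Lassila" tag Person
"Ora Lassila" name "Ora Lassila"[2]
"Ora Lassila" mbox ora@example.com
'''

statements = parse_graph(EXAMPLE)

# Every occurrence of ("Ora Lassila", 1) in this graph denotes the same person,
# so collecting the statements about it gives the person's description.
person = ("Ora Lassila", 1)
about_person = [(prop[0], value) for (subj, prop, value) in statements if subj == person]
```

In this sketch the person and the name end up as ("Ora Lassila", 1) and ("Ora Lassila", 2), so the two uses of the same symbol stay distinct within the graph.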

Now JAM*VAT (an open-source tagtriples aggregator) can extract tagtriples from any XML using a bunch of simple heuristics. The above graphs could be encoded as:

<document>
  <url>http://www.w3.org/Home/Lassila</url>
  <creator>Ora Lassila</creator>
</document>

..which yields the statements in the first graph (and a few others), or

<document>
  <url>http://www.w3.org/Home/Lassila</url>
  <creator>
     <Person>
         <name>Ora Lassila</name>
         <mbox>ora@example.com</mbox>
     </Person>
  </creator>
</document>

..which yields the second.

This is very powerful because both XML documents will yield the same result for the following tagtriples query:

select ?person
where (http://www.w3.org/Home/Lassila creator ?person)

thus neatly solving the problem Ian Davies is having with RDF.
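
As a rough illustration of why the same query works against both graphs, here is a small Python sketch of single-pattern matching. It is not how JAM*VAT evaluates queries: the match function is hypothetical, statements are simplified to plain string triples, the [2] suffix is ignored, and the two graphs are typed in by hand rather than extracted from the XML.

```python
def match(graph, pattern):
    """Return one binding dict per statement in `graph` that fits `pattern`.

    A pattern term starting with '?' is a variable; any other term must
    equal the statement's term in the same position.
    """
    results = []
    for statement in graph:
        bindings = {}
        for want, got in zip(pattern, statement):
            if want.startswith("?"):
                # A variable used twice must bind to the same symbol both times.
                if bindings.setdefault(want, got) != got:
                    break
            elif want != got:
                break
        else:
            results.append(bindings)
    return results


DOC = "http://www.w3.org/Home/Lassila"

# The short description: just the creator's name.
graph_one = [
    (DOC, "creator", "Ora Lassila"),
]

# The elaborated description (the name statement is left out for brevity).
graph_two = [
    (DOC, "creator", "Ora Lassila"),
    ("Ora Lassila", "tag", "Person"),
    ("Ora Lassila", "mbox", "ora@example.com"),
]

query = (DOC, "creator", "?person")
result_one = match(graph_one, query)
result_two = match(graph_two, query)
```

Both result_one and result_two contain the single binding of ?person to Ora Lassila, because the extra statements in the second graph only add description and do not change the symbol the query matches on.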

Actually I've found that forcing a high level of precision onto metadata producers has its own problems - identity is tightly bound up with context, and as the context varies, so does the identity. But that's another post entirely.