A commonly held principle of identity within RDF circles is that the owner of a URI gets to pick what it identifies. That sounds perfectly reasonable in theory, but unfortunately my experience of using RDF at work suggests that in practice the meaning of a URI tends to skew a bit with usage and context.

For example, at work we exported the LDAP directory of employees as RDF, generating a series of URIs in the process. These URIs were then used in other graphs to connect other data to employees. In this data, the URI is sometimes used to identify the user entry in the directory, and sometimes to denote the person themselves (e.g. app:BondTrader probably isn't administered by the person:Frank LDAP directory entry; it's administered by the person that entry denotes).

This is of course a matter of precision, and in theory we could have one set of URIs to denote the people, and another set to denote the directory entries. But the problem is that, to a greater or lesser extent, this seems to happen all the time.

E.g. we've had similar happenings with dns-name vs server, i.e. sometimes the concept of a dnsname blurs with the server it points to (e.g. if you're in an app team), and sometimes it doesn't (e.g. if you look after DNS). The properties of dnsname make sense in either context - e.g. you can think of an alias in terms of a server quite easily.

Off the top of my head, others include:

  • application vs monitoring-configuration-entry
  • windowslogin vs person
  • application (software) vs project

So what's the solution? Should we be murderously precise about what we mint, or do these vague overlapping concepts have their place?

This is especially interesting to me because in tagtriples the concept of identity is a little blurry anyway. Identity tends to get built up by description rather than by minting an 'id', and so skews according to the context it's being used in.

This blurriness has led me to consider putting the final responsibility for identity on the shoulders of the 'aggregator' (i.e. the person doing the aggregating) rather than the author of the data. It's a compelling solution, since the person doing the aggregating is collecting and merging data for a particular purpose under a particular context.

E.g. my tagtriples aggregator at work is used to collect data for the purpose of managing applications in DRKW, and thus all the data is specific to DRKW. If, say, DRKW were ever merged with another investment bank I'd need to reconcile this data, and so would probably add 'drkw' tags to all of the existing application data in order to manage collisions with the new data.
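As a rough illustration of that reconciliation step, here is a minimal Python sketch. It is not tagtriples syntax: it just assumes each subject is described by a set of tags rather than a minted id. The 'otherbank' tag, the 'administered-by' property and the name 'Alice' are made up for the example.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical representation (not real tagtriples syntax): a subject is
# identified by a set of descriptive tags rather than by a minted id.
Statement = Tuple[FrozenSet[str], str, str]  # (subject tags, property, value)


def add_context_tag(statements: List[Statement], tag: str) -> List[Statement]:
    """Return copies of the statements with `tag` added to every subject."""
    return [(subject | {tag}, prop, value) for subject, prop, value in statements]


# Existing DRKW data and (invented) data arriving from the other bank.
# Before tagging, both describe "the application called BondTrader".
existing = [(frozenset({"application", "BondTrader"}), "administered-by", "Frank")]
incoming = [(frozenset({"application", "BondTrader"}), "administered-by", "Alice")]

# The aggregator, not the data author, adds the context that keeps them apart.
merged = add_context_tag(existing, "drkw") + add_context_tag(incoming, "otherbank")
subjects = {subject for subject, _, _ in merged}
```

Without the extra tags the two descriptions would collapse into a single subject; with them, the aggregated data holds two distinct BondTrader applications.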

Using this mechanism I could also get to choose (to some degree) whether the dnsname 'ab35622abc' is different to the server 'ab35622abc' for the context I'm aggregating in.
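The same idea can be sketched for the dnsname/server case. Again this is only an assumption about how an aggregator might work, written in plain Python rather than tagtriples: the aggregator is handed a set of kinds that it should treat as one thing in its context. The 'alias' and 'location' properties and their values are invented for the example.

```python
from typing import Dict, FrozenSet, List, Tuple

Fact = Tuple[str, str, str, str]  # (kind, name, property, value)
Key = Tuple[str, str]             # (kind as seen by the aggregator, name)


def aggregate(facts: List[Fact], merged_kinds: FrozenSet[str]) -> Dict[Key, List[Tuple[str, str]]]:
    """Group facts by identity as chosen by the aggregator.

    Kinds listed in `merged_kinds` are treated as the same kind of thing,
    so facts about them merge whenever the names match.
    """
    combined = "+".join(sorted(merged_kinds))
    entities: Dict[Key, List[Tuple[str, str]]] = {}
    for kind, name, prop, value in facts:
        key_kind = combined if kind in merged_kinds else kind
        entities.setdefault((key_kind, name), []).append((prop, value))
    return entities


facts = [
    ("dnsname", "ab35622abc", "alias", "bondtrader-prod"),  # invented value
    ("server", "ab35622abc", "location", "London"),         # invented value
]

# An app team's aggregator blurs the two; a DNS team's keeps them apart.
app_team_view = aggregate(facts, frozenset({"dnsname", "server"}))
dns_team_view = aggregate(facts, frozenset())
```

Here the app team's view ends up with one 'ab35622abc' carrying both the alias and the location, while the DNS team's view keeps the dnsname and the server as two separate things.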