Danny! Here's another alternative for you to consider: tagtriples. (I must be sounding like a stuck record blathering on about tagtriples all the time, but hear me out...)

I started tagtriples as an attempt to find the simplest subset of RDF that wouldn't lose any of the merging pixie-dust features. (RDF was proving just too complicated to get critical mass at my work, and required up-front agreement to get data to merge). Fresh from the folksonomy buzz I tried replacing URIs with sets of tags used in combination, then realised that tags could be modelled as statements with predicate 'tag' and ended up with tagtriples.
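
To make that last step concrete, here is a minimal sketch in Python. It assumes a statement is just a (subject, predicate, object) tuple; this is not actual tagtriples syntax, and `tags_to_statements` and `item1` are made-up names for illustration.

```python
# Hypothetical illustration: a statement is a plain (subject, predicate, object) tuple.
def tags_to_statements(node, tags):
    """Model each tag on a node as a statement with predicate 'tag'."""
    # Sorting only makes the output order predictable.
    return [(node, "tag", tag) for tag in sorted(tags)]


# 'item1' is a local placeholder for the thing being tagged, not a universal ID.
statements = tags_to_statements("item1", {"rdf", "folksonomy"})
```

The tagged thing is then described by the combination of its tag statements rather than by a URI.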

So instead of a universal ID to define the thing, you rely on combinations of symbols and statements. This opens up more possibilities for magical pixie-dust merging, because it leverages existing symbols grounded in real life (email addresses, phone numbers, names etc.) in combination to join data. (BTW, note that even FOAF RDF drops URIs at the point where it needs to do the pixie-dust merging stuff.)
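
A rough sketch of what joining on a shared real-world symbol could look like, again assuming plain Python tuples as statements. The node names (`crm:17`, `ldap:a1`), the `email` predicate and the `merge_on_symbol` helper are all invented for the example; a real merge would presumably weigh combinations of symbols rather than a single one.

```python
from collections import defaultdict


def merge_on_symbol(statements, predicate):
    """Group local node names that share the same symbol for `predicate`."""
    groups = defaultdict(set)
    for subject, pred, obj in statements:
        if pred == predicate:
            groups[obj].add(subject)
    # Only symbols seen on more than one node tell us anything about merging.
    return {symbol: nodes for symbol, nodes in groups.items() if len(nodes) > 1}


# Two hypothetical sources that never agreed on identifiers up front.
crm = [
    ("crm:17", "name", "Alice Example"),
    ("crm:17", "email", "alice@example.org"),
]
ldap = [
    ("ldap:a1", "email", "alice@example.org"),
    ("ldap:b2", "email", "bob@example.org"),
]

merged = merge_on_symbol(crm + ldap, "email")
```

Here `merged` pairs up `crm:17` and `ldap:a1` purely because they carry the same email address; `ldap:b2` stays on its own.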

But the really cool thing is: when you lose the URIs, scraping data from other general formats becomes simpler and, to a certain degree, actually automatable. You just recreate the structure of the data in triples, and then use the symbols from the source data as node identifiers. You don't even need to be precise to get a representation that can yield useful results (especially if you're doing SPARQL-style queries).
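
As an illustration of how mechanical this can be, here is a small sketch that turns CSV text into statements. It is not the actual tagtriples CSV importer: the `row1`-style node names and the `csv_to_statements` function are assumptions made for the example, with column headers used as predicates and cell values kept as symbols.

```python
import csv
import io


def csv_to_statements(text):
    """Recreate the CSV structure as (row node, column name, cell value) statements."""
    statements = []
    reader = csv.DictReader(io.StringIO(text))
    for index, row in enumerate(reader, start=1):
        node = f"row{index}"  # local placeholder for the row, not a global ID
        for column, value in row.items():
            # Skip empty cells and any overflow cells that have no header.
            if column is not None and value:
                statements.append((node, column, value))
    return statements


sample = "name,email\nAlice Example,alice@example.org\nBob Example,\n"
statements = csv_to_statements(sample)
```

Nothing in the function knows what a "name" or an "email" is; the symbols already in the data do the work later, at merge and query time.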

BTW, I've written importers for XML, RDF, CSV, and recently the hCard and hCalendar microformats. At work we have a turn-key database and LDAP exporter. I'm now wondering whether it's possible to mine some quality of semantic statements out of general 'semantically-oriented' XHTML. Maybe at least enough to do some structured querying on. (Really must post and ask the microformats list about this.)

Other benefits:

  • clusters of symbols are amenable to proximity searching techniques - i.e. find a cluster of statements containing these symbols. This is a powerful way of finding microcontent structures in the data mush.

  • SPARQL-style structured querying becomes simpler - no more namespaces to remember! Combinations of statement patterns in the query easily restrict the set of possible matches to the point where you don't need namespaces to ensure precision.
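
The second point can be sketched with a toy pattern matcher. This is not SPARQL and not the real tagtriples query engine; `find_nodes`, the node names and the sample data are hypothetical, and statements are again assumed to be plain Python tuples.

```python
def find_nodes(statements, patterns):
    """Return subjects that have a matching statement for every pattern.

    Each pattern is (predicate, object); an object of None matches any value.
    """
    by_subject = {}
    for subject, predicate, obj in statements:
        by_subject.setdefault(subject, set()).add((predicate, obj))

    matches = []
    for subject, pairs in by_subject.items():
        if all(
            any(p == want_p and (want_o is None or o == want_o) for p, o in pairs)
            for want_p, want_o in patterns
        ):
            matches.append(subject)
    return sorted(matches)


data = [
    ("n1", "name", "Alice Example"),
    ("n1", "email", "alice@example.org"),
    ("n2", "name", "Alice Example"),
    ("n2", "phone", "555-0100"),
    ("n3", "title", "Alice Example"),  # a document title, not a person
]

people_with_email = find_nodes(data, [("name", "Alice Example"), ("email", None)])
```

The bare predicates `name` and `email` are ambiguous on their own, but requiring both on the same node already narrows the result to `n1`, with no namespaces involved.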

Ok, so here's the final pitch: With all the scraping, searching and browsing stuff you can do, neither the author nor the consumer needs to actually know anything about tagtriples.

I think that's really cool: the user can import data from existing formats and models, search for items across the merged data (using Google-style text searches that work over symbols in close proximity), and browse the data structures without caring that there's some triples-tags-and-graphs model behind it.

That's the most powerful bit. This stuff is useful without reaching any sort of critical mass of adoption. (E.g. at work I installed JAM*VAT, emptied some databases and LDAP stores into it and suddenly people can search and browse across the merged indexed data, traversing where symbol and statement combinations mesh.)

So waddaya think? Existing formats + a generic model to aggregate and interpret the semantic data: An RDF alternative contender?