My recent look at microformats has led me to think more about the shades of grey between being able to fully interpret (understand) data and not being able to interpret it at all. Microformats are currently very binary in this regard: either the software knows the microformat and is able to interpret it, or it doesn't. This is at odds with other data formats, including XML and RDF, which can convey structure even if the software doesn't fully understand the schema and vocabulary in use.
Some (local, approx) definitions:
- Graph: A coarse-grained 'chunk' of data, e.g. a document on the web.
- Structure: The scoping of data, parent/child relationships, links, and which bits are properties and which bits are values.
- Schema: A set of restrictions which explicitly constrain the values and structure that can be used in the data (without requiring understanding of the actual meaning), e.g. XML Schema, RelaxNG, WSDL.
- Meaning: What the properties actually signify, e.g. 'what does "name" mean?'. Usually articulated relative to other properties (e.g. OWL).
Here are some 'levels' of data understanding, and some things software can usefully do at each level:
1) The software is unable to interpret the meaning of the data, and also unable to interpret the structure
The data can still be broken into a sea of atomic bits, and those bits indexed to enable searching. E.g. a straight text index on an HTML document doesn't attempt to interpret the structure of the data; it merely indexes the occurrences of the words in the sea of text. Consumers that understand both structure and meaning can then retrieve the graphs (documents) that contain certain words.
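As a rough illustration of level 1, here is a minimal sketch of a flat text index in Python. The function name `build_text_index` and the two sample documents are made up for this example. The markup is simply thrown away rather than interpreted, so the index knows nothing about which words were names, headings or values.

```python
import re
from collections import defaultdict

def build_text_index(documents):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        # Discard anything that looks like a tag; no attempt is made to interpret it.
        flat = re.sub(r"<[^>]+>", " ", text)
        # Treat what is left as a flat sea of words.
        for word in re.findall(r"[a-z0-9]+", flat.lower()):
            index[word].add(doc_id)
    return index

# Hypothetical sample documents, for illustration only.
documents = {
    "doc1": "<p>Contact <span class='fn'>Alice Smith</span> about microformats</p>",
    "doc2": "<p>RDF and XML can convey structure</p>",
}

index = build_text_index(documents)
print(sorted(index["microformats"]))
```

A lookup such as `index["alice"]` returns the ids of whole documents; it is up to the consumer to fetch them and work out what "alice" meant in context.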
2) The software is unable to interpret the meaning of the data, but can interpret its structure
The data structure can be held and indexed (even though the meaning isn't understood at any level). It can be aggregated and presented ready-indexed (e.g. via a structured query interface) to some agent or program that does understand more of the meaning. The software can break the data into logical chunks that are more granular than the graphs input into it. The software can perform transformations on the data, present it marked up in a different way, and perform statistical analysis to look for trends and similarities in structure/vocabulary with other graphs.
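A small sketch of what level 2 might look like, assuming the data arrives as XML and using Python's standard `xml.etree.ElementTree`. The function `index_structure` and the sample card are invented for illustration. The code knows no vocabulary at all: it only records where each value sits in the parent/child structure.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def index_structure(xml_text):
    """Map each element path to the text values found there, knowing no vocabulary."""
    paths = defaultdict(list)

    def walk(element, prefix):
        path = f"{prefix}/{element.tag}"
        text = (element.text or "").strip()
        if text:
            paths[path].append(text)
        for child in element:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return dict(paths)

# Hypothetical sample data; the element names mean nothing to the code.
sample = (
    "<card>"
    "<name>Alice Smith</name>"
    "<org><name>Example Ltd</name></org>"
    "</card>"
)

structure = index_structure(sample)
for path, values in sorted(structure.items()):
    print(path, values)
```

Even without knowing what "name" means, the index keeps the two names apart because they are scoped differently (`/card/name` versus `/card/org/name`), which is exactly what a flat text index loses.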
3) The software is able to interpret some of the meaning (knows some of the vocabulary used), but not all of it.
It can perform structured queries, operations and transformations based on the bits of vocabulary it does understand. It can present this 'understood' data, along with the structured data it doesn't understand, to an agent/program/human that may be able to interpret more of the vocabulary.
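Level 3 could be sketched along these lines. Everything here is hypothetical: the `KNOWN_VOCABULARY` table, the `interpret` function and the sample record are assumptions made for the example, not part of any microformat or RDF toolkit. The point is only that properties the software recognises get processed, while the rest are passed through with their structure intact.

```python
# Hypothetical vocabulary: the only properties this software "understands",
# each paired with an operation it knows how to apply.
KNOWN_VOCABULARY = {
    "name": str.title,
    "email": str.lower,
}

def interpret(record):
    """Split a record into properties we understand and ones we pass through."""
    understood, opaque = {}, {}
    for prop, value in record.items():
        handler = KNOWN_VOCABULARY.get(prop)
        if handler is not None:
            understood[prop] = handler(value)
        else:
            # Not understood, but kept as structured data for a smarter consumer.
            opaque[prop] = value
    return understood, opaque

# Hypothetical sample record.
record = {"name": "alice smith", "email": "Alice@Example.ORG", "geo": "51.5;-0.1"}
understood, opaque = interpret(record)
print(understood)
print(opaque)
```

Here `geo` is handed on untouched, so a downstream agent that does know that property can still make use of it.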
4) The software is able to interpret the meaning of the data.
Then at this point it's probably human.