Intuitive overview of principal components analysis (PCA)

I found an excellent, short introductory tutorial PDF on principal components analysis (PCA). It gives a particularly intuitive overview of the following concepts:

  1. Mean (average)
  2. Standard deviation
  3. Variance
  4. Covariance
  5. Matrix transformations
  6. Eigenvectors & eigenvalues
  7. Principal component analysis

Unfortunately I found the eigenvectors bit a bit heavy going. Luckily the Wikipedia page for eigenvectors has a fantastic illustration on the right that gave me an instant feel for what was happening.
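For the concrete-minded, here's a minimal PCA sketch in NumPy (a toy illustration of my own, not code from the tutorial): centre the data, compute the covariance matrix, then project onto the eigenvectors with the largest eigenvalues.

    import numpy as np

    def pca(data, n_components=2):
        # data: (n_samples, n_features)
        centred = data - data.mean(axis=0)       # subtract the mean of each feature
        cov = np.cov(centred, rowvar=False)      # covariance matrix of the features
        eigvals, eigvecs = np.linalg.eigh(cov)   # eigh, since cov is symmetric
        order = np.argsort(eigvals)[::-1]        # sort by descending eigenvalue
        components = eigvecs[:, order[:n_components]]
        return centred @ components              # project onto the top components

    projected = pca(np.random.randn(100, 5))
    print(projected.shape)   # (100, 2)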

First dynamically balancing walking robot

I find this sort of thing really exciting: Trevor Blackwell's 'Dexter' robot finally walks! There's a video and everything!

From Paul Graham's post:

There are of course [other] biped robots that walk. The Honda Asimo is the best known. But the Asimo doesn't balance dynamically. Its walk is preprogrammed; if you had it walk twice across the same space, it would put its feet down in exactly the same place the second time. And of course the floor has to be hard and flat. Dynamically balancing—the way we walk—is much harder. It looks fairly smooth when we do it, but it's really a controlled fall. At any given moment you have to think (or at least, your body does) about which direction you're falling, and put your foot down in exactly the right place to push you in the direction you want to go.

Dark side of the semantic web

I haven't said anything much about semantic web stuff for a while as I've been occupied with other things. However, Jim Hendler's 'Tales from the Dark Side' piece in IEEE Intelligent Systems reawakened an old interest. In short: I still think the RDF people have got it wrong with URIs, and so far nobody's convinced me otherwise.

My (same old) argument: URIs are bad for large-scale interoperability. The alternative: just use the words and symbols occurring in real life, and use the context inherent in the communication to disambiguate meaning.

The interesting thing about the Hendler piece is that it pretty much walks through the arguments I make for dropping URIs, but then avoids the conclusion:

If you and I decide that we will use the term "http://www.cs.rpi.edu/~hendler/elephant" to designate some particular entity, then it really doesn't matter what the other blind men think it is, they won't be confused when they use the natural language term "Elephant" which is not even close, lexigraphically, to the longer term you and I are using. And if they choose to use their own URI, "http://www.other.blind.guys.org/elephant" it won't get confused with ours. The trick comes, of course, as we try to make these things more interoperable. It would be nice if someone from outside could figure out, and even better assert in some machine-readable way, that these two URIs were really designating the same thing - or different things, or different parts of the same thing, or ... ooops, notice how quickly we're on that slippery slope.

And this neatly sums up the situation with URIs. The low chance of collision represents a tradeoff: You get a high level of semantic precision - it's extremely unlikely that two parties will use the same URI to mean two totally unconnected things. You also get a very low level of semantic interoperability: it's equally unlikely that two unconnected parties will use the same URI to denote (even parts of!) the same thing.

Now I think the precision part is overrated - disambiguation of natural language terms can be tractably (and often trivially) achieved using contextual cues. However, interoperability of data from unconnected sources is really hard, and that's why I think this is a bad tradeoff.

Anyway, the crux of the Hendler piece is that for all the high-level work going on in Semantic Web land (ontology languages, description logic), it's currently the simple interoperability mechanisms that gain the most traction and add the most value: 'a little semantics goes a long way'.

The piece implies (afaics) that this is where effort should be directed, and cites the matching of FOAF data using email addresses as an illustration of the potential success of this approach. The matching heuristic is: if two FOAF resources describe people with the same email address, they're very likely to be about the same person.
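For what it's worth, the heuristic is trivial to sketch (the record format here is an assumption for illustration - real FOAF data is RDF, and the mailbox often appears only as a SHA1 hash):

    from collections import defaultdict

    def merge_by_mbox(records):
        # group person records that share an mbox; each key then (probably)
        # identifies a single real person
        people = defaultdict(list)
        for record in records:
            mbox = record.get('mbox')
            if mbox:
                people[mbox.strip().lower()].append(record)
        return people

    records = [
        {'name': 'Phil', 'mbox': 'mailto:phil@example.org'},
        {'nick': 'phil_w', 'mbox': 'mailto:phil@example.org'},
    ]
    print(merge_by_mbox(records))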

My experience concurs with the 'a little semantics goes a long way' sentiment, but personally I think FOAF has succeeded (for some measure of success) not because of RDF but in spite of it. I'd argue that the only reason the email matching works on a large scale is that email addresses are already concrete symbols grounded in the real world. FOAF didn't create them, it just provided a context for their use. FOAF's formal semantics certainly didn't create this interoperability - the largest body of FOAF data is scraped from LiveJournal's databases, where the users creating the data have little concept of the ramifications of the 'http://xmlns.com/foaf/0.1/mbox' property.

If FOAF had to rely on artificial URIs as the sole means of identifying people, it would struggle to gain any traction in the messy real world of the web.

On the flip side, however, I think FOAF would work just as well (and gain a lot more traction) if its underlying model didn't employ URIs at all and instead just used triples of words/symbols. Semantic web software would still be able to identify and index FOAF data: the symbol 'FOAF' is pretty unambiguous on its own, but even if it weren't, the juxtaposition of 'FOAF' with properties like 'mbox', 'surname' etc. would suffice for pretty accurate disambiguation.
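Something like this toy check is all I have in mind (the property names and threshold are made up for illustration):

    # guess that a record is FOAF-ish from the co-occurrence of familiar symbols,
    # rather than requiring the 'http://xmlns.com/foaf/0.1/' namespace URI
    FOAF_HINTS = {'foaf', 'mbox', 'surname', 'nick', 'knows', 'homepage'}

    def looks_like_foaf(symbols, threshold=2):
        hits = {s.lower() for s in symbols} & FOAF_HINTS
        return len(hits) >= threshold

    print(looks_like_foaf(['FOAF', 'mbox', 'surname']))  # True
    print(looks_like_foaf(['title', 'author']))          # False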

Hierarchical Temporal Memory (HTM) Resources

I've been thinking about AI again recently. This time I got motivated by watching a couple of videos of Jeff Hawkins talking about his HTM (Hierarchical Temporal Memory) ideas. Actually this has been a common theme for me recently - motivated speakers on podcasts and videos are much more likely to get me interested in something than the written word.

Anyway, a look through the OnIntelligence forum brought up a couple of papers I hadn't seen before: Saulius Garalevicius has been experimenting with Jeff and Dileep's visual pattern recognition ideas, and provides a much more lucid picture of his system than either Dileep or Jeff do in their papers. If you're interested in neural networks or HTM then definitely check these out!

Despite the promising results, Saulius' data highlights a scale problem: the model's memory usage quickly undergoes a combinatorial explosion as you go up the hierarchy, and each subnode of the network ends up dealing with a huge conditional probability matrix. This is frustrating - if the research scientists can't get this to scale, what hope have spare-time chancers got?
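A back-of-the-envelope illustration of why it blows up (the numbers are made up, and real implementations only keep the patterns actually seen during training): if each node quantises its input into k patterns and a parent pools c children, the parent faces up to k^c distinct input combinations, and that compounds at every level.

    def combinations_per_level(k=10, c=4, levels=3):
        per_node = k
        for level in range(1, levels + 1):
            per_node = per_node ** c   # each parent pools c children's outputs
            print(f"level {level}: up to {per_node:.2e} input combinations per node")

    combinations_per_level()
    # level 1: up to 1.00e+04 input combinations per node
    # level 2: up to 1.00e+16 input combinations per node
    # level 3: up to 1.00e+64 input combinations per node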

Whenever I get disheartened about such problems I like to re-read Paul Graham's 'A Plan for Spam' for motivation. Statistical spam filtering techniques had been written off by some full-time researchers as ineffective until Graham demonstrated impressive results on his own data. The main reasons for those results were (1) using a much larger corpus of training data, and (2) massaging the input data to improve statistical recognition (e.g. being careful about tokenisation). These days the most advanced statistical filters (e.g. crm114) frequently exceed human accuracy at detecting spam (often by a factor of ten or so in error rate!).
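For flavour, here's a condensed sketch of the token-probability idea from the article (simplified - Graham's original has more detail, e.g. selecting only the most 'interesting' tokens per message):

    def token_spam_prob(token, good_counts, bad_counts, ngood, nbad):
        g = 2 * good_counts.get(token, 0)   # good counts doubled to bias against false positives
        b = bad_counts.get(token, 0)
        if g + b < 5:
            return 0.4                      # rare/unseen token: treat as mildly innocent
        return max(0.01, min(0.99,
            min(1.0, b / nbad) / (min(1.0, g / ngood) + min(1.0, b / nbad))))

    def combined_prob(probs):
        # combine per-token probabilities, naive-Bayes style
        prod, inv = 1.0, 1.0
        for p in probs:
            prod *= p
            inv *= 1.0 - p
        return prod / (prod + inv)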

The real brain appears to employ both of those tactics - the input data is massive and continuous, and the eye, for example, appears to use tricks to assist the statistical analysis of the incoming information (saccades, the fovea, motor-driven feedback). Now I'm wondering if a much larger corpus of data could reduce the sophistication needed in the algorithm.

For me however, the real challenge is in finding an appropriate problem to solve that can drive this stuff (and my enthusiasm) incrementally.

(as usual, all out-of-my-depth caveats apply)