Hierarchical Temporal Memory (HTM) Resources
I've been thinking about AI again recently.
This time I got motivated by watching a couple of videos of Jeff Hawkins talking about his HTM (Hierarchical Temporal Memory) ideas. Actually this has been a common theme for me lately - motivated speakers on podcasts and videos are much more likely to get me interested in something than the written word.
Anyway, a look through the OnIntelligence forum brought up a couple of papers I hadn't seen before: Saulius Garalevicius has been experimenting with Jeff and Dileep's visual pattern recognition ideas, and he gives a much more lucid picture of the system than either Dileep or Jeff do in their own papers. If you're interested in neural networks or HTM then definitely check these out!
Despite all of the promising results, Saulius' data highlights a scaling problem: the model's memory usage quickly undergoes a combinatorial explosion as you move up the hierarchy, and each subnode of the network ends up dealing with a large conditional probability matrix. This is frustrating - if the research scientists can't get this to scale then what hope have spare-time chancers got?
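To make the explosion concrete, here's a toy back-of-the-envelope sketch (entirely my own made-up numbers, not Saulius' figures): a node with C children, each reporting one of G learned groups, can in the worst case see G^C distinct input combinations, and its conditional probability matrix needs a row for each one.

```python
# Toy worst-case estimate (made-up numbers, not from Saulius' paper):
# a node with `children` child nodes, each reporting one of
# `groups_per_child` learned groups, can in principle see
# groups_per_child ** children distinct input combinations,
# with one CPT row per combination.

def worst_case_cpt_rows(groups_per_child: int, children: int) -> int:
    return groups_per_child ** children

for children in (2, 4, 8):
    rows = worst_case_cpt_rows(groups_per_child=20, children=children)
    print(f"{children} children x 20 groups each -> up to {rows:,} CPT rows")

# 2 children x 20 groups each -> up to 400 CPT rows
# 4 children x 20 groups each -> up to 160,000 CPT rows
# 8 children x 20 groups each -> up to 25,600,000,000 CPT rows
```

In practice only a fraction of those combinations ever actually occur, but the trend is what bites as you stack more levels.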
Whenever I get disheartened about such problems I like to re-read Paul Graham's 'A Plan for Spam' for motivation. Statistical spam filtering techniques were written off by some full-time researchers as ineffective until Graham demonstrated impressive results on his own data. The main reasons for those results were: (1) using a much larger corpus of training data and (2) massaging the input data to improve statistical recognition (e.g. being careful about tokenisation). These days the most advanced statistical filters (e.g. crm114) frequently exceed human accuracy at detecting spam, often with something like a tenth of the error rate!
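For a sense of how simple the core idea is, here's a minimal sketch of Graham-style token statistics (a loose paraphrase of 'A Plan for Spam' from memory, not crm114's actual algorithm - the token regex and constants are just placeholders):

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Graham stresses that tokenisation details matter a lot; this regex is a
    # deliberately crude stand-in.
    return re.findall(r"[a-z$][a-z0-9'$-]+", text.lower())

def token_spam_probs(spam_msgs, ham_msgs):
    # Count how many spam/ham messages each token appears in, roughly as in
    # 'A Plan for Spam' (ham occurrences weighted double).
    spam_counts = Counter(t for m in spam_msgs for t in set(tokenize(m)))
    ham_counts = Counter(t for m in ham_msgs for t in set(tokenize(m)))
    probs = {}
    for token in set(spam_counts) | set(ham_counts):
        s = spam_counts[token] / max(len(spam_msgs), 1)
        h = 2 * ham_counts[token] / max(len(ham_msgs), 1)
        probs[token] = max(0.01, min(0.99, s / (s + h)))
    return probs

def spam_score(message, probs):
    # Combine per-token probabilities as log odds (skipping Graham's "fifteen
    # most interesting tokens" refinement for brevity); 0.4 is the default
    # probability for tokens never seen in training.
    logit = sum(math.log(p) - math.log(1 - p)
                for p in (probs.get(t, 0.4) for t in tokenize(message)))
    logit = max(-50.0, min(50.0, logit))  # avoid math.exp overflow
    return 1 / (1 + math.exp(-logit))

probs = token_spam_probs(["win a free prize now"], ["meeting notes for monday"])
print(spam_score("free prize inside", probs))  # close to 1.0
```

The point isn't the particular formula - it's that the algorithm itself is trivial, and the results stand or fall on the volume of training data and how carefully the input is prepared.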
The real brain appears to employ both of these tactics - the input data is massive and continuous, and the eye, for example, uses tricks to assist the statistical analysis of the incoming information (saccades, the fovea, motor-driven feedback). Now I'm wondering if a much larger corpus of data could reduce the necessary sophistication in the algorithm.
For me however, the real challenge is in finding an appropriate problem to solve that can drive this stuff (and my enthusiasm) incrementally.
(as usual, all out-of-my-depth caveats apply)