It's just occurred to me that I never posted about the proximity search capability that I built into JAM*VAT about 3 months ago.

It works by looking for symbols in close proximity to the search terms. For example, searching for 'Danny Ayers Blog' returns the answer 'raw', even though that word isn't in the search string. This is because the 'raw' symbol is connected to all of the above terms through the statements in the store.

(N.B. "Danny Ayers Channel" gives a more precise match, because the imported data is an RSS 1.0 channel. However, the proximity text search is most useful when you don't know the exact vocabulary of the data you're querying.)
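To make the "connected through the statements" idea concrete, here is a minimal Python sketch. It is not the JAM*VAT code: the toy store, its contents, and the function names are all made up for illustration, and the ranking (count how many matching statements touch each symbol) is just one plausible choice.

```python
from collections import Counter

# Hypothetical toy store of (subject, predicate, object) statements.
# The data is invented for this example; it is not the real imported feed.
STORE = [
    ("raw", "type", "Blog"),
    ("raw", "title", "Raw"),
    ("raw", "creator", "Danny Ayers"),
    ("other", "type", "Blog"),
    ("other", "creator", "Someone Else"),
]


def matching_statements(term, statements):
    """Return statements in which any part contains the term (case-insensitive)."""
    t = term.lower()
    return [s for s in statements if any(t in part.lower() for part in s)]


def proximity_search(query, store):
    """Return symbols connected to every query term, best-connected first."""
    candidates = None   # symbols connected to all terms seen so far
    scores = Counter()  # how many matching statements touch each symbol
    for term in query.split():
        symbols = set()
        for stmt in matching_statements(term, store):
            for part in stmt:
                # Neighbours are the parts of the statement that did not match.
                if term.lower() not in part.lower():
                    symbols.add(part)
                    scores[part] += 1
        candidates = symbols if candidates is None else candidates & symbols
    return sorted(candidates or [], key=lambda s: (-scores[s], s))


print(proximity_search("Danny Ayers Blog", STORE))
```

With this toy data the only symbol connected to all three terms is 'raw', which is the behaviour described above. Note that predicates count as symbols here too, so a real version would probably want to weight or exclude them.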

The implementation simply runs text searches to find statements, then uses the resulting statements to filter the searches for the remaining terms. It then runs a simple ranking algorithm over the results. The text searches run very fast because of the internal suffix array implementation, which works just as well with substrings.
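For anyone who hasn't met suffix arrays, here is a rough Python sketch of why substring lookups come cheap with them. This is a generic textbook version, not the internal JAM*VAT implementation: the function names are my own, and the naive construction is only suitable for small demo strings.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text, in sorted order.

    Naive construction (sorts full suffix strings); fine for a demo only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_all(text, sa, pattern):
    """Return every position where pattern occurs in text, in ascending order."""
    m = len(pattern)

    # Binary search for the first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Binary search for the first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in sa[start:lo] begins with pattern.
    return sorted(sa[start:lo])


text = "bond trader server"
sa = build_suffix_array(text)
print(find_all(text, sa, "trade"))  # substring of a word, still found
```

Because every substring is a prefix of some suffix, a whole-word search and a substring search are the same binary search over the sorted suffixes, so neither needs a scan of the text.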

This works pretty well at work, where we have an application management store with ~1.5 million triples. The installation answers queries for things like 'bond trader server' in under a second. ('bond trader' doesn't actually exist, btw; that was just an example.)