Jon Udell's Investigating a Bayesian RSS Categorizer

Jon Udell's investigating a Bayesian RSS categorizer. Bayesian spam filters estimate, for each word (such as "v1agra" or "ontology"), how likely an email message containing it is to be spam; if the aggregate probability from all the words in an email exceeds a certain threshold, the email is put in a specific folder. The blog corollary (I think) would be tracking whether RSS items are deleted, opened, or opened-and-clicked-through; over time, such a system should be able to estimate the probability that a new RSS item will be of interest to you (and presumably you'd sort by that in your aggregator).
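To make the word-probability idea concrete, here's a minimal sketch of a two-class naive Bayes scorer with add-one smoothing. It's my own illustration, not Jon's code or any particular spam filter's algorithm; the class name `InterestFilter`, the whitespace tokenizer, and the 0.5 threshold are all assumptions.

```python
import math
from collections import Counter


class InterestFilter:
    """Two-class naive Bayes over words (hypothetical name, illustration only)."""

    def __init__(self):
        # Word and item counts, kept separately for interesting / uninteresting items.
        self.word_counts = {True: Counter(), False: Counter()}
        self.item_counts = {True: 0, False: 0}

    def train(self, text, interesting):
        """Record one item's words under the given label."""
        self.item_counts[interesting] += 1
        self.word_counts[interesting].update(text.lower().split())

    def probability(self, text):
        """Return P(interesting | words), assuming words are independent."""
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        total_items = sum(self.item_counts.values())
        log_score = {}
        for label in (True, False):
            # Smoothed class prior, then one smoothed term per word, in log space.
            score = math.log((self.item_counts[label] + 1) / (total_items + 2))
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_score[label] = score
        # Normalize the two log scores into a probability without overflow.
        top = max(log_score.values())
        yes = math.exp(log_score[True] - top)
        no = math.exp(log_score[False] - top)
        return yes / (yes + no)


f = InterestFilter()
f.train("bayesian rss aggregator", interesting=True)
f.train("celebrity gossip roundup", interesting=False)
THRESHOLD = 0.5  # assumed cutoff; a real filter would tune this
print(f.probability("bayesian rss") > THRESHOLD)
```

An aggregator could sort new items by `probability` rather than applying a hard cutoff, which is closer to the sort-by-interest idea.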

Jon appears to be doing something dangerously more ambitious, which is creating a Bayesian categorizer that assigns Jon-meaningful categories (email, collaboration, family, etc.) to items. I say "dangerously more ambitious" because Jon's approach would seem to require a lot of supervision, while the genius of Bayesian spam-filtering is that pressing a button marked "Delete as spam" is no more onerous than deleting the spam in the first place. Similarly, a Bayesian RSS aggregator that just attempts to categorize "Will this item be read, will this item be clicked-through, will this item be deleted without pause?" requires no more supervision than what is natural to the task of RSS browsing.
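Here's roughly what that zero-extra-supervision version might look like: the labels are just the actions the reader already takes. This is a sketch under my own assumptions; the action names in `ACTIONS` and the `record` and `predict` functions are hypothetical, and a real aggregator would call `record` from its delete, open, and click handlers.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labels: the three outcomes an aggregator can observe for free.
ACTIONS = ("deleted", "read", "clicked")

word_counts = defaultdict(Counter)  # action -> word frequencies
action_counts = Counter()           # action -> number of items seen


def record(action, text):
    """Log what the reader did with an item; this is the only 'training' step."""
    action_counts[action] += 1
    word_counts[action].update(text.lower().split())


def predict(text):
    """Return a dict of estimated probabilities, one per action (naive Bayes)."""
    vocab = set()
    for action in ACTIONS:
        vocab |= set(word_counts[action])
    total = sum(action_counts.values())
    log_scores = {}
    for action in ACTIONS:
        # Smoothed prior for the action, then smoothed per-word likelihoods.
        score = math.log((action_counts[action] + 1) / (total + len(ACTIONS)))
        n_words = sum(word_counts[action].values())
        for word in text.lower().split():
            count = word_counts[action][word]
            score += math.log((count + 1) / (n_words + len(vocab) + 1))
        log_scores[action] = score
    # Normalize in a numerically stable way.
    top = max(log_scores.values())
    weights = {a: math.exp(s - top) for a, s in log_scores.items()}
    norm = sum(weights.values())
    return {a: w / norm for a, w in weights.items()}


record("clicked", "bayesian spam filter")
record("read", "rss aggregator release")
record("deleted", "celebrity gossip roundup")
guess = predict("bayesian filter")
print(max(guess, key=guess.get))
```

Assigning Jon-style topical categories would need the same machinery but with labels the reader has to supply by hand, which is where the extra supervision comes in.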
