Filtering feeds
by Sebastian Benthall
About a week ago Subtraction made a long post complaining about the main problem of feed aggregators:
No matter how much I try to organize it, it’s always in disarray, overflowing with unread posts and encumbered with mothballed feeds. … The whole process frustrates me though, mostly because I feel like I shouldn’t have to do it at all. The software should just do it for me.
These are my reactions to this, roughly in order:
- I feel the pain of feed bloat myself, and know many others that do. It’s another symptom of internet-enabled information explosion.
- It’s amazing that we live in an era when a feeling of entitlement about our interactions with web technology isn’t seen as ridiculous outright. It’s true–it does feel surprising that somebody smart hasn’t solved this problem for everybody yet.
- It probably hasn’t been solved yet because it’s a tough problem. It’s not easy to program a computer to know What I Find Interesting…
…or is it? This is, after all, what various web services have fought to do well for us ever since the dawn of the search engine. And the results are pretty good right now. So there must be a good way to solve this problem.
As far as I can tell, there are two successful ways of doing smart filtering-for-people on the internet, both of which are being applied to feeds:
- Using direct social recommendations to let people know what other people find interesting. Digg is probably the best example of this. In the feed domain, Google Reader’s “Shared items” does this.
- Machine learning techniques. Probably the most successful application of this on the internet is the Bayesian spam filter. The Tao of Mac reports on a writer’s personal experiment to add a Bayesian filter to his feed aggregator. It’s a qualified success.
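To make the second approach concrete, here is a minimal sketch of a naive Bayes filter for feed items, written in Python. It is not the code from the Tao of Mac experiment or from any real aggregator. The class name `NaiveBayesFeedFilter`, the two labels, and the crude tokenizer are all my own assumptions for illustration.

```python
import math
import re
from collections import Counter


class NaiveBayesFeedFilter:
    """Hypothetical two-class naive Bayes filter with add-one smoothing."""

    LABELS = ("interesting", "boring")

    def __init__(self):
        # Per-label word frequencies and per-label count of training items.
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    @staticmethod
    def tokenize(text):
        # Assumption: lowercase alphabetic tokens are good enough for a sketch.
        return re.findall(r"[a-z']+", text.lower())

    def train(self, text, label):
        self.word_counts[label].update(self.tokenize(text))
        self.doc_counts[label] += 1

    def score(self, text):
        """Return the estimated probability that `text` is interesting."""
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total_docs = sum(self.doc_counts.values())
        words = self.tokenize(text)

        log_probs = {}
        for label in self.LABELS:
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Smoothed prior, then smoothed per-word likelihoods, in log space.
            log_p = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            for word in words:
                log_p += math.log((counts[word] + 1) / (total_words + len(vocab) + 1))
            log_probs[label] = log_p

        # Normalize the two log scores into a probability.
        top = max(log_probs.values())
        weights = {label: math.exp(lp - top) for label, lp in log_probs.items()}
        return weights["interesting"] / sum(weights.values())
```

In use, you would call `train(item_text, "interesting")` whenever you read or star a post and `train(item_text, "boring")` whenever you skip one, then sort unread items by `score`. How well that works depends entirely on how much you have trained it.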
The most interesting solutions to these kinds of problems are collaborative filtering algorithms that combine both methods. This is why Gmail’s spam filter is so good: it uses the input of its gillions of users to collaboratively train its algorithmic filter. StumbleUpon is probably my favorite implementation of this for general web content–although its closed-ness spooks me out.
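For a rough idea of what collaborative filtering means in practice, here is a small Python sketch of the user-based variety: predict how much I will like an item from the votes of people whose past votes resemble mine. This is a textbook technique, not a description of how Gmail, StumbleUpon, or Melkjug actually work. The function names and the +1/-1 vote format are assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two {item: vote} dicts."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict(votes, user, item):
    """Similarity-weighted average of other users' votes on `item`.

    `votes` maps user -> {item: vote}, with +1 for "liked" and -1 for
    "disliked" (an assumed format). Returns 0.0 when nobody comparable
    has voted on the item.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_votes in votes.items():
        if other == user or item not in other_votes:
            continue
        sim = cosine_similarity(votes[user], other_votes)
        weighted_sum += sim * other_votes[item]
        weight_total += abs(sim)
    return weighted_sum / weight_total if weight_total else 0.0
```

A positive prediction suggests showing the item and a negative one suggests hiding it. A hybrid system could feed a score like this into a learned filter as one more signal alongside the words in the post.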
We’re working on applying collaborative filtering methods to feeds at The Open Planning Project. Specifically, Luke Tucker has been developing Melkjug, an open source collaborative filtering feed aggregator. It’s currently in version 0.2.1. To get involved in the project, check out the Melkjug Project page on OpenPlans.org.