Something he wrote sparked a memory.
Instead of having it crawl blogs, I’d have it download and index RSS feeds. This should be cheaper than crawling pages, and it ensures that it skips indexing page junk (navigation and so on).
Around 2005, I built this exact feature for 9rules. I scheduled a script to run every hour or so to poll all of the blogs in the 9rules Network (which, at its peak, was hundreds of web sites). I did so before I knew anything about scaling a system like that. Today it would be much easier and more cost-effective to build something like this, and it could scale to hundreds of thousands of feeds without much effort or funding.
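For the curious, here is a minimal sketch of what the indexing half of such a job might look like today. This is not the original 9rules script. The sample feed, the function names `parse_items` and `build_index`, and the choice of a simple in-memory inverted index are all assumptions for illustration. It uses only Python's standard library and shows why feeds are attractive: each `<item>` already contains just the post, with no navigation or other page junk to strip.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical RSS 2.0 feed, inlined so the sketch runs without a network.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>Star Wars announcements</title>
      <link>https://example.com/star-wars</link>
      <description>Thoughts on the new films.</description>
    </item>
    <item>
      <title>Indexing feeds</title>
      <link>https://example.com/feeds</link>
      <description>Why feeds are cheaper than crawling pages.</description>
    </item>
  </channel>
</rss>"""


def parse_items(feed_xml):
    """Return one dict per <item> with its title, link, and description."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "text": item.findtext("description", default=""),
        })
    return items


def build_index(items):
    """Map each lowercase word to the set of post links that contain it."""
    index = defaultdict(set)
    for item in items:
        combined = (item["title"] + " " + item["text"]).lower()
        for word in re.findall(r"[a-z0-9]+", combined):
            index[word].add(item["link"])
    return index


if __name__ == "__main__":
    index = build_index(parse_items(SAMPLE_FEED))
    print(sorted(index["feeds"]))
```

In a real job, the XML would come from fetching each feed URL rather than from an inline string, and something like cron would run the script every hour. Atom feeds use different element names, so they would need their own parsing path.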
Like Brent, I miss the days of Technorati and its ilk. It gave us a window into what people were writing about. It gave us a back channel to people’s thoughts on topics we enjoyed. These days, I suppose, you can search for “Star Wars” on Twitter to see what people are saying about last week’s announcements. But it doesn’t feel the same.
Also, these days, I don’t even know what a blog is! Is The Verge not a blog? Is the WSJ not kinda-sorta a blog? Perhaps that is why even Google removed the blog-only search. Because so many things are blogs now.
It is fun to think about. But, like Brent, I too am busy with side projects.