On blog search engines

Brent Simmons has been reminiscing about blog search engines and writing down some ideas for how one could be made today.

Something he wrote sparked a memory.

Instead of having it crawl blogs, I’d have it download and index RSS feeds. This should be cheaper than crawling pages, and it ensures that it skips indexing page junk (navigation and so on).

In 2005 or so, I had built this exact feature for 9rules. I scheduled a script to run every hour or so to poll all of the blogs in the 9rules Network (which, at its peak, was hundreds of web sites). I did so before I knew anything about scaling something like this. Today it would be far easier and more cost-effective to build something like this that could scale to hundreds of thousands of feeds without much effort or funding.
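For the curious, a minimal sketch of that core loop in Python might look like the following. It is not the original 9rules script. It is a hypothetical illustration that fetches each feed, pulls out its items, and adds them to an in-memory word index that can be searched with an optional recency window. It assumes RSS 2.0 feeds only (no Atom), does not strip HTML from descriptions, and every name in it (`FeedIndex`, `poll_once`, and so on) is made up. Something meant for hundreds of thousands of feeds would also want conditional GETs (ETag/If-Modified-Since), concurrency, and a real search store.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Callable, Iterable, Optional
from urllib.request import urlopen

WORD = re.compile(r"\w+")


def fetch_feed(url: str, timeout: float = 10.0) -> bytes:
    """Plain HTTP GET of a feed (no ETag/If-Modified-Since caching)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def parse_date(value: str) -> Optional[datetime]:
    """Parse an RSS 2.0 pubDate (RFC 822 style); None if missing or malformed."""
    try:
        dt = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    # A "-0000" zone yields a naive datetime; treat it as UTC.
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def parse_rss_items(xml_bytes: bytes) -> list:
    """Pull title, link, description and pubDate out of each RSS 2.0 <item>."""
    root = ET.fromstring(xml_bytes)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "text": item.findtext("description", default=""),
            "published": parse_date(item.findtext("pubDate", default="")),
        }
        for item in root.iter("item")
    ]


class FeedIndex:
    """Hypothetical toy in-memory inverted index: word -> set of post links."""

    def __init__(self) -> None:
        self.postings = defaultdict(set)
        self.posts = {}

    def add(self, post: dict) -> None:
        # Keyed by link, so re-polling the same feed doesn't duplicate posts.
        self.posts[post["link"]] = post
        for word in WORD.findall(f"{post['title']} {post['text']}".lower()):
            self.postings[word].add(post["link"])

    def search(self, query: str, within: Optional[timedelta] = None,
               now: Optional[datetime] = None) -> list:
        """Links matching every query word, newest first; `within` limits recency."""
        words = WORD.findall(query.lower())
        if not words:
            return []
        links = set.intersection(*(self.postings.get(w, set()) for w in words))
        hits = [self.posts[link] for link in links]
        if within is not None:
            cutoff = (now or datetime.now(timezone.utc)) - within
            hits = [p for p in hits if p["published"] and p["published"] >= cutoff]
        oldest = datetime.min.replace(tzinfo=timezone.utc)
        hits.sort(key=lambda p: p["published"] or oldest, reverse=True)
        return [p["link"] for p in hits]


def poll_once(feed_urls: Iterable[str], index: FeedIndex,
              fetch: Callable[[str], bytes] = fetch_feed) -> None:
    """One polling pass over every feed; a scheduler would run this hourly."""
    for url in feed_urls:
        try:
            for post in parse_rss_items(fetch(url)):
                index.add(post)
        except (OSError, ET.ParseError) as exc:
            # A dead or malformed feed shouldn't abort the whole pass.
            print(f"skipping {url}: {exc}")
```

A scheduler such as cron would call `poll_once` every hour or so; `index.search("star wars", within=timedelta(days=2))` would then return matching post links from the last two days, newest first.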

Like Brent, I miss the days of Technorati and its ilk. It gave us a window into what people were writing about. It gave us a back channel to people’s thoughts on topics we enjoyed. These days, I suppose, you can search for “Star Wars” on Twitter to see what people are saying about last week’s announcements. But it doesn’t feel the same.

Also, these days, I don’t even know what a blog is! Is The Verge not a blog? Is the WSJ not kinda-sorta a blog? Perhaps that is why even Google removed its blog-only search: so many things are blogs now.

It is fun to think about. But, like Brent, I too am busy with side projects.

    @cdevroe I would keep the “blog” definition separate from the corporate world; The Verge, WSJ, etc. are all publications, journals even. “Blog” should be kept for the amateur/independent side of online publishing.

    @simonwoods So can one make money from their blog without being recategorized?

    @cdevroe Yeah, for sure. Considering the options (Patreon, Ko-fi, PayPal, advertising, etc.), I’d say independent blogging can stay as such now more than ever. The more stakeholders you introduce, the less independent you become.

    I know that, technically, many non-independent groups run their publications on what we think of as blogs, but I think this is a case where the technical details are important yet distinct from the philosophical and business approaches. It definitely becomes murky when an independent person or group makes decisions about advertising, removing feeds, and so on.

    @simonwoods @cdevroe The Verge and WSJ are already well known. A blog search engine is for finding and discovering the lesser-known blogs and posts, IMHO. So I agree with Simon. An RSS feed engine helps find fresh individual posts in near real time, but it helps if the user can dial the results back to 1 day, 2 days, 1 week, etc. You have to be able to parse the content of individual feeds, which is what gives an RSS search engine a huge advantage over a simple directory: a directory has a hard time categorizing blogs that cover a wide range of subjects.

    Other considerations: Human editing/curation – a human editor should have the final say on a blog’s inclusion. This weeds out the made-for-AdSense blogs. Long-term management – bloggers move to new platforms and the feed URL may change in that move, so when a feed goes dead it is handy to have a crawler that checks the blog’s domain and tries to detect the new feed URL. Otherwise you end up with a lot of dead wood in the index over time.

    The other problem is getting people to use your blog search engine. That’s hard to do these days. Once you have a reasonable index of feeds, I would suggest working with makers of feed readers to offer your search from within the reader.
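The dead-feed problem raised in that comment has a common remedy: feed autodiscovery, where a blog’s HTML advertises its feed with a `<link rel="alternate" type="application/rss+xml" href="...">` tag. A minimal Python sketch, assuming the blog’s home page still resolves and follows that convention (the function and class names are hypothetical), might look like this:

```python
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin
from urllib.request import urlopen

# MIME types that autodiscovery <link> tags commonly use for feeds.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collects href values from <link rel="alternate" type="<feed type>"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # HTMLParser lowercases attribute names; valueless attributes come as None.
        a = {name: (value or "") for name, value in attrs}
        rels = a.get("rel", "").lower().split()
        if "alternate" in rels and a.get("type", "").lower() in FEED_TYPES and a.get("href"):
            self.hrefs.append(a["href"])


def discover_feeds(html: str, base_url: str) -> List[str]:
    """Return absolute feed URLs advertised in a page, in document order."""
    finder = FeedLinkFinder()
    finder.feed(html)
    finder.close()
    return [urljoin(base_url, href) for href in finder.hrefs]


def refind_feed(blog_url: str, timeout: float = 10.0) -> Optional[str]:
    """Fetch a blog's home page and return the first feed it advertises, if any."""
    with urlopen(blog_url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
        # urlopen follows redirects; resolve relative hrefs against the final URL.
        final_url = resp.geturl()
    feeds = discover_feeds(html, final_url)
    return feeds[0] if feeds else None
```

When a polled feed starts failing, a crawler could call something like `refind_feed` on the blog’s domain and, if a new URL turns up, swap it into the polling list rather than leaving dead wood in the index.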

    @simonwoods An interesting topic to be sure. I agree it can’t be a question of underlying technology. In some cases blogs are simply static HTML files. But I’m unsure it is simply a question of business model either. Many bloggers make money with ads.

    I think, in the spirit of a blog search engine, what you’d be looking for when deciding whether to add a blog is that the site is run by a single person, making it a personal blog. Sure, that person may hire someone from time to time to update their site’s technology, code, backups, etc., but by and large the blog’s content would be published by a single person.

    Then again, this would eliminate great blogs like Kottke since he often has other authors.

    Our rules for the 9rules blog were more about the layout and consumption of the content: is it a reverse-chronological index of posts? Is there an RSS feed? Etc. But I don’t think that model fits the entire Internet.