RSS-Spider

Development, Ideas, Issues, problems, ßetas and what not…

SPLOGS SPLOGS SPLOGS… Why are we indexing crap?

Filed under: What Not... — Dave at 12:52 am on Saturday, April 7, 2007

In a recent post on DIGG.com I read about a study where someone figured out that about 75% of all blogs hosted at BlogSpot were spam blogs. You can read the article at: http://www.infoniac.com/hi-tech/google-blogs-spam.html

Anyhow… it got me thinking… we’ve put a few things in place to block spammers on this site, but since we’re pulling in RSS feeds from all around the web, how many splogs have we indexed? Ugh… what a mess. Searching for “debt reduction” came back with 1000+ results instantly, and most of them were less than 3 days old. 1000, by the way, is the ceiling on the number of results that Sphinx is set up to return, so the real count is probably higher. Searching for “credit-card-debt” came up with the same results.

So I’ve decided that Blogspot and all the other domains listed in the article above are going on the “no spider” list and the ban list, effective Monday 4/9.
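For anyone curious what a “no spider” check could look like, here is a minimal sketch in Python. It is not the code running on this site; the names `NO_SPIDER_DOMAINS`, `is_blocked` and `filter_feeds` are made up for illustration, and the list holds only blogspot.com because that is the one domain named in this post.

```python
from urllib.parse import urlparse

# Hypothetical "no spider" list. Only blogspot.com is named in the post;
# the other domains from the linked article would be added here.
NO_SPIDER_DOMAINS = {"blogspot.com"}


def is_blocked(feed_url, blocked=NO_SPIDER_DOMAINS):
    """Return True if the URL's host is a blocked domain or a subdomain of one."""
    # hostname is None when the URL has no scheme, so fall back to "".
    host = (urlparse(feed_url).hostname or "").lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in blocked)


def filter_feeds(feed_urls, blocked=NO_SPIDER_DOMAINS):
    """Keep only the feed URLs whose host is not on the blocklist."""
    return [url for url in feed_urls if not is_blocked(url, blocked)]
```

Matching on the whole host or on a leading-dot suffix means every `something.blogspot.com` blog is caught, while a lookalike such as `notblogspot.com` is not. URLs without a scheme have no parsed host in this sketch and pass through unblocked, so they would need normalizing first.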
