RSS-Spider

Development, Ideas, Issues, problems, ßetas and what not…

The red ones taste like burning…

Filed under: Issues, What Not... — Dave at 10:18 pm on Tuesday, August 19, 2008



Actually they all taste like burning…

SPLOGS SPLOGS SPLOGS… Why are we indexing crap?

Filed under: What Not... — Dave at 12:52 am on Saturday, April 7, 2007

In a recent post on DIGG.com I read about a study where someone figured out that about 75% of all blogs hosted at BlogSpot were spam blogs. You can read the article at: http://www.infoniac.com/hi-tech/google-blogs-spam.html Anyhow… it got me thinking. We've put a few things in place to block spammers on this site; however, since we're pulling in RSS feeds from all around the web, how many splogs have we indexed? Ugh… what a mess. Searching for “debt reduction” came back with 1000+ results instantly, and most of them were less than 3 days old. 1000, by the way, is the ceiling on results that Sphinx is set up to return. Searching for “credit-card-debt” came up with the same results. So I've decided that BlogSpot and all the other domains listed in the article above are going to be put on the “no spider” list & ban list effective Monday 4/9.
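A minimal sketch of what a “no spider” check could look like, written in Python rather than the site's PHP. The `BANNED_DOMAINS` set and `is_banned()` function are hypothetical names, and only blogspot.com is filled in because it's the only domain named in this post. Matching the host exactly or as a subdomain keeps a domain like notblogspot.com from getting caught by accident.

```python
from urllib.parse import urlparse

# Hypothetical ban list: only blogspot.com is named in the post; the other
# domains from the linked article would be added here.
BANNED_DOMAINS = {"blogspot.com"}

def is_banned(feed_url: str) -> bool:
    """Return True if the feed's host is a banned domain or a subdomain of one."""
    host = (urlparse(feed_url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in BANNED_DOMAINS)

# Filter the spider queue before fetching anything.
feeds = [
    "http://somesplog.blogspot.com/feeds/posts/default",
    "http://www.example.com/rss.xml",
]
to_spider = [f for f in feeds if not is_banned(f)]
print(to_spider)
```

Checking at queue time means a banned feed is never fetched or indexed, which also saves bandwidth.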

Hot Words is DOA

Filed under: What Not... — Dave at 10:34 pm on Tuesday, January 23, 2007

I had to turn off the Hot Words option on the site last week. There are just too many news items in the database now, and the script that I had written to parse out the most popular words of the day in 5 different languages was bringing the server to its knees. OK… fess up time. The script was poorly written. Once I get a chunk of time I'll rewrite it and optimize the PHP code. I'll also have to figure out how to work Sphinx into it.
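For what it's worth, here's roughly the shape a rewrite could take: a single pass over the day's titles with a counter, instead of repeatedly scanning the whole database. This is a Python sketch, not the original PHP script. `hot_words()`, the stop-word list, and the minimum word length are all assumptions, and it only handles English, whereas the real feature covered 5 languages.

```python
import re
from collections import Counter

# Hypothetical, tiny English stop-word list; a real one would be per language.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "on", "is"}

WORD_RE = re.compile(r"[^\W\d_]+")  # runs of letters, Unicode-aware

def hot_words(titles, top_n=10):
    """Count non-stop-words across today's titles in one pass."""
    counts = Counter()
    for title in titles:
        words = WORD_RE.findall(title.lower())
        # Skip stop words and very short words (assumed cutoff: 3+ letters).
        counts.update(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return counts.most_common(top_n)

today = [
    "Debt reduction tips for the new year",
    "Sphinx search engine release",
    "New year debt reduction plan",
]
print(hot_words(today, top_n=3))
```

Keeping the counter updated as items are indexed, rather than recounting everything once a day, is one way to keep the cost flat as the database grows.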

Daily bandwidth is way up. My ISP called me today to tell me that I have to pay $50 more a month. This is the second time in three months that hosting costs have gone up. At this rate I'll be broke soon. I don't want to throttle, but I might be forced to. :(

Search for your site in RSS-Spider

Filed under: What Not... — Dave at 12:03 am on Wednesday, December 6, 2006

Have you submitted your link to RSS-Spider.com/fsb.php? Do you know if RSS-Spider is indexing your RSS content? Do you know how many items from your site are in RSS-Spider's database? Well, you can find all of this out fairly quickly by searching for “site:www.YOUR-DOMAIN-HERE.com” (without the quotes, of course).

What else can you do? Well, if you want to search for all the sites that have the word “cleveland” in their domain, you can do that too. Simply search for “site:cleveland” and all the articles from all the sites with cleveland in their domain will pop up.
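A rough sketch of how a `site:` filter can behave, assuming the search code splits the operator out of the query and does a substring match on each item's host. `parse_query()` and `matches_site()` are hypothetical helpers; how RSS-Spider actually applies the filter (for example, inside Sphinx) isn't described here.

```python
from urllib.parse import urlparse

def parse_query(query: str):
    """Split a query into (site_filter, remaining_terms)."""
    site = None
    terms = []
    for token in query.split():
        # "site:" followed by something becomes the filter; everything else is a search term.
        if token.lower().startswith("site:") and len(token) > 5:
            site = token[5:].lower()
        else:
            terms.append(token)
    return site, " ".join(terms)

def matches_site(item_url: str, site) -> bool:
    """Substring match on the host, so 'cleveland' hits any domain containing it."""
    if site is None:
        return True  # no site: filter, keep everything
    host = urlparse(item_url).hostname or ""
    return site in host

site, terms = parse_query("site:cleveland browns")
urls = ["http://www.cleveland.com/news/1", "http://blog.example.org/2"]
print(site, terms, [u for u in urls if matches_site(u, site)])
```

A substring match covers both uses from the post: a full domain like www.YOUR-DOMAIN-HERE.com matches its own host, and a single word like cleveland matches every domain containing it.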

The Long Tail

Filed under: What Not... — Dave at 10:45 pm on Tuesday, August 22, 2006

I've just read a book called The Long Tail, which in a nutshell explains how online properties like iTunes, Amazon & others are selling more very niche items. The term long tail comes from what the graph looks like when you plot the sales of, say, music or books online. With the ever-expanding selection of items online, the tail gets longer and longer. Take Netflix, for example. Netflix has thousands upon thousands of DVDs for rent on its site. A handful of those DVDs are what we commonly think of as blockbuster videos, i.e. X-Men, Harry Potter, Capote etc. However, these releases represent a small portion of what Netflix is renting. There is a HUGE number of movies which never made it to your local cineplex or to the local Hollywood Video store. They never made it there because of a lack of shelf space. Shelf space is hugely expensive and therefore reserved for what the video store “thinks” it can make the most money on. What's on the shelf represents a FRACTION (maybe 2%) of what's really out on the market in any given year. The rest, the less commercially successful DVDs, never show up in your local video store. However, Netflix doesn't have to worry about shelf space or store overhead or other expensive things, so they can afford to buy & rent out movies which never saw the inside of a theater or a Blockbuster Video.

Now conventional wisdom would say that blockbuster movies are blockbusters because they're GREAT movies, and that if a movie wasn't a blockbuster then it's not that good. Totally wrong. A movie is a blockbuster because some studio execs gave it the “green light” to get promoted and pushed out onto the moviegoing public. There are only about 100 Hollywood movies that come out each year. Each one of these has a budget for promotion & theater runs. To be commercially successful, a movie must pull in several thousand people during its run at a theater. Movies that don't, or movies that studio execs don't think will have the draw, don't get promoted, or even made, by the big studios. Furthermore, movies which get standing ovations at Sundance might not go any further than that.

Now the long tail comes into play with a huge selection and a good post filter. One of Netflix's features that I like the most is the RIFL, or “recommended if you liked”, option. After initially signing up for Netflix I was told that I should go through and rate different movies on a scale of 1 to 10. From that, Netflix would be able to pick movies that it thinks I might like. The more movies I rate, the more likely Netflix is to pick something I like.

SOOOOOOOOO... what's this got to do with RSS-Spider? Well, one of the first things I noticed shortly after launching the site was the long tail effect in searches. A lot of people look for the same top 100 things, but an even greater number of people are looking for a lot of other, different things.

Note: had I not installed the newest version of jpgraph on this server, I'd be able to graph something out. However, it requires a newer version of PHP, and we're not planning on upgrading for a few weeks :(
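One way to put a number on the long tail without a graph: count the searches, treat the top N queries as the “head”, and see what fraction of all searches lands outside it. `tail_share()` is a hypothetical helper, and the log below is made up just to show the arithmetic; it is not RSS-Spider data.

```python
from collections import Counter

def tail_share(queries, head_size=100):
    """Fraction of all searches that fall outside the head_size most common queries."""
    counts = Counter(q.strip().lower() for q in queries)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    head = sum(c for _, c in counts.most_common(head_size))
    return (total - head) / total

# Made-up toy log, purely to illustrate the calculation.
log = ["weather"] * 5 + ["news"] * 3 + ["cleveland zoo", "sphinx sorting", "lake erie ice"]
print(tail_share(log, head_size=2))
```

With a real search log, `head_size=100` would match the “top 100 things” mentioned above.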
