RSS-Spider

Development, Ideas, Issues, problems, ßetas and what not…

Hot Words is DOA

Filed under: What Not... — Dave at 10:34 pm on Tuesday, January 23, 2007

I had to turn off the Hot Words option on the site last week.  There are just too many news items in the database now, and the script that I had written to parse out the most popular words of the day in 5 different languages was bringing the server to its knees.  Ok… fess up time.  The script was poorly written.  Once I get a chunk of time I’ll rewrite it and optimize the PHP code.  I’ll also have to figure out how to work Sphinx into it.
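For a rough idea of what a lighter Hot Words job could look like, here is a minimal Python sketch. It is not the site's PHP script, only an illustration of counting the day's most common words in a single pass. The function name `hot_words`, the sample titles and the tiny stop-word list are all made up for the example, and a real version would need a proper stop-word list per language.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real one would be per language and much longer.
STOP_WORDS = {"the", "a", "an", "of", "in", "to", "and", "for", "on", "is"}

def hot_words(titles, top_n=10, stop_words=STOP_WORDS):
    """Return the top_n most frequent words across titles as (word, count) pairs."""
    counts = Counter()
    for title in titles:
        # \w+ matches runs of Unicode letters, digits and underscores in Python 3.
        for word in re.findall(r"\w+", title.lower()):
            # Skip very short words and anything on the stop-word list.
            if len(word) > 2 and word not in stop_words:
                counts[word] += 1
    return counts.most_common(top_n)

if __name__ == "__main__":
    sample = [
        "Cleveland wins the opener",
        "Snow hits Cleveland again",
        "Snow day for schools",
    ]
    print(hot_words(sample, top_n=2))
```

Feeding it only the current day's items, instead of the whole database, is what would keep a job like this cheap.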

Daily bandwidth is way up. My ISP called me today to tell me that I have to pay $50 more a month. This is the second time in three months now that hosting costs have gone up.  At this rate I’ll be broke soon.  I don’t want to throttle but I might be forced to. :(

Search for your site in RSS-Spider

Filed under: What Not... — Dave at 12:03 am on Wednesday, December 6, 2006

Have you submitted your link to RSS-Spider.com/fsb.php? Do you know if RSS-Spider is indexing your RSS content? Do you know how many items from your site are in RSS-Spider’s database?  Well, you can find all this out fairly quickly by searching for “site:www.YOUR-DOMAIN-HERE.com” (without the quotes, of course)…

What else can you do? Well if you want to search for all the sites that have the word “cleveland” in their domain you can do that too… simply search for “site:cleveland” and all the articles from all the sites with cleveland in their domain will pop up.
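As a rough illustration of how a site: filter like this can work, here is a minimal Python sketch. It is not RSS-Spider's actual code: `parse_query`, `matches_site` and the example URLs are hypothetical, and a real search would apply the filter inside the database or index rather than one URL at a time.

```python
from urllib.parse import urlparse

def parse_query(query):
    """Split a search string into (site_filter, remaining_terms)."""
    site = None
    terms = []
    for token in query.split():
        # "site:" on its own (nothing after the colon) is treated as a plain term.
        if token.lower().startswith("site:") and len(token) > 5:
            site = token[5:].lower()
        else:
            terms.append(token)
    return site, terms

def matches_site(item_url, site):
    """True if the site filter appears anywhere in the item's host name."""
    if site is None:
        return True
    host = (urlparse(item_url).hostname or "").lower()
    return site in host

if __name__ == "__main__":
    site, terms = parse_query("site:cleveland browns")
    print(site, terms)
    print(matches_site("http://www.cleveland.example.com/feed/1", site))
```

Matching on a substring of the host name is what lets “site:cleveland” catch every domain with cleveland in it, while a full “site:www.YOUR-DOMAIN-HERE.com” narrows things down to one site.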

The Long Tail

Filed under: What Not... — Dave at 10:45 pm on Tuesday, August 22, 2006

I’ve just read a book called The Long Tail, which in a nutshell explains how online properties like iTunes, Amazon & others are selling more very niche items. The term long tail comes from what the graph looks like when you graph out the sales of, say, music or books online. With the ever-expanding selection of items online the tail gets longer and longer. Take for example Netflix. Netflix has thousands upon thousands of DVDs for rent on its site. A handful of those DVDs are what we commonly think of as blockbuster videos… i.e. X-Men, Harry Potter, Capote etc. However, these releases represent a small portion of what Netflix is renting. There is a HUGE number of movies which never made it to your local cineplex or to the local Hollywood Video store. They never made it there because of a lack of shelf space. Shelf space is hugely expensive and therefore reserved for what the video store “thinks” it can make the most money on. What’s on the shelf represents a FRACTION (maybe 2%) of what’s really out on the market in any given year. The rest, the DVDs that aren’t economic successes, never show up in your local video store. However, Netflix doesn’t have to worry about shelf space or store overhead or other expensive things, so they can afford to buy & rent out movies which never saw the inside of a theater or Blockbuster Video.

Now conventional wisdom would say that blockbuster movies are blockbusters because they’re GREAT movies and that if a movie wasn’t a blockbuster then it’s not that good. Totally wrong. A movie is a blockbuster because some studio execs gave it the “green light” to get promoted and pushed out onto the movie-going public. There are only about 100 Hollywood movies that come out each year. Each one of these has a budget for promotion & theater runs. To be commercially successful, a movie must pull in several thousand people during its run at a theater. Movies that don’t, or movies that studio execs don’t think will have the draw, don’t get promoted, or even made, by the big studios. Furthermore, movies which get standing ovations at Sundance might not go any further than that.

Now the long tail comes into play with a huge selection and a good post filter. One of Netflix’s features that I like the most is the RIFL or “recommended if you liked” option. After initially signing up for Netflix I was told that I should go through and rate different movies on a scale of 1 to 10. From that Netflix would be able to pick movies that it thinks I might like. The more movies I rate, the more likely Netflix is to pick something I like.

SOOOOOOOOO what’s this got to do with RSS-Spider? Well, one of the first things I noticed shortly after launching the site was the long tail effect in searches. A lot of people look for the same top 100 things, but an even greater number of people are looking for a lot of other different things.

Note: had I not installed the newest version of jpgraph on this server, I’d be able to graph something out.  However, it’s looking for a newer version of PHP and we’re not planning on upgrading for a few weeks :(
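Even without a graph, the head-versus-tail split in a search log can be measured directly. The Python below is a minimal sketch under assumptions: `head_tail_share` is a made-up name, the input is assumed to be a plain list of search strings, and the sample data is invented rather than taken from the site's log.

```python
from collections import Counter

def head_tail_share(queries, head_size=100):
    """Return (head_share, tail_share) of all searches.

    The head is the head_size most frequently searched terms;
    the tail is every other term.
    """
    # Normalise case and whitespace so "Cleveland" and "cleveland " count together.
    counts = Counter(q.strip().lower() for q in queries if q.strip())
    total = sum(counts.values())
    if total == 0:
        return 0.0, 0.0
    head = sum(n for _, n in counts.most_common(head_size))
    return head / total, (total - head) / total

if __name__ == "__main__":
    # Invented sample: one popular term plus several one-off searches.
    sample = ["cleveland"] * 3 + ["jpgraph", "sphinx", "netflix"]
    print(head_tail_share(sample, head_size=1))
```

If the tail share comes out larger than the head share with `head_size=100`, that is the long tail effect described above.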

Old news is bad news… growing pains

Filed under: What Not... — Dave at 11:58 am on Sunday, August 20, 2006

Three months of feeds, or 480201 items, were purged yesterday.  It took a little less than 20 minutes to nuke nearly half a million items dating back to March 1st and reindex the database with Sphinx.  Now there’s a little more room for new items and the server should stop emailing me space alerts.
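A purge like this boils down to one date-bounded DELETE. The sketch below uses Python's built-in sqlite3 module with an in-memory database as a stand-in for the real MySQL server. The table name `items`, the `pub_date` column, the cutoff date and the sample rows are all assumptions made for the example, and the Sphinx reindex that followed the purge is not shown.

```python
import sqlite3

def purge_before(conn, cutoff):
    """Delete every row in items published before cutoff (an ISO date string).

    Returns the number of rows removed.
    """
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("DELETE FROM items WHERE pub_date < ?", (cutoff,))
    return cur.rowcount

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT, pub_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO items (title, pub_date) VALUES (?, ?)",
        [("old", "2006-03-01"), ("older", "2006-04-15"), ("new", "2006-08-19")],
    )
    print(purge_before(conn, "2006-06-01"), "rows purged")
```

Storing dates as ISO strings (YYYY-MM-DD) is what makes the plain `<` comparison work here. On a large table an index on the date column would matter a lot for how long the delete takes.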

Need a new hard drive and server soon.  We’ve grown very very fast.

Built for Speed… new Indexing engine goes online…

Filed under: Development — Dave at 10:22 pm on Saturday, February 11, 2006

Over the past two weeks there was a major drop in the speed at which searches were being returned. The MySQL database hit a line in the sand somewhere, and once it was crossed search speed suffered. At the time of this writing the database has over 5 million articles pulled from various RSS feeds. The FULLTEXT search had collapsed, and simple one-word searches like Cleveland were taking 300+ seconds to return. Frankly I’m still amazed at the number of page views we were getting at this time, but looking at the search log I can see many people came in from the same IP address 4 or 5 times within a minute looking for the same thing. This says to me that they thought the site was slow or didn’t accept their query, so they clicked search again only to have to wait 4+ minutes! BAH!
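Spotting those repeat searches in a log can be automated. Here is a minimal Python sketch, assuming the log can be read as (timestamp in seconds, IP, query) tuples. The function name `repeated_searches`, the log layout, the thresholds and the sample entries are all hypothetical, not the site's real log format.

```python
from collections import defaultdict

def repeated_searches(log, window=60, min_hits=4):
    """Find (ip, query) pairs searched at least min_hits times within window seconds.

    log is an iterable of (timestamp_seconds, ip, query) tuples.
    """
    times = defaultdict(list)
    for ts, ip, query in log:
        times[(ip, query.strip().lower())].append(ts)

    repeats = set()
    for key, stamps in times.items():
        stamps.sort()
        # Check each run of min_hits consecutive timestamps against the window.
        for i in range(len(stamps) - min_hits + 1):
            if stamps[i + min_hits - 1] - stamps[i] <= window:
                repeats.add(key)
                break
    return repeats

if __name__ == "__main__":
    # Invented entries using documentation-only IP addresses.
    sample = [(t, "192.0.2.1", "Cleveland") for t in (0, 10, 20, 30)]
    sample += [(0, "192.0.2.2", "news"), (500, "192.0.2.2", "news")]
    print(repeated_searches(sample))
```

A spike in pairs like these is a decent early warning that searches have become slow enough for people to give up and click again.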

As of 2006-02-11 20:01:02 RSS-Spider is being powered by a new full-text index server called Sphinx. Searches that once took 300+ seconds now take under a second! Sphinx was simple to install and I’m seriously impressed with the overall speed gain!!! See http://www.shodan.ru/projects/sphinx for more information!
