RSS-Spider

Development, Ideas, Issues, problems, ßetas and what not…

wordpress hack <u style=’display:none’>

Filed under: Problems — Dave at 1:21 am on Saturday, April 26, 2008

Checking email for this site today, I ran across this email from Google Search Quality. At first I thought it was spam, seeing as it was filled with crap about viagra & cialis, but I was shocked to find that this crap WAS on this blog. Well, it seems an older version of WordPress that I was running has a vulnerability allowing someone to update your theme files and post all sorts of CRAP into them with links leading back to their spammy sites. Someone did this since I am a lazy sysadmin and didn’t update WordPress. Broke rule number 2 on the Google webmaster security checklist…

Shame on me…

Dear site owner or webmaster of rss-spider.com/blog,

While we were indexing your webpages, we detected that some of your pages were using techniques that are outside our quality guidelines, which can be found here: http://www.google.com/webmasters/guidelines.html. This appears to be because your site has been modified by a third party. Typically, the offending party gains access to an insecure directory that has open permissions. Many times, they will upload files or modify existing ones, which then show up as spam in our index.
(Read on …)

Answering the question I get at least once a week… “How can I set up an RSS feed for my site?”

Filed under: Development — Dave at 12:27 am on Thursday, June 14, 2007

This step-by-step guide comes from Design World Online and is reprinted with permission. The original document is located at http://www.designworldonline.com/ftp/dmm_pdf/DesignWorld_HowToRSS.pdf
FEED / BLOG CREATION

STEP 1. Find Service or Application
Design World recommends TypePad (www.typepad.com). It is very easy to set up a free 30-day trial and get started with your communication. If getting your IT department involved is required, you can even point these services to your own domain for seamless integration. If you have a Web site, blog, audio/video content or even photos, you can offer a feed of your content as an option. If you are using a popular blogging platform or publishing tool like TypePad, WordPress or Blogger, you likely publish a feed automatically. There are also tools on the market that can help transform traditional web content into the right format for distribution. Simply creating an XML version of your content lets aggregators read it, but this requires some knowledge of XML syntax. Another method is PC-based software that automatically publishes your blog posts, along with their feeds, to a specific website location.

STEP 2. Enter Data
The more frequent the better! Your readers and search engines like constantly updated content.

STEP 3. (Optional) Enhance your Feed
There are services like FeedBurner (www.feedburner.com) that allow you to track statistics on your feed that include subscribers, hits and other good stuff.

STEP 4. Required! Tell us about your Feed*
Once you’re up and running, go to http://www.rss-spider.com/fsb.php and submit your feed address so we can subscribe to your feed and keep apprised of your news automatically. You post and we redistribute immediately. You gain the exposure of the RSS-Spider with no hassle.

* Edited out Design World’s email address since 99.999% of user-submitted feeds have no relation to design engineering.
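For anyone who wants to try the "create an XML version of your content" route from Step 1 by hand, here is a minimal sketch of an RSS 2.0 feed built with Python's standard library. It is not part of the Design World guide. The function name `build_rss`, the item field names and the example.com addresses are all assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(title, link, description, items):
    """Return a minimal RSS 2.0 document as a string.

    `items` is a list of dicts with the (hypothetical) keys "title",
    "link", "description" and "published" (a timezone-aware datetime).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires these three elements on every channel.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description

    for entry in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry["title"]
        ET.SubElement(item, "link").text = entry["link"]
        ET.SubElement(item, "description").text = entry["description"]
        # pubDate uses the RFC 822 style date format.
        ET.SubElement(item, "pubDate").text = format_datetime(entry["published"])

    body = ET.tostring(rss, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Placeholder site and post, for illustration only.
    print(build_rss(
        "Example Site News",
        "http://example.com/",
        "Latest updates from Example Site",
        [{
            "title": "First post",
            "link": "http://example.com/first-post",
            "description": "Hello, feed readers.",
            "published": datetime(2007, 6, 14, 0, 27, tzinfo=timezone.utc),
        }],
    ))
```

Save the output as a file on your web server (feed.xml is a common choice) and that URL is the feed address you would submit in Step 4.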

SPLOGS SPLOGS SPLOGS… Why are we indexing crap?

Filed under: What Not... — Dave at 12:52 am on Saturday, April 7, 2007

In a recent post on DIGG.com I read about a study where someone figured out that about 75% of all blogs being hosted at BlogSpot were spam blogs. You can read the article at http://www.infoniac.com/hi-tech/google-blogs-spam.html Anyhow… it got me to thinking… we’ve put a few things in to block spammers on this site, however, since we’re pulling in RSS feeds from all around the web, how many splogs have we indexed? Ugh… what a mess. Searching for “debt reduction” came back with 1000+ results instantly and most of them were less than 3 days old. 1000, by the way, is the ceiling of results that Sphinx is set up to return. Searching for “credit-card-debt” came up with the same results. So I’ve decided that BlogSpot and all the other domains listed in the article above are going to be put on the “no spider” list & ban list effective Monday 4/9.
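A “no spider” list like this boils down to a hostname check before a feed gets fetched. The snippet below is a rough sketch of that idea in Python, not the actual RSS-Spider code. The names `BLOCKED_DOMAINS` and `is_blocked` are made up for illustration, and the list holds only the one domain named in this post.

```python
from urllib.parse import urlparse

# Hypothetical "no spider" list. Only blogspot.com is named in the post;
# the other banned domains are not listed here.
BLOCKED_DOMAINS = {"blogspot.com"}


def is_blocked(feed_url, blocked_domains=BLOCKED_DOMAINS):
    """Return True if the feed's host is a blocked domain or a subdomain of one."""
    host = (urlparse(feed_url).hostname or "").lower()
    # Match "blogspot.com" itself and anything ending in ".blogspot.com",
    # but not look-alikes such as "notblogspot.com".
    return any(host == d or host.endswith("." + d) for d in blocked_domains)
```

Matching on the leading dot matters because every BlogSpot blog lives on its own subdomain, so an exact match on the domain alone would let all of them through.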

DIY RSS-Spider clones

Filed under: Development — Dave at 12:23 am on Sunday, January 28, 2007

This thread has been moved to BuildYourOwnSearchEngine.net

I’ve been getting a ton of emails from people all over the world asking me to share how I built RSS-Spider.com… several people even offered to “partner” with me if I would build them a version of RSS-Spider for their language. Well folks. Sorry… I don’t have the time to build everyone an RSS-Spider, however, I will over the course of the next few months post the basics of a Do It Yourself RSS-Spider clone. But first let’s list some requirements.

(Read on …)

Hot Words is DOA

Filed under: What Not... — Dave at 10:34 pm on Tuesday, January 23, 2007

I had to turn off the Hot Words option on the site last week.  There are just too many news items in the database now and the script that I had written to parse out the most popular words of the day in 5 different languages was bringing the server to its knees.  Ok… fess up time.  The script was poorly written.  Once I get a chunk of time I’ll rewrite it and optimize the PHP code.  I’ll also have to figure out how to work Sphinx into it.
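For the curious, the basic idea behind a “most popular words of the day” counter looks something like the Python sketch below. This is not the original PHP script and it does not involve Sphinx. The function name `hot_words`, the tiny English-only stop word list and the minimum word length are assumptions made up for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop word list (English only). A real one would be
# much longer and would need one list per language.
STOP_WORDS = {"the", "and", "for", "with", "from", "that", "this", "are"}


def hot_words(headlines, top_n=10, stop_words=STOP_WORDS):
    """Return the top_n most frequent words as (word, count) pairs."""
    counts = Counter()
    for text in headlines:
        # \w+ matches Unicode word characters, so accented letters survive.
        for word in re.findall(r"\w+", text.lower()):
            if len(word) > 2 and word not in stop_words:
                counts[word] += 1
    return counts.most_common(top_n)
```

This makes a single pass over the text and keeps only the counts in memory, so the cost grows with the number of items processed. Feeding it only the current day's items, instead of the whole database, is what keeps a job like this cheap.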

Daily bandwidth is way up. ISP called me today to tell me that I have to pay $50 more a month. This is the second time in three months now that hosting costs have gone up.  At this rate I’ll be broke soon.  Don’t want to throttle but I might be forced to. :(
