Referrer spam, comment spam, trackback spam, spam, spam, spam, spam, hams, eggs, and spam

Date: Wed Nov 14 2007
Web site operators face different sorts of spam than the general public does. Spam isn't just the unsolicited email that gets all the attention, just as the spam discussed on this page is not the packaged meat product from Hormel Foods. ([w:spam|SPAM on the Wikipedia])

Referrer SPAM

First one needs to understand what the referer is. A web page request normally includes a header (spelled "Referer" in HTTP) saying which page referred the user to the requested page. When someone clicks a link on a web page, the page they end up visiting is told which page referred them. But if someone types the URL into the location bar, there's no referer.

It's very helpful in studying your website traffic to look at the referers, so that you understand where your visitors are coming from.
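As a rough illustration of that kind of referer report, the sketch below tallies referring hosts from web server log lines. It assumes the Apache "combined" log format, where the referer is the second-to-last quoted field. The sample lines and the name `count_referer_hosts` are invented for the example, and the pattern does not handle user-agent strings that contain escaped quotes.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumption: Apache "combined" log format, which ends with
#   "REQUEST" STATUS BYTES "REFERER" "USER-AGENT"
LOG_PATTERN = re.compile(r'"[^"]*" \d{3} \S+ "(?P<referer>[^"]*)" "[^"]*"$')


def count_referer_hosts(lines):
    """Count how many logged requests named each referring host."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.search(line.strip())
        if not match:
            continue  # line is not in the expected format
        referer = match.group("referer")
        if referer in ("", "-"):
            continue  # no referer, e.g. the URL was typed in directly
        host = urlparse(referer).netloc.lower()
        if host:
            counts[host] += 1
    return counts


# Hypothetical sample lines, invented for illustration.
SAMPLE_LOG = [
    '192.0.2.1 - - [14/Nov/2007:10:00:00 +0000] "GET /a.html HTTP/1.1" 200 512 "http://example.org/links.html" "Mozilla/5.0"',
    '192.0.2.2 - - [14/Nov/2007:10:00:05 +0000] "GET /a.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '192.0.2.3 - - [14/Nov/2007:10:00:09 +0000] "GET /b.html HTTP/1.1" 200 128 "http://example.org/other.html" "Mozilla/5.0"',
    '192.0.2.4 - - [14/Nov/2007:10:00:12 +0000] "GET /b.html HTTP/1.1" 200 128 "http://poker.example.net/" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for host, hits in count_referer_hosts(SAMPLE_LOG).most_common():
        print(host, hits)
```

A report like this is only as trustworthy as the referer values in the log, which is exactly what referer spam exploits.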

Unfortunately the referer is very easy to forge, because it is simply a value that the requesting software supplies. This has resulted in spammers making requests to thousands of web sites, each request carrying a forged referer pointing to their own web site.
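To see why forging is so easy: the referer is just a header that the requesting program fills in itself, and nothing ties it to a page the client actually visited. The sketch below uses Python's standard `urllib.request` to build (but not send) a request carrying an arbitrary `Referer`. Both URLs are placeholders.

```python
from urllib.request import Request

# Both URLs are placeholders. The request is only constructed, never sent.
request = Request(
    "http://example.com/some-page.html",
    headers={"Referer": "http://example.net/never-visited.html"},
)

# A server receiving this request would log whatever value the client chose.
claimed_referer = request.get_header("Referer")
print(claimed_referer)
```

The practical consequence is that referer values in your logs are claims made by the client, not verified facts.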

What's the advantage? It's twofold:

First, it causes curious website administrators to visit the site that did the spamming. Suppose you're reviewing your website statistics and see a link from a site with a URL talking about poker, or hot girls, or whatnot. Why would you get a referral from such a site? Of course you're going to be curious: why is that site referring to my site? That makes you click on the link and go there, at which point you've become a new visitor to that site.

Second, sometimes the website statistics reports get published somewhere that is reachable by a search engine. If the search engines scan your website statistics and see referer spam, they will count that spammed link just as if it were a "real" link. The more links the better, right?

A nice overview article on referer spam: http://www.spywareinfo.com/articles/referer_spam/

Detailed article on referer spam: http://www.kuro5hin.org/story/2005/2/14/02558/3376

Another detailed article: http://www.abcseo.com/papers/referrer-spam.htm

An Apache-specific method to block referer spam: http://www.ilovejackdaniels.com/apache/block-referrer-spam/

http://weblogtoolscollection.com/archives/2004/04/12/referrer-spam-removal/

Blog posting on referer spam: http://www.unix-girl.com/blog/archives/000264.html

Comment or Trackback spam

This refers to features present in the newer kind of websites, ones using some kind of content management software. A blog site or other kind of content manager driven website contains features allowing people to make comments on the web pages, or to submit "trackback" links.

Comments and trackbacks can both be entered by software, so they can be an efficient way of spamming zillions of web sites.

You can experience comments on the web page you're reading right now. At the bottom of this article will be a link allowing you to leave a comment. Click on the link (registration required) and type some text and hit the submit button. You can enter a link to your own web site just as easily as any other text. Hence, when the search engines scan my site they might see the link you entered, which then counts as another link to your site.

Trackbacks are even simpler. You'll see on this page a section saying "Trackback URL for this page" followed by a URL. Most blogging software allows you to send a trackback, which is very useful for web site operators to notify each other when they write an article about someone else's article. When the trackback is received by the web site software, the software puts a link somewhere pointing to the page that sent the trackback.

That gives the clear motive for both comment and trackback spam: it creates more links to the spammers' sites.

A few months ago the search engines got together and came up with a solution to comment and trackback spam. They decided that whenever they see a link that uses the rel=nofollow attribute, the search engine will not count that link for popularity. If implemented, this cuts the effectiveness of comment and trackback spam, because the spammed links no longer count. Unfortunately it's not implemented in all blog or content management systems.
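As a minimal sketch of what implementing this looks like on the publishing side, the function below renders a visitor-submitted link as an anchor tag carrying rel="nofollow". The function name `render_comment_link` is hypothetical, and the restriction to http and https URLs is an extra precaution assumed here, not something the search engines' announcement requires.

```python
from html import escape
from urllib.parse import urlparse


def render_comment_link(url, text):
    """Render a visitor-submitted link so search engines will not count it."""
    safe_text = escape(text)
    # Assumption: only plain web links are allowed; anything else
    # (for example a javascript: URL) is shown as text with no link.
    if urlparse(url).scheme not in ("http", "https"):
        return safe_text
    return '<a href="{}" rel="nofollow">{}</a>'.format(
        escape(url, quote=True), safe_text
    )


if __name__ == "__main__":
    print(render_comment_link("http://example.com/?a=1&b=2", "my site"))
```

Because the attribute is added when the page is generated, it applies to every comment or trackback link without the site operator having to judge each one.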

An article on ways to combat comment spam: http://www.sixapart.com/pronet/comment_spam

Referer spam and the rel=nofollow attribute: http://virtuelvis.com/archives/2005/03/killing-referrer-spam-forever

http://kalsey.com/2003/11/comment_spam_manifesto/

Comment spam clearinghouse: http://www.jayallen.org/comment_spam/

Combatting website spam with WordPress: http://codex.wordpress.org/Combating_Comment_Spam

Article on fighting trackback spam in Movable Type: http://www.elise.com/mt/archives/000577trackback_spam.php

Article on fighting trackback spam in WordPress: http://blog.mytechaid.com/archives/2005/03/09/wordpress-trackback-spam-solution/ ... and another: http://www.bloggingpro.com/archives/2005/01/05/fighting-trackback-spam/

A plugin for WordPress to aid fighting spam postings: http://unknowngenius.com/blog/wordpress/spam-karma

Another WordPress module -- http://idli.cs.rice.edu/~dsandler/trackback/trackback-validator-plugin/ -- checks that the referring page includes a link to the page receiving the trackback. This simple check should block most spammers, because they'd have to generate a unique page for each trackback they send.
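The idea behind that check can be sketched in a few lines. This is not the plugin's code, only an illustration of the technique under simple assumptions: the HTML of the page that sent the trackback has already been fetched, and two URLs are treated as the same if they match after dropping any fragment and trailing slash. The names `LinkCollector`, `normalize` and `trackback_looks_genuine` are made up for the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value.strip())


def normalize(url):
    # Simplistic comparison rule (assumption): ignore fragment and trailing slash.
    return url.split("#", 1)[0].rstrip("/")


def trackback_looks_genuine(source_html, target_url):
    """Return True if the sending page really links to the target page."""
    collector = LinkCollector()
    collector.feed(source_html)
    collector.close()
    wanted = normalize(target_url)
    return any(normalize(href) == wanted for href in collector.hrefs)


# Hypothetical pages, invented for illustration.
GENUINE_PAGE = '<p>See <a href="http://myblog.example/post/42/">this article</a>.</p>'
SPAM_PAGE = '<p><a href="http://pills.example/">Buy now</a></p>'

if __name__ == "__main__":
    target = "http://myblog.example/post/42"
    print(trackback_looks_genuine(GENUINE_PAGE, target))
    print(trackback_looks_genuine(SPAM_PAGE, target))
```

A trackback that fails the check can be discarded or held for moderation rather than published.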

An individual blogger comparing TypePad with WordPress for blocking trackback spam: http://blogging.typepad.com/how_to_blog/2005/07/fighting_trackb.html

A JavaScript/AJAX method for eliminating trackback spam: http://www.i-marco.nl/weblog/pivot/entry.php?id=286