In Praise of Akismet

The spam nightmare continues, but thanks to Akismet, it’s been reduced to a minor nuisance on this blog.

Akismet was created by Matt Mullenweg of WordPress. Thankfully, the folks who run WordPress didn’t keep this to themselves, but opened it up to all types of blogs—and even other applications, such as forums. Anything that accepts user comments should be using this.

Here’s how it works. When a comment is received on your blog, or a post on your forum, or whatever, your software first submits the comment to Akismet via its open API. Akismet does its magic and tells you whether the comment is spam or not. If Akismet blesses it, then your software goes ahead and posts the comment. If not, it puts it in a holding pen, where you can double-check that it is really spam before deleting it. If you’re using WordPress, you can just download the plug-in. If you’re using Mephisto (the Ruby on Rails application that runs this blog), then it is built in. There’s a wide assortment of libraries and plug-ins for other platforms as well.
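For the curious, here is a rough sketch in Python of the check-then-publish-or-quarantine flow described above. It is not how Mephisto or the WordPress plug-in does it. The endpoint URL and form field names are what I recall from Akismet's public API documentation, so treat them as assumptions and check the current docs before relying on them. The function names, the shape of the comment dictionary, and the two lists standing in for the published queue and the holding pen are all made up for illustration.

```python
from urllib import parse, request

# Assumed endpoint pattern; verify against Akismet's current documentation.
AKISMET_URL = "https://{key}.rest.akismet.com/1.1/comment-check"


def is_spam(api_key, blog_url, comment, opener=request.urlopen):
    """Ask Akismet whether a comment is spam.

    `comment` is a hypothetical dict with "user_ip", "user_agent",
    "content", and optionally "author". `opener` can be swapped out
    so the function is testable without network access.
    """
    fields = {
        "blog": blog_url,
        "user_ip": comment["user_ip"],
        "user_agent": comment["user_agent"],
        "comment_author": comment.get("author", ""),
        "comment_content": comment["content"],
    }
    data = parse.urlencode(fields).encode("utf-8")
    url = AKISMET_URL.format(key=api_key)
    # Passing `data` makes this a POST request.
    with opener(request.Request(url, data=data)) as response:
        body = response.read().decode("utf-8").strip()
    # Assumption: the response body is the literal text "true" for spam.
    return body == "true"


def handle_comment(comment, check, published, quarantine):
    """Publish the comment if the check passes, otherwise hold it for review."""
    if check(comment):
        quarantine.append(comment)
    else:
        published.append(comment)
```

In a real application, `check` would be something like `lambda c: is_spam(my_key, my_blog_url, c)`, and the two lists would be whatever storage your software already uses for live and held comments.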

Understandably, the Akismet folks don’t disclose just how they decide what’s spam, but in my experience, it has been 100% accurate. They do have a vast volume of messages to learn from: since the service started, they’ve detected a staggering 643,803,210 spam posts, and they see millions a day. A revolting 94% of all posts submitted are spam.

The spammers are getting a little more clever, but Akismet is one step ahead of them. The 20 or so posts a day I’ve been getting for male dysfunction remedies link back not to the site of any spam company, but to posts on other blogs and forums where the spam has already been posted. So the link is to a legitimate place, which is unknowingly hosting the spam message. These posts come in bursts of three to five, each with a different email name attached and with a slight variation of the text, but clearly they all come from the same place. Akismet has gotten them all, so you never see them, and all I have to do is scan the quarantined posts once a day and click “delete all.”

Once you’ve installed a plug-in or integrated one of the libraries, you need to get an API key. This identifies each user of the service and helps the WordPress folks monitor use of the system and control abuse. An API key is free for non-commercial bloggers (which they define as anyone making less than $500/month from their blog). If you’re a “pro blogger,” you can get a key for just $5/month, which is well worth it. Enterprise subscriptions start at $50/month for 5 blogs. Non-profits can use the service for free if they provide some back-links to help promote the service, or for half-off the enterprise prices if not.

With this service available, there’s really no excuse, other than the need to implement the API, for any software to be posting spam. If we can eliminate the ability to post spam, we can take the upside out of this dirty business and send the scum who post spam comments off to some other misguided pursuit.

The Spreading Spam Scourge

If you thought email spam was bad enough, you should try running a web site with a user forum and a mailing list.

At BoatingSF, I have both, and they’ve become increasingly painful.

To post to the Forum, users have to register, which requires filling out one of those “human detectors” that makes you enter some oddly presented characters. (These things are called CAPTCHAs, by the way, an acronym for Completely Automated Public Turing test to tell Computers and Humans Apart.) Then they have to click a link in an acknowledgment email. So every post either comes from an actual human, or from some pretty sophisticated automation.

Even so, I’m getting 3-4 spam posts a day. I suspect this is coming from some low-grade humans using some automation software, where they fill in the bits that are hard for machines to do. Or maybe there’s now software that is successfully doing OCR on the CAPTCHA, or the code has somehow been bypassed. In any case, this forum is getting far more spam posts than legitimate ones (it has been slow getting this forum going), so yesterday I reluctantly made all topics moderated. Now, I can just ignore the requests to approve the posts, instead of having to delete them, and they don’t appear for even those few hours before I get around to deleting them. So, alas, the legitimate posters now pay the price in a posting delay.

I also have an option for visitors to the site to sign up for an email list. I send out a newsletter about once a month. In reviewing the list of subscribers, I found about 100 names that were clearly spammers: email addresses that were random sequences of characters, and that now bounce. Why would they bother to do this? Because the confirmation email carries my return address. I want people to be able to communicate with me, and I don’t expect people to read email that doesn’t come from a valid email address. So apparently the spammers sign up for the list to capture the return address. That address, which I have been careful never to publish anywhere else, now gets about 50 spam messages a day. Fortunately, Gmail (which hosts the mail service for my domain) is very good at trapping them.

Over at the Ruby on Rails Wiki, the once-valuable content has been overrun with spam posts that completely replace the contents of a page. This site could be made a lot more spam-resistant, but apparently the administrators have been too busy with other stuff to do so. The value of the wiki, which was at one point a key source of Rails information, has been eviscerated by the spammers.

The open nature of the Internet allows a very small minority of dishonest, unethical folks to cause a lot of hassle for everyone else. I guess it is like living in a very big city: you simply have to keep your doors locked if you don’t want stuff stolen. But surely there’s more that can be done with technology to stop these parasites.

For further reading: