Monday, June 7, 2010, 11:21 pm

Comment spam raises its game


As a – fairly – regular blogger, I have to deal with my share of comment spam.

The WordPress-owned spam filter Akismet does a pretty good job of filtering out spam comments – despite some reservations by others in the blogging fraternity. So, generally I’m not that bothered by it.

But when I was emptying the spam queue for my stop-motion animation blog I came across a couple of comments that seemed totally unspammy. So much so that I was surprised to see them there.

Having duly designated them as ‘not spam’, I trashed the rest and went onto the site to check they were OK. I confess I was even a little worried that the eager authors might have been disappointed with the delay in publication and given up on the blog in disgust. (It’s not a heavily trafficked site – this would hurt.)

Not so. Because when I saw the comments in place, I realised they were simply copies of existing comments. Someone had clearly just harvested existing comments, added them to their spam details and reposted them to the site.

Frankly, I’m surprised it took me so long to recognise them – given the relatively small number of commenters I have, and how much I treasure and pore over their submissions to me (go on – become part of the family). I’ve just found the same here on Freelance Unbound, too – which I’d never seen before.

Is this actually new? I’ve certainly never come across it before. I did worry that it would be a game changer in the battle between publishers and spammers.

The vast majority of spam is easy to spot, even for the untrained reader, as it is either [a] garbled and/or filled with references to pornography or drugs; or [b] completely anodyne and unrelated to the post it’s attached to (“Great post! I will read your blog forever!”). But stealing real comments and using them as a spammer’s Trojan Horse is another matter.

Luckily, Akismet can easily spot repeated comments from Freelance Unbound. After all, it’s easy to search the site’s existing database for repeated content.
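
To show what "searching the database for repeated content" might look like, here is a minimal sketch in Python. It is my own illustration, not how Akismet actually works: the function names are made up, and I am assuming that a copied comment differs from the original only in capitalisation and spacing.

```python
import hashlib
import re


def fingerprint(comment):
    """Hash a comment after lower-casing it and collapsing runs of whitespace."""
    normalised = re.sub(r"\s+", " ", comment.strip().lower())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def find_repeats(existing, incoming):
    """Return the incoming comments whose text already appears on the site."""
    seen = {fingerprint(c) for c in existing}
    return [c for c in incoming if fingerprint(c) in seen]


# Hypothetical data: one published comment, two new submissions.
published = ["Great tutorial on armatures, thanks!"]
submitted = ["great tutorial on  armatures, THANKS!", "Something new"]
repeats = find_repeats(published, submitted)  # only the first submission is flagged
```

Storing one fingerprint per published comment keeps the lookup cheap, but it only catches copies that are identical after normalisation.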

But what if spammers harvest real, meaningful comments from other people’s sites and submit them here? Can a spam filter monitor the whole of the web for repeated content? And what about software that can subtly rework content so it passes a plagiarism test?
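
On the "subtly reworked" question, one common way to score how much two texts overlap is to break each into short runs of words (often called shingles) and compare the two sets. The sketch below is only an illustration under my own assumptions: the three-word shingle size is arbitrary, the function names are invented, and it says nothing about whether any real spam filter does this, or could do it for the whole web.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word runs of the given size."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        # Too short to shingle: treat the whole text as one unit.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a, b):
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a), shingles(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


# Hypothetical comments: an original and a lightly reworked copy.
original = "I love the way you lit the puppet in this scene"
reworked = "I really love the way you lit the puppet in this scene"
score = similarity(original, reworked)
```

A lightly edited copy still shares most of its shingles with the original, so it scores well above an unrelated comment. Where to draw the line between "suspicious" and "coincidence" is a judgment call that this sketch does not settle.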

If any bloggers have more experience with this kind of thing, I’d be more than eager to hear about it…

6 Comments

  • I went through this when comments on my site began to pick up. The spam count shot through the roof, and none of my initial solutions could stop it all.

    I tried a good selection of spam-filtering plugins, with varying degrees of success. The ones that worked best also tended to ‘censor’ real comments from time to time… which was unacceptable (like you, I value my commenters!)

    In the end, I’m afraid I’ve fallen back on CAPTCHA, which does work almost 100% (although the spammers are starting to figure out how to beat that, too. Watch this space). Hate having to ask folks to fill in an extra box, but it’s the best compromise I could find.

  • How interesting. Akismet has only pushed one or two genuine comments into the spam folder in the past year – and those were from people who had legitimately used spam-type words in the body of their comments (such as “internet marketing”, or “making real money”).

    As for spam reaching the blog – I can only remember a very few instances of it getting as far as the moderation queue, let alone being published.

    Perhaps the relatively closed family of commenters is a help here…

  • Agreed. Install ‘SI captcha’. It stops bots dead in their tracks. All three of my sites use it, and we never get spam.

    If you get a persistent human spammer, activate the rule in WP that only allows one link per comment without authorisation.

    I’ve linked you on my (work in progress) http://keneakins.com

  • Dang. I missed an apostrophe there.

    ;)

  • The trouble is, I don’t really want to use a captcha, as I worry that potential commenters will be put off. I find them a bit tiresome – so I think maybe others do too. Though this may be over-stressing about the unimportant…

  • People don’t seem to mind, to be honest. The SI captcha is pretty reasonable compared to most.

    It really does kill spam dead as well.