WordPress Form and Comment Spam

As with security in general, escaping the scourge of WordPress form and comment spam requires a layered approach. Here is what works.

Databases and Behavioral Anti-Spam

The first layer is the one that works least well nowadays. In the beginning we had Akismet, and things got better, but this is an arms race, and Akismet has not kept pace. In WordPress, this front has largely been ceded (with some exceptions, noted below). For services like Google's Gmail, this approach (combined with manual rules) still works well the vast majority of the time.

Manual Rules and Keyword Blocking

Manual rules and keyword lists help block a particular subset of spam: the kind written by hand by humans, typically pestering the site owner to hire a so-called SEO expert, web designer, or marketing service. Keywords placed in the WordPress Admin > Settings > Discussion > Comment Blacklist field (renamed Disallowed Comment Keys in WordPress 5.5) are used not only as a filter by the WordPress commenting function but also by Contact Form 7.
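According to the help text on that settings screen, each line is one keyword, and a comment is caught when any keyword appears in its content, author name, URL, email, IP address, or user agent; matching happens inside words, so "press" matches "WordPress". The Python sketch below approximates that behavior for illustration only. It is not WordPress's own PHP code, and the function names are hypothetical.

```python
from typing import List


def parse_keyword_list(raw: str) -> List[str]:
    """Split the settings text into keywords: one per line, blanks ignored, whitespace trimmed."""
    return [line.strip() for line in raw.splitlines() if line.strip()]


def is_disallowed(keywords: List[str], *fields: str) -> bool:
    """True if any keyword appears, case-insensitively and even inside a word, in any field."""
    for word in keywords:
        needle = word.casefold()
        if any(needle in field.casefold() for field in fields):
            return True
    return False


# Fields checked: author, email, URL, comment text, IP address, user agent.
keywords = parse_keyword_list("SEO expert\n\n  web design \nbacklinks\n")
print(is_disallowed(keywords, "Bob", "bob@example.com", "", "Hire an seo EXPERT today!", "203.0.113.7", "Mozilla/5.0"))
# prints True
```

Because matching is a plain substring test, short keywords can catch legitimate comments, so multi-word phrases tend to be safer entries.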

JavaScript and/or Session Detection

The average bot is fairly simplistic: it won't accept session cookies or execute JavaScript, so testing for one or both of these conditions will generally let its submissions be ignored. For comment spam (on sites that must have comments enabled), the WordPress plugin WP-SpamShield is a fairly effective option. In the future it may be better to ensure that no plugins use PHP sessions, for performance reasons, but on a moderately busy site this shouldn't be much of a problem.
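Here is a minimal sketch of the general cookie-plus-JavaScript check, written in Python for illustration. It is not how WP-SpamShield works internally; the cookie name `session_id`, the form field `js_token`, and `SECRET_KEY` are all hypothetical. It assumes the page includes a small script that copies the token into a hidden field, so a client without JavaScript or cookie support fails the check.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Tuple

# Hypothetical per-site secret; a real deployment would load it from configuration.
SECRET_KEY = secrets.token_bytes(32)


def _token_for(session_id: str) -> str:
    # Derive the token from the session id, so nothing per-visitor is stored server-side.
    return hmac.new(SECRET_KEY, session_id.encode(), hashlib.sha256).hexdigest()


def issue_challenge() -> Tuple[str, str]:
    """Called when the form is rendered.

    Returns (session_id, js_token). The session id goes out as a cookie; the token is
    meant to be written into a hidden field by a small script, so a client that does
    not run JavaScript never sends it back.
    """
    session_id = secrets.token_hex(16)
    return session_id, _token_for(session_id)


def verify_submission(cookies: Dict[str, str], form: Dict[str, str]) -> bool:
    """Accept only if the cookie came back and the script-written token matches it."""
    session_id = cookies.get("session_id", "")
    submitted = form.get("js_token", "")
    if not session_id or not submitted:
        return False  # no cookie support or no JavaScript: treat as a bot
    return hmac.compare_digest(_token_for(session_id).encode(), submitted.encode())
```

Deriving the token with an HMAC, rather than keeping it in a server-side session, is one way to sidestep the PHP-session performance concern mentioned above, since the server only needs its secret to re-check a submission.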

Honeypot Form Fields

Another way to detect bots is to provide form fields that bots see but humans do not, because the fields are hidden with CSS. Bots will attempt to fill out these fields, which identifies their submissions so they can be silently rejected. For Contact Form 7, a good choice is the aptly named Contact Form 7 Honeypot.

For WordPress account creation/registration, there is the Registration Honeypot.
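The honeypot technique described in this section can be sketched in a few lines of Python. This is a generic illustration, not the code of either plugin named above; the field name `website_url` is a hypothetical choice meant to look tempting to a bot.

```python
# Hypothetical honeypot field name; something a form-filling bot would want to complete.
HONEYPOT_FIELD = "website_url"


def render_honeypot() -> str:
    """HTML for a field hidden from humans with CSS but still present for naive bots."""
    # aria-hidden and tabindex="-1" keep screen-reader and keyboard users out of it.
    return (
        '<p style="display:none" aria-hidden="true">'
        f'<label for="{HONEYPOT_FIELD}">Leave this empty</label>'
        f'<input type="text" id="{HONEYPOT_FIELD}" name="{HONEYPOT_FIELD}" '
        'value="" tabindex="-1" autocomplete="off">'
        "</p>"
    )


def is_bot_submission(form: dict) -> bool:
    """Any value in the hidden field marks the submission as spam."""
    return bool(form.get(HONEYPOT_FIELD, "").strip())
```

To reject silently, the handler can show the usual "thank you" page while discarding the message, so the bot gets no signal that it was caught.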

Captchas and NoCaptchas

Captchas are another older technology, with several implementations. Google famously acquired ReCaptcha in 2009, introduced a new version called NoCaptcha ReCaptcha in 2014, and in early 2017 unveiled, so to speak, the Invisible ReCaptcha. ReCaptcha has thus gone from a human-vision solution to a fully automated one, which brings it full circle back to the first item above, databases and behavioral analysis.

Personally, I've had plenty of nonsense from Google's ReCaptchas, which either end up making me solve a half-dozen puzzles or more, or insist on presenting themselves in a human language I cannot read. Google is very bad at both of these things (producing puzzles humans can solve, and providing decent language support). In both cases the problem could very easily be made much, much less horrible if simple human factors were taken into account, such as larger text and a consistent language menu labeled so that people who cannot read the currently selected language can still find it. Both are huge failures for a company that should have worked this out a decade ago.

There is one captcha system that actually works well, both for humans (granting them access) and against bots (denying them access), and that is Really Simple Captcha. Originally designed to work with Contact Form 7, which it still does, it also works well with other forms and provides a basic library that WordPress developers can use.
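The basic pattern behind a simple self-hosted captcha is a single-use challenge keyed by a random prefix: generate a short word, render it as an image, keep only a hash of the answer, and check the visitor's response against it. The Python sketch below illustrates that general pattern under those assumptions. It is not Really Simple Captcha's PHP API; the class, method names, alphabet, and expiry are hypothetical, and image rendering is left out.

```python
import hashlib
import hmac
import secrets
import time
from typing import Dict, Tuple

# Hypothetical alphabet that skips look-alike characters such as O/0 and I/1.
CHARS = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"


class CaptchaStore:
    """Single-use challenges keyed by a random prefix (a sketch, not the plugin's code)."""

    def __init__(self, length: int = 4, ttl_seconds: int = 3600):
        self.length = length
        self.ttl = ttl_seconds
        self._salt = secrets.token_bytes(16)
        self._answers: Dict[str, Tuple[str, float]] = {}  # prefix -> (answer hash, created)

    def _hash(self, word: str) -> str:
        # Answers are compared case-insensitively by upper-casing before hashing.
        return hmac.new(self._salt, word.upper().encode(), hashlib.sha256).hexdigest()

    def create(self) -> Tuple[str, str]:
        """Return (prefix, word); the word would be drawn into an image, only its hash is kept."""
        prefix = secrets.token_hex(8)
        word = "".join(secrets.choice(CHARS) for _ in range(self.length))
        self._answers[prefix] = (self._hash(word), time.time())
        return prefix, word

    def check(self, prefix: str, response: str) -> bool:
        """Verify and consume the challenge, so a correct answer cannot be replayed."""
        entry = self._answers.pop(prefix, None)
        if entry is None:
            return False
        hashed, created = entry
        if time.time() - created > self.ttl:
            return False  # expired challenge
        return hmac.compare_digest(hashed, self._hash(response.strip()))
```

Consuming the challenge inside `check` is a design choice here; it means a bot cannot solve one image and reuse the answer for many submissions.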

Summary of Anti-Spam Solutions

For contact forms, use:

  • Contact Form 7
  • Contact Form 7 Honeypot
  • Really Simple Captcha
  • Add keywords to the Settings > Discussion > Comment Blacklist field

For comments in general, use:

  • WP-SpamShield
  • Add keywords to the Settings > Discussion > Comment Blacklist field

For denying bot registrations, use:

  • Registration Honeypot