If you host a website with an embedded contact form, chances are you are familiar with spam bots filling out your form with unhelpful or downright harmful information. This can be quite a headache, especially for marketers who are collecting prospective client information or gathering registrations for an event. Bad bot data can also infiltrate your CRM, inflating your website’s form conversion rate and costing your team hours of cleanup. At worst, spam bots present a security threat to your website, which can harm your site visitors and members of your organization.
Luckily, there are plenty of ways to cut down on this unwanted traffic, which saves your team time and allows them to focus on legitimate leads. But each web form (and the spam it gets) is unique. To better defend yourself, it is best to get some insight into the possible attackers. Let's take a moment to think like a spam bot!
What are spam bots?
Cloudflare defines a bot as a software application programmed to do certain (often repetitive) tasks, which it can complete much faster than humans. In short, a bot is a program; typically, one that is written to do a specific task as quickly and as many times as possible to accomplish its goal. Note also that not all bots are bad! The web crawlers that Google and Bing use to index content for their search engines are probably some of the most prolific bots ever made. Spam bots are those bots that are written for the express purpose of sending information (frequently, via web forms).
Benign or not, bots come in all shapes and sizes, and their goals can vary significantly. To address only the nefarious bots, you need to understand what threat a bot might present.
What is a spam bot’s goal?
Spam bots are created for various purposes, but they typically come in three types: message-oriented, DoS, and reconnaissance.
1. Message-Oriented
Some spam bots are written expressly to promote a specific message. They might be focused on getting out a political message or a form of guerilla marketing. These bots are looking to get the largest possible audience (maximizing eyeballs/engagement), and they are also likely to have a common thread running through each submission (e.g., a specific phone number, product name, policy, or candidate that is being promoted). These bots are unlikely to hammer your server (take your site down), because doing so will likely result in stopping the spread of their message. You are more likely to see these bots wait some amount of time between submissions to avoid putting a massive strain on your infrastructure.
These are likely the least threatening kinds of bots (though quite annoying), and they are the most common.
The example above is of a bot that is targeting someone’s web form with a message-oriented attack. They seem to be promoting some sort of “hack tool” and repeatedly spam a link to their site.
2. Denial of Service (DoS)
DoS bots aren’t interested in getting a specific message out; they’re aiming to take down a website. Specifically, they are hoping to maximize strain on a server’s resources. They tend to do this by submitting a significant amount of data (e.g., filling up form fields with the maximum number of allowed characters), by submitting as many requests as possible (hammering), or both. Doing so means they are taking up the maximum amount of bandwidth in the hopes that your server can’t keep up. Because these bots aren’t interested (for the most part) in you seeing what they submit, they may even submit plain gibberish (or include exorbitant amounts of whitespace).
Here’s an example of what kind of content a DoS bot would submit through a web form, as seen recently in my own inbox.
See all the gibberish that the bot submitted? Not exactly useful for a sales or marketing team trying to collect valid customer information. Also note the honeypot and seconds-on-page information at the bottom, which we’ll touch on later in this blog post.
A DoS bot presents a medium level of risk to your site. If a DoS bot is successful, it can prevent users on your site from filling out a contact form to download a resource, talk about a project, or buy one of your products. The bottom line is that these bots can make you lose valuable leads who are interested in your services and, ultimately, lose revenue.
3. Reconnaissance
This last motivation is, by far, the most worrisome. Where message-oriented bots don’t typically care about targeting you, and DoS bots are targeting you in the most superficial fashion, reconnaissance (or recon for short) bots are trying to get information specifically about you. These bots tend to fall into two groups: phishing and exfiltration. Phishing recon bots are aiming to get internal staff to interact with one of their submissions. Doing so may offer future avenues for social engineering attacks (allowing an attacker to impersonate an authority figure whom staff might trust). Exfiltration recon bots are instead focused on retrieving specific information from your system. These bots are hoping to find out more about your infrastructure (for example, to probe for vulnerabilities that they could exploit in the future).
You’re likely familiar with phishing email scams, as seen in the example below, which happened recently to a few members of our team. The person who sent the email is trying to gain access to our team’s phone numbers by impersonating our CEO. Similar tactics are used in recon spam bots in web forms.
This kind of bot is so much scarier than the others because it typically operates as a prelude to a much more targeted attack, which will be harder to counter. These bots are trying to maximize the revelation of information; they are the most variable form of bot: they could make many submissions to determine how your system reacts under certain circumstances, or they could submit infrequently to appear as close to legitimate as possible (to increase the likelihood of staff interaction).
How do I stop spam bots on my website?
Thankfully, there is a wide array of strategies to combat all varieties of spam bot. We’ve had great success implementing combinations of the following strategies on our own website and for our clients. Note that every client’s needs and threat models are different, so the correct combination and implementation will vary. Below is a very brief overview of each strategy and its tradeoffs.
|Strategy|(higher is better)|(higher is better)|(lower is better)|
|Honeypot Fields|Very High|High|Very Low|
|Disallowing Autofill|Low|Very Low|Very High|
|(re/No)Captcha, hCaptcha|Very Low|Very High|Very High|
1. Honeypot Fields
A honeypot is a special field added to your form, invisible to normal users. We typically implement these through a hidden text box (for accessibility reasons, usually we label them something like “do not fill out this field”). All the types of bots above are likely to fall for this trap: Message-oriented bots are likely to fill out as many fields as possible to repeat their message as many times as possible; DoS bots don’t want to miss a chance to send more data; and, even though they want to look legitimate, the difficulty of knowing when a field is required means Recon bots will often fill these invisible fields too. This means it is surprisingly effective, despite its simplicity. For our clients, it frequently handles 95% or more of all spam submitted.
This is almost always the fastest strategy to implement (and poses essentially no down-side); this is our very first plan of attack to combat spam on any given web form.
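The server-side half of a honeypot check can be sketched as follows, assuming the form data arrives as a mapping of field names to strings. The field name `website_url` and both function names are hypothetical and do not come from any particular framework; the hidden input itself would still need to be added to the form markup.

```python
from typing import Mapping

# Hypothetical name of the hidden input; use whatever name your form gives it.
HONEYPOT_FIELD = "website_url"


def is_honeypot_triggered(form_data: Mapping[str, str]) -> bool:
    """Return True when the hidden field contains any non-whitespace text."""
    value = form_data.get(HONEYPOT_FIELD, "")
    return value.strip() != ""


def classify_submission(form_data: Mapping[str, str]) -> str:
    """Label a submission 'spam' or 'accepted' using only the honeypot."""
    if is_honeypot_triggered(form_data):
        return "spam"
    return "accepted"
```

A real visitor never sees the field, so it arrives empty (or not at all) and the submission is accepted. Whether flagged submissions are discarded or merely tagged for review is a design choice this sketch leaves open.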
2. Speed Limits
If a simple honeypot field doesn’t handle all the spam a client is getting, our next step is a speed limit. Speed limits take a bit longer to implement correctly, and they carry a risk (if implemented without care) of filtering out legitimate traffic. In short, a speed limit imposes a minimum required time spent on a page before the form is submitted. We tend to implement this as a special cookie or a hidden field that lists the timestamp of when the page was loaded. If the difference between the submission’s timestamp and the timestamp of the page load is less than the speed limit’s set minimum, we can flag that submission as likely not being legitimate.
The basic setup of this strategy is just as quick to implement as a honeypot, but you need to spend some extra time thinking about the reasonable minimum time spent (shorter forms will take less time to fill out than longer ones). Some forms can also be auto-filled by web browsers, which you need to account for in your implementation. Once you have spent some time thinking about how fast a user could reasonably submit a form, you can choose your minimum (e.g., 3 seconds). As in the screenshot above demonstrating an example of DoS spam, we frequently set up the basic infrastructure for a speed limit without enforcing any limit. This allows us to gather some information about how long typical legitimate and spam submissions take, and it gives us a head start on determining a reasonable minimum.
Like honeypot fields, speed limits are nice because they can usually be implemented quickly, have minimal negative side effects, and are surprisingly effective. Because bots can fill out form fields much faster than humans, even very low speed limits (which are unlikely to flag legitimate traffic) frequently filter out a significant amount of spam.
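The timestamp comparison described above can be sketched roughly as follows, assuming both the page-load time and the submission time are available as Unix timestamps in seconds. The function name, the `enforce` flag, and the constant are hypothetical; the 3-second default simply mirrors the example value mentioned earlier.

```python
from typing import Tuple

# Assumed minimum time on page, in seconds (the example value from the text).
MIN_SECONDS_ON_PAGE = 3.0


def check_speed_limit(
    loaded_at: float,
    submitted_at: float,
    minimum: float = MIN_SECONDS_ON_PAGE,
    enforce: bool = True,
) -> Tuple[float, bool]:
    """Return (seconds_on_page, flagged) for one submission.

    With enforce=False nothing is flagged, but the elapsed time is still
    returned so it can be logged while a sensible minimum is being chosen.
    A page-load timestamp in the future yields a negative elapsed time,
    which also falls under the minimum and is flagged when enforcing.
    """
    elapsed = submitted_at - loaded_at
    flagged = enforce and elapsed < minimum
    return elapsed, flagged
```

For example, `check_speed_limit(loaded_at, time.time())` could be called when the form is received (with `time` imported from the standard library), and the elapsed value stored next to the submission.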
3. Disabling Autofill
Autofill is also a massive accessibility benefit, and it generally improves users’ quality of life. Think about it: how many times in a day do you use form autofill to remember different addresses, email addresses, etc.? You would likely feel frustrated if that feature were taken away too.
In short, though easy to implement, disabling autofill usually comes with a lot of negatives and very few positives. It is an option, but we typically recommend that our clients steer clear.
4. Captchas
Captchas are one of the oldest ways of cutting down on bot spam, and they can still be very effective today. A captcha is a riddle that a user must solve for a submission to be considered legitimate. A very simple version of a captcha would be generating two small, random numbers and asking the user to give the sum of the numbers in one of the form fields. Perhaps surprisingly, even the simplest forms of a captcha are effective (most bots aren’t actually smart enough to solve these kinds of things). However, if an attack is more targeted to your site, even rather complex captchas may be ineffectual.
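The two-number sum captcha described above might look something like this minimal sketch. The function names are hypothetical, and it assumes the expected answer is kept on the server (for example, in the visitor's session) between rendering the form and checking the submission.

```python
import random
from typing import Tuple


def make_sum_captcha(rng: random.Random) -> Tuple[str, int]:
    """Return a question for the form and the answer to store server-side."""
    a = rng.randint(1, 9)
    b = rng.randint(1, 9)
    return f"What is {a} + {b}?", a + b


def check_sum_captcha(expected: int, answer: str) -> bool:
    """Return True only when the submitted text parses to the expected sum."""
    try:
        return int(answer.strip()) == expected
    except ValueError:
        # Non-numeric or empty input is treated as a failed captcha.
        return False
```

Calling `make_sum_captcha(random.Random())` when the page is rendered gives the text to show and the number to remember; `check_sum_captcha` is then run against whatever the visitor typed.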
Because most bots that would be stopped by simple captchas tend to be filtered out by honeypot fields and speed limits, and because building complex captchas is a far more significant effort, we usually avoid this strategy as well, with one major exception…
5. reCaptcha, NoCaptcha, and hCaptcha
reCaptcha is a captcha offered by Google. If you have any nostalgia (or deep-seated hatred) for the following image, you are very familiar with reCaptcha:
NoCaptcha (also known as reCaptcha v2) is Google’s more recent offering, something you have likely seen popping up everywhere. It starts as a simple checkbox (that represents a lot of checks happening in the background). If any of those background checks fail, a more significant challenge (often classifying a set of images) is required.
One newcomer to this space is hCaptcha. hCaptcha works very similarly to Google’s NoCaptcha though it aims to be more privacy-respecting than Google’s offerings.
All three of these options allow you to use very complex captchas without having to design them yourself. There are three negatives to be aware of:
- Using a third party means if their servers are down, your forms might not work
- Specific to reCaptcha and NoCaptcha, Google makes money by harvesting user data, which is a privacy concern (and possibly a legal concern given the General Data Protection Regulation and California Consumer Privacy Act)
These are, in some sense, the nuclear option. Once implemented, they use invasive checks to filter out spam very effectively, but they dramatically increase friction for users. These are an option to keep open (especially in the case of relentless, ongoing spam), but we tend to use them only as a last resort.
To recap what we’ve discussed:
There are three types of spam bots that typically target your website:
- Message-oriented
- Denial of Service (DoS)
- Reconnaissance
There is a wide variety of methods we can use to stop web form spam on our clients’ sites:
- Honeypot fields
- Speed limits
- Disabling autofill
- Captchas
- reCaptcha, NoCaptcha, and hCaptcha
In the end, there is no perfect, one-size-fits-all solution to stopping web form spam on your site, but there is no shortage of options to leverage. For each website and form, you can find a combination of strategies that will get spam under control! As spam bots continue to evolve, it is important to stay ahead of the curve and implement proactive strategies that help to cut down on spam entering your site and interfering with your operations.