I just came across Project Honey Pot, an effort to distribute honey pot generators around the Web to discourage spammers from automatically harvesting e-mail addresses. The honey pot generators produce a page with random content and specially generated fake e-mail addresses that are traceable to the IP address of the harvesting party. Once the e-mail harvester has been located, they can be subject to legal action depending on the site's jurisdiction.
Project Honey Pot is managing the e-mail servers that will receive and process mail sent to the honey pot addresses. In order to prevent them all from having the same, easily recognizable domain, they are asking people to donate MX records for their own subdomains. The honey pot e-mail addresses then randomly choose from among the pool of honey pot domains, which they hope will become large enough for spammers not to be able to distinguish from real e-mail domains.
It's an interesting idea. I think i will probably sign up. However, i wonder about a few things:
- The example page they show contains a long legal warning, presumably to establish grounds for legal action against e-mail harvesters. Is this warning necessary? Isn't it a pretty obvious way for a harvester to detect which pages are honey pots and safely ignore them?
- The vast majority of people donating MX records are likely to donate subdomains (honeypot.mydomain.com), not entire second-level domains (mydomain.com). Couldn't a harvester get away just by skipping e-mail addresses with third-level domains? Collecting e-mail addresses with second-level domains would still yield a great many addresses they can use to spam people.
- Presumably the script they provide to generate honey pots either contains the list of honey pot mail domains, or retrieves the list somehow from the central project site. In either case, can't a spammer just get a copy of the script by signing up as a user of Project Honey Pot, acquire the list, and filter out these honey pot addresses from their lists of harvested addresses?
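To make the second concern concrete, here is a minimal sketch of the kind of filter a harvester might apply. The function name `looks_like_subdomain` and the sample addresses are made up for illustration, and counting dot-separated labels is a deliberately naive heuristic: it would also discard real addresses at domains such as example.co.uk.

```python
def looks_like_subdomain(address: str) -> bool:
    """Return True if the domain part has more than two dot-separated labels."""
    domain = address.rsplit("@", 1)[-1]
    return len(domain.strip(".").split(".")) > 2


# Hypothetical harvested list: one ordinary address, one honey pot style address.
harvested = [
    "alice@mydomain.com",
    "trap123@honeypot.mydomain.com",
]

# Keep only addresses whose domain has at most two labels.
kept = [a for a in harvested if not looks_like_subdomain(a)]
print(kept)
```

If most donated MX records are subdomains, a filter this crude would already sidestep most of the honey pot pool, at the cost of losing some legitimate addresses.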
There are lots of techniques out there that people use to obscure their e-mail addresses from spammers. Sometimes people spell out the at-sign and the dot (e.g. username at domain dot com) hoping that spammers won't look for that pattern. Sometimes people replace the at-sign, the dot, or letters in the e-mail address with HTML character entities (e.g. username&#64;domain&#46;com), hoping that spammers won't decode them. I'm not so hot on these tactics, because they're trivially defeated by a small improvement to automatic harvesting programs, and it's entirely likely that many spam harvesting tools account for these patterns already.
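As a rough illustration of how small that improvement is, the sketch below encodes an address as HTML character entities and then undoes both obfuscations in a few lines. The helper names `encode_entities` and `normalize` are invented for this example; `html.unescape` is from Python's standard library. The " at " / " dot " replacement is intentionally crude and would also rewrite ordinary prose, so a real tool would need to be more selective.

```python
import html
import re


def encode_entities(address: str) -> str:
    """Obscure an address by turning every character into a numeric HTML entity."""
    return "".join(f"&#{ord(ch)};" for ch in address)


def normalize(text: str) -> str:
    """Undo entity encoding and spelled-out ' at ' / ' dot ' separators."""
    text = html.unescape(text)
    text = re.sub(r"\s+at\s+", "@", text, flags=re.IGNORECASE)
    text = re.sub(r"\s+dot\s+", ".", text, flags=re.IGNORECASE)
    return text


obscured = encode_entities("username@domain.com")
print(obscured)                                 # entity-encoded form, no literal "@"
print(normalize(obscured))                      # username@domain.com
print(normalize("username at domain dot com"))  # username@domain.com
```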
A more effective strategy, though an ugly one, is to mangle addresses and include instructions for de-mangling them, or make the mangling obvious enough that any human would figure it out (e.g. usernameREMOVETHIS@domain.com), as long as the same type of mangling doesn't get used so widely that it can be automatically detected and fixed. The method i use on my website replaces the at-sign and dot with an image of an at-sign and an image of a dot, which i hope only to be detectable by OCR software, and i figure it will be a long time before spammers find it practical to run OCR algorithms on every small image they encounter.
Here's a new idea that just occurred to me. Instead of announcing your e-mail address on every page, link to a contact page. Your contact page doesn't reveal your e-mail address; instead, it asks the reader to enter their e-mail address into a form. Then your site sends them your e-mail address.
If everyone started doing this, then spammers could start to automate the process of filling out these forms. To prevent that from working, the sites that use this technique could submit the requesters' addresses to a central blacklist for checking. Requests from addresses at known spammer domains would be rejected. To take care of addresses at free e-mail domains like Hotmail, the blacklist would keep track of how many requests came from the same address. Too many requests in a short period of time would blacklist the address, and then everyone's forms would stop responding to that address. Spammers would be forced to resort to constantly registering new domains or constantly creating new accounts on free e-mail services, both of which are fairly expensive and time-consuming procedures.
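The checking logic could be sketched roughly as follows. This is a toy, in-memory stand-in for the central service, not a design for the real thing: the class name `RequestBlacklist`, the threshold of three requests, and the one-hour window are all assumptions picked for illustration, and `spammer.example` is a placeholder domain.

```python
import time
from collections import defaultdict, deque


class RequestBlacklist:
    """Toy in-memory stand-in for the central blacklist service."""

    def __init__(self, spam_domains, max_requests=3, window_seconds=3600.0):
        self.spam_domains = {d.lower() for d in spam_domains}
        self.max_requests = max_requests      # assumed threshold
        self.window_seconds = window_seconds  # assumed time window
        self.recent = defaultdict(deque)      # address -> recent request times
        self.blocked = set()                  # addresses blacklisted so far

    def allow(self, address, now=None):
        """Return True if a contact form should answer this request."""
        now = time.time() if now is None else now
        address = address.strip().lower()
        domain = address.rsplit("@", 1)[-1]
        if domain in self.spam_domains or address in self.blocked:
            return False
        stamps = self.recent[address]
        # Forget requests that have fallen out of the time window.
        while stamps and now - stamps[0] >= self.window_seconds:
            stamps.popleft()
        stamps.append(now)
        if len(stamps) > self.max_requests:
            # Too many requests in the window: block this address everywhere.
            self.blocked.add(address)
            return False
        return True


blacklist = RequestBlacklist(spam_domains={"spammer.example"})
# Five requests from the same free-mail address, one second apart.
results = [blacklist.allow("someone@hotmail.com", now=float(t)) for t in range(5)]
print(results)  # [True, True, True, False, False]
```

Each participating site would call something like `allow` before mailing out its address. Once an address trips the limit it stays blocked in this sketch; whether and when a real service should forgive an address is a separate question.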
(Spammers do have to keep moving around now, but currently they only have to move after sending each batch of spam. The scheme i'm proposing forces them to move once for every few addresses they collect, which i think would be much more expensive for them and would stop them before they can send much spam at all.)
What do you think?