Finding Bad Spam Delights Geeks
(Wired News) When freelance Web developer Joe Stump first installed the e-mail filtering program SpamAssassin, he and a friend started a competition. Each day, the two would look through their junk e-mail and try to find the missive that SpamAssassin had assigned the highest score.
“It was always a little contest between the two of us,” says Stump. “We were always trying to tweak and modify the settings to get it just right. I finally won the contest when I got a spam with a score of 43.”
The popularity of SpamAssassin for filtering out unwanted mail has given birth to a new pastime for many of its users — poring over their deleted mail to find the most egregious spam, at least as ranked by the software.
“It seems to be almost universal,” says Mark Pilgrim, a Web developer and professional trainer in Apex, North Carolina, and the author of a popular weblog about programming. “Everyone seems to be, ‘Ooh, look! Points!’”
SpamAssassin filters out junk mail based on a points system. The program, which typically runs on the mail server, assigns point values to various properties of an e-mail, from use of the phrase “call now” to whether the mail header is properly formed. The points are added up, and a higher total means the e-mail is more likely to be spam.
Users can set the score threshold at which a message is treated as spam rather than delivered normally. Most users set their spam threshold at about 5.
By comparison, most spam scores much higher, about 30 or 40.
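The points system described above can be sketched in a few lines of Python. This is only an illustration of the idea, not SpamAssassin's actual code: the rule names, the point values and the helper functions are all hypothetical, and only the threshold of 5 comes from the figures quoted here.

```python
from email import message_from_string
from email.message import Message

# Threshold of about 5, as quoted in the article.
THRESHOLD = 5.0


def body_text(msg: Message) -> str:
    """Return the body of a simple, non-multipart message."""
    payload = msg.get_payload()
    return payload if isinstance(payload, str) else ""


# Hypothetical rules: (name, points, test). The names and weights are
# made up for illustration and do not match any real rule set.
RULES = [
    ("CALL_NOW_PHRASE", 1.5, lambda m: "call now" in body_text(m).lower()),
    ("MISSING_DATE", 2.0, lambda m: m["Date"] is None),
    ("ALL_CAPS_SUBJECT", 2.5, lambda m: (m["Subject"] or "").isupper()),
]


def score(msg: Message):
    """Add up the points of every rule that matches the message."""
    hits = [(name, points) for name, points, test in RULES if test(msg)]
    total = sum(points for _, points in hits)
    return total, [name for name, _ in hits]


def is_spam(msg: Message, threshold: float = THRESHOLD) -> bool:
    """Treat a message as spam once its total reaches the threshold."""
    total, _ = score(msg)
    return total >= threshold


if __name__ == "__main__":
    raw = "Subject: FREE MONEY\n\nCall now to claim your prize!\n"
    print(score(message_from_string(raw)))
```

A real filter uses hundreds of rules, which is how a single message can pile up a total of 30, 40 or more.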
There have been online contests looking for the “best” spam. A Slashdot poll about SpamAssassin settings has logged more than 17,000 votes, and more than 400 comments, including comparisons of high-scoring spams. Dozens of bloggers have posted about their scores.
But what is the cause of this fascination?
“I think part of it is that spammers are almost universally stupid,” says Pilgrim. “They’re doing all of these things to avoid detection, but it doesn’t work anymore. All of them thought they were very clever, but it’s exactly those things they were doing that the second-generation tools like SpamAssassin catch.”
There’s certainly some sick pleasure in finding an astronomically high-scoring spam. Some of the messages seem almost perfectly crafted to trigger a spam filter.
“I got one message that was around 70 points,” says Pilgrim. “You wonder if these messages could work on anyone if they’re that bad.”
Stump’s friendly competition over SpamAssassin scores led him to write a script to cache his scores automatically. After that, he developed a database that he’s taken public, called SpamChart. The site is currently compiling statistics from 15 ISPs, with plans to add many more. And while tracking SpamAssassin scores might have been the genesis of the project, Stump is finding out much more about junk e-mail traffic.
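A script that collects scores automatically might look something like the sketch below. It is not Stump's script, which the article does not show. It assumes that each filtered message carries an `X-Spam-Status` header containing `score=` (or `hits=` in older versions) followed by a number, which is how SpamAssassin commonly tags mail; the function names are made up for this example.

```python
import re
from email import message_from_string

# Assumed header format: "X-Spam-Status: Yes, score=43.0 required=5.0 ..."
# Older SpamAssassin releases wrote "hits=" instead of "score=".
SCORE_RE = re.compile(r"\b(?:score|hits)=(-?\d+(?:\.\d+)?)")


def spam_score(raw_message: str):
    """Return the score recorded in a message's headers, or None."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status")
    if status is None:
        return None
    match = SCORE_RE.search(status)
    return float(match.group(1)) if match else None


def highest_score(raw_messages):
    """Return the highest recorded score, ignoring untagged messages."""
    scores = [s for s in map(spam_score, raw_messages) if s is not None]
    return max(scores, default=None)
```

Run over a junk-mail folder each day, a function like `highest_score` would settle the daily contest without anyone having to read the spam.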
“We always hear these numbers that 50 percent of all e-mail is spam,” says Stump, “and through our tracking, we’re finding that it’s at least that.” Since the middle of June, sites reporting to Stump have caught almost 100,000 spam messages, and he plans to expand the tracking tool into a stand-alone application that then would report to a central database.
The highest-scoring spam that has come through Stump’s system was an astronomical 131.20. “How stupid do you have to be to send a spam that scores that high?” wonders Stump. “You have to wonder why these spammers don’t follow the technology better.”
It’s a near certainty that spammers will catch up with SpamAssassin soon, necessitating further refinement of the program’s rules. But until that happens, Stump is sure that the fascination with SpamAssassin scores will continue.
“Geeks tend to have twisted status symbols,” he says. “Having a high score of 130 is like that. Geeks love numbers. If you give us numbers, we’re going to like it.”