Posts Tagged ‘spam’

Anandtech uses Trackback spam on my blog to increase traffic?

Monday, February 20th, 2012

I’ve been dealing with a rather persistent comment spam issue on my blog. Akismet seems to ignore this particular type of trackback spam and has let hundreds slip through over the last few months. Since my blog really isn’t that busy, it is very easy to identify the spam. I can’t turn off trackbacks, as I have received legitimate ones from other people referencing my site to show how they solved a problem.

A couple weeks ago I installed Simple Trackback Validation with Topsy Blocker, which has done a good job of catching the ones that Akismet seems to have problems detecting. As I run a rather complex setup for testing my plugins, I submitted a bug fix, which the author promptly applied. I don’t know why Akismet doesn’t detect keyword keyword… as potential spam, but almost every trackback Akismet has missed follows that pattern, and so far the other plugin has caught every single one.

This morning, I noticed two comments had been posted in the last seven hours, and, when I looked, I saw:

While my site isn’t that busy, it ranks fairly well on some search phrases and in some social circles. The most popular post on my blog was written over two years ago and was for Mac OS X Snow Leopard. Other posts have been more popular over a thirty-day period, but that post has stood the test of time even though it is now two operating system generations behind.

But I do have sites that send trackback spam to that particular page quite frequently, and Akismet misses them about 85% of the time. You wouldn’t know it from the Akismet graphs, which claim a much higher success rate, almost the inverse of the failure rate.

I know when I get links from, I do often check them to make sure they are actually linking. In a few spot checks in the past they were, but those links appear to have since been cleaned from their forums and from the two posts where I remember them being listed. Today, however, the trackback came from a very new article that had no relation to the page it linked to, and, obviously, no link on their site pointed to my site.

I looked back through the approved comments and found 71 other trackbacks, investigated a number of pages, and, as you might suspect, my link wasn’t present anywhere.
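The manual check described here (does the page that sent the trackback actually link back to my site?) is easy to automate. What follows is only a minimal sketch in Python, not the code of any plugin mentioned in this post. It assumes the page HTML has already been fetched, and the host name blog.example.com is a placeholder.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class _LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_back(page_html: str, my_host: str) -> bool:
    """Return True if page_html contains a link to my_host or one of its subdomains."""
    collector = _LinkCollector()
    collector.feed(page_html)
    my_host = my_host.lower()
    for href in collector.hrefs:
        # Relative links have no hostname and can never point at my site.
        host = (urlparse(href).hostname or "").lower()
        if host == my_host or host.endswith("." + my_host):
            return True
    return False


# Hypothetical usage: a trackback whose source page never links back is suspect.
page = '<p>Unrelated article <a href="http://other.example.net/">link</a></p>'
print(links_back(page, "blog.example.com"))
```

A trackback whose source page fails this test can be held for moderation rather than deleted outright, since a page may legitimately change after the trackback was sent.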

This is where the analysis turns a bit sinister. Why did they pick a page that detailed an issue with Varnish and gzip compressed pages to link to an article Anandtech wrote yesterday? Age of the post? The original post was from Dec 2009, though, it is the outlier in the stats.

Upon looking at thirty of the trackbacks, a curious pattern emerged. Since adding the social media buttons for Google+, Twitter and Facebook, I’ve had a quick metric to gauge post popularity, and, lo and behold, Anandtech is targeting posts that have high tweet counts with the exception of the original outlier.

The original post is linked to a post that deals with Social Game Design which is also a popular post. There may have been a trackback on that page which I deleted ages ago and their trackback bot just spidered it.

Or, it could be completely dumb and just taken the urls from if they looked far enough back through my history – except for the original outlier.

But Anandtech, Welcome to the Comment Blacklist.

vBulletin spam signup email addresses, combatting the problem

Thursday, March 3rd, 2011

A while back I had corresponded with Google’s spam team regarding a pattern I had discovered and sent it off to some people. It appears that they used some of that to clean up the search results removing this particular type of spam, but, the source of the problem still exists. Over the past 60 days, a particular client’s vBulletin site has received 2670 signups, over half using gmail addresses. A group of three people have independently looked at every signup to verify that these indeed do fit the spam pattern.

It appears an outsourcing company is hired to do the signups, and its workers are given a list of email addresses that they can choose from. Signup and verification always take place from radically different IPs, so we can assume that the people doing the actual signup have no idea that their verification email never goes out. This is confirmed by the fact that they use multiple periods in their gmail address to make the email address appear to be unique. We modified vBulletin to strip out the . and truncate at the +; once that cleaned address turns out to have been seen before, the account is instantly banned. We opted to allow the signups to be registered rather than saying that the email was in use.

A little background on the issue. Google allows one to use a . or + in the email address, and the result still resolves to the same destination address. While I like that feature and have used it in the past, vBulletin appears to ignore this fact. So, the email address goes to the same destination as and Likewise, you can use the + in the email address to signify the source of the email. So, might signify that the email came from your twitter profile and comes from your facebook profile. Since they end up at the same place, Google is a perfect way to have hundreds of email addresses that appear to be unique but are delivered to the same destination. This means your validation script can check fewer mailboxes, decode the validation email looking for the link, and click it automatically.
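The cleanup itself is only a few lines. The sketch below is in Python for illustration and is not the vBulletin modification we actually wrote. The addresses are made up, and treating googlemail.com as an alias of gmail.com is an assumption of the sketch.

```python
# Domains where . and + in the local part do not change the destination mailbox.
GMAIL_DOMAINS = {"gmail.com", "googlemail.com"}


def clean_email(address: str) -> str:
    """Return a canonical form of an address for duplicate checks."""
    address = address.strip().lower()
    local, sep, domain = address.rpartition("@")
    if not sep:
        # Not a usable address; hand it back unchanged apart from case.
        return address
    if domain in GMAIL_DOMAINS:
        local = local.split("+", 1)[0]   # truncate at the +
        local = local.replace(".", "")   # remove the .
        domain = "gmail.com"             # assumption: fold googlemail.com into gmail.com
    return f"{local}@{domain}"


# Hypothetical signups that all land in the same mailbox.
for raw in ("john.doe@gmail.com", "j.o.h.n.doe+17@gmail.com", "JohnDoe+forum@googlemail.com"):
    print(raw, "->", clean_email(raw))
```

Storing the cleaned form next to the address as entered lets the signup go through as usual while still matching it against addresses already seen.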

Initially our client installed Recaptcha, which raises the odds that a human is filling out the form. Based on the number of resubmissions, I’m reasonably certain that a human is doing the data entry and that they aren’t cracking Recaptcha.

I figured at one point these were created accounts, but, some of the names are so specific, one would have to assume that perhaps there are some compromised email accounts in here as well. If you glance through the list, you’ll see judicious use of the . and + to try to create unique email addresses.

The first thing we did was write a plugin that hooked into the signup process and cleaned up the email addresses. The second thing we did was look for signups that took place in a country different from the verification click. Oftentimes they used proxy servers, so, using a few of the proxy DNS blacklists, we were also able to make an educated guess that the signups were probably going to post spam. The first post at the board is moderated using Akismet for any that slip through, but this method appears to be fairly good at hitting the right ones: out of 2691 signups, it detected 2670 spam signups with 1 false positive. The false positive was a tough one. Even looking at the signup data, the IP addresses used for the signup request and the validation were in separate countries according to MaxMind’s GeoIP database (the person signed up at work, drove home across a country border in Europe, and validated his email address from home). We also changed the registration form and put a second link above the first that said:

If you didn’t intend to sign up, click this link

For a few days, their spider was hitting the first link, banning the account for us. Oftentimes there was a delay of a few days between an account being validated and its first post.

If you look at the list, you can see where they have attempted to obfuscate the email address, and in some cases, are using the + to insert a counter. Based on the posts that were made, it suggests we might have more than one group actually spamming, all outsourcing the account creation to the same company.

Spammers are resourceful. It is a shame there isn’t a way to get these email addresses shut down to squelch some of the spam at the source.

Since starting this post, eight more signups came in, bringing the average to roughly 90 signups per day.

In short:

* Check a ‘cleaned’ email against the database, i.e. remove the . and truncate at the + for gmail/googlemail accounts
* Use Recaptcha
* Alter the signup form to include a link to decline the signup
* Look at the Signup IP and the validation IP
* Validate Signup IP isn’t coming from a proxy
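
The two IP checks from the list can be sketched roughly as follows. This is an illustration in Python under stated assumptions, not our production plugin: `country_of` and `is_listed` are hypothetical callables the caller supplies (for example a GeoIP database lookup and a DNS query), the zone dnsbl.example.org is a placeholder, and how to weigh the two flags is left open. The one grounded detail is the lookup name: a DNS blacklist is queried by reversing the IPv4 octets and appending the list's zone.

```python
import ipaddress
from typing import Callable, List, Optional


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the hostname to look up for an IPv4 address in a DNS blacklist."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # IPv4 only in this sketch
    return ".".join(reversed(octets)) + "." + zone


def countries_differ(signup_ip: str, validation_ip: str,
                     country_of: Callable[[str], Optional[str]]) -> bool:
    """True only when both countries are known and they are not the same."""
    a, b = country_of(signup_ip), country_of(validation_ip)
    return a is not None and b is not None and a != b


def signup_flags(signup_ip: str, validation_ip: str,
                 country_of: Callable[[str], Optional[str]],
                 is_listed: Callable[[str], bool],
                 zone: str = "dnsbl.example.org") -> List[str]:
    """Collect the reasons a signup looks suspicious; an empty list means none."""
    flags = []
    if countries_differ(signup_ip, validation_ip, country_of):
        flags.append("country_mismatch")
    # is_listed would normally resolve the name and report whether it exists.
    if is_listed(dnsbl_query_name(signup_ip, zone)):
        flags.append("proxy_listed")
    return flags


# Hypothetical data: documentation-range IPs and a stubbed blacklist.
countries = {"192.0.2.10": "DE", "198.51.100.7": "NL"}
listed = {"10.2.0.192.dnsbl.example.org"}
print(signup_flags("192.0.2.10", "198.51.100.7", countries.get, listed.__contains__))
```

Returning the reasons instead of a single yes/no keeps cases like our one false positive, a genuine signup that crossed a border, easy to review by hand.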

Hiding Data in Plain Sight

Wednesday, March 4th, 2009

I had a breakfast meeting today with a company involved in forensic reconstruction of data after a possible crime had been committed.  Somehow the conversation shifted slightly and we talked about the process and one of the people said, “You know, it wouldn’t be so bad if we didn’t have to wade through all that spam and not find anything worthwhile in the email messages that showed how the person communicated.”

At this point I said, have you ever thought that they could be using Spam Steganography?  Eyebrows were raised, the conversation paused and I was met with a blank stare for about 30 seconds.

The assumption is that encrypted data needs to look like encrypted data, or like a string of numbers and letters that is unintelligible.  While this system doesn’t really produce well-hidden data, the premise is valid.

Dear Friend ; Thank-you for your interest in our publication
. If you no longer wish to receive our publications
simply reply with a Subject: of “REMOVE” and you will
immediately be removed from our club ! This mail is
being sent in compliance with Senate bill 1816 ; Title
3 ; Section 304 . This is not multi-level marketing
. Why work for somebody else when you can become rich
within 45 days . Have you ever noticed more people
than ever are surfing the web & people love convenience
! Well, now is your chance to capitalize on this .
We will help you SELL MORE and use credit cards on
your website . You are guaranteed to succeed because
we take all the risk ! But don’t believe us ! Ms Ames
of Montana tried us and says “I was skeptical but it
worked for me” ! We are licensed to operate in all
states . We implore you – act now . Sign up a friend
and you’ll get a discount of 80% ! Best regards . Dear
E-Commerce professional , Especially for you – this
breath-taking news . We will comply with all removal
requests . This mail is being sent in compliance with
Senate bill 1626 ; Title 1 ; Section 301 . This is
different than anything else you’ve seen ! Why work
for somebody else when you can become rich in 38 weeks
. Have you ever noticed most everyone has a cellphone
plus people love convenience ! Well, now is your chance
to capitalize on this . We will help you decrease perceived
waiting time by 200% plus use credit cards on your
website ! You are guaranteed to succeed because we
take all the risk . But don’t believe us . Mr Jones
of Georgia tried us and says “Now I’m rich many more
things are possible” ! This offer is 100% legal ! So
make yourself rich now by ordering immediately ! Sign
up a friend and you’ll get a discount of 60% . Best
regards !

The above message decodes to: This is a test message
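
To make the premise concrete, here is a toy sketch in Python. It is not the scheme that produced the message above, and it is far easier to spot. It simply hides one bit per sentence in the choice of terminator, with " ." standing for 0 and " !" for 1. The filler sentences are arbitrary.

```python
from itertools import cycle

# Filler sentences for the cover text; none may contain a standalone "." or "!".
FILLER = [
    "Dear Friend",
    "This offer is 100% legal",
    "We will comply with all removal requests",
    "Act now",
    "Best regards",
]


def encode(message: str) -> str:
    """Hide message in spam-like text, one bit per sentence terminator."""
    bits = "".join(f"{byte:08b}" for byte in message.encode("utf-8"))
    sentences = cycle(FILLER)
    return " ".join(
        f"{next(sentences)} {'!' if bit == '1' else '.'}" for bit in bits
    )


def decode(cover: str) -> str:
    """Recover the message by reading the standalone terminators back as bits."""
    bits = "".join(
        "1" if token == "!" else "0"
        for token in cover.split()
        if token in (".", "!")
    )
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


cover = encode("This is a test message")
assert decode(cover) == "This is a test message"
```

Eight sentences per byte makes for a very long email, which is part of why a scheme like this hides poorly, but the output still reads as junk mail rather than as ciphertext.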

I now wonder if they will be redoing that investigation looking for steganographically encoded spam.
