Just got a private message on Last.fm from intuitionorphan, subject “I hope this makes sense to you,” body “I have an earnest desire to change the world,” with a MediaFire link to a file named “Kid is an Adult.zip.” I’m certain the file contains something bad, but I’m curious what kind of bad thing it is. None of those phrases turns up any relevant results on Google.
In dealing with spambots as the administrator of a couple of phpBB forums, I’ve noticed that most of them, when registering, give the birthdate of March 28, 1983. I thought there might be some explanation for this consistency (like the January 1, 1970 dating of phantom phpBB posts), but a Google search for “march 28 1983” only turns up threads on various forums mentioning the coincidence, without offering any explanation.
I did find out that these spambots are likely the product of XRumer, a spamming tool, so my guess is that March 28, 1983 is the default value of a configurable birthdate, but I’m not really interested in installing it to find out.
I love Mozilla Thunderbird, partly because it’s a Mozilla-branded product but largely because of its adaptive junk mail filter. For every email you get, you can mark it as “junk” or “not junk,” and from both kinds of marks Thunderbird learns (through Bayesian filtering) how to identify spam.
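To make the “learning” a bit more concrete, here’s a toy sketch of a two-class naive Bayes filter in Python. It is not Thunderbird’s actual algorithm (Mozilla’s filter surely differs in its tokenizer and scoring details); it only shows the general idea: every “junk” or “not junk” mark updates per-class word counts, and a new message is scored by how much more likely its words are under one class than the other. The class and method names are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> set[str]:
    # Lowercased runs of letters, digits, and apostrophes; real filters tokenize far more cleverly.
    return set(re.findall(r"[a-z0-9']+", text.lower()))


class NaiveBayesJunkFilter:
    """Toy two-class naive Bayes filter trained by "junk" / "not junk" marks (hypothetical)."""

    def __init__(self) -> None:
        # Per class: how many marked messages contained each token, and how many were marked.
        self.token_counts = {"junk": Counter(), "good": Counter()}
        self.message_counts = {"junk": 0, "good": 0}

    def mark(self, text: str, label: str) -> None:
        """Record one message marked as "junk" or "good" (i.e. not junk)."""
        self.message_counts[label] += 1
        self.token_counts[label].update(tokenize(text))

    def junk_probability(self, text: str) -> float:
        seen = set(self.token_counts["junk"]) | set(self.token_counts["good"])
        total = sum(self.message_counts.values())
        log_scores = {}
        for label in ("junk", "good"):
            n = self.message_counts[label]
            # Add-one smoothed class prior, so an untrained class doesn't hit log(0).
            score = math.log((n + 1) / (total + 2))
            # Score only tokens present in the message (a common simplification),
            # skipping words never seen in training.
            for tok in tokenize(text) & seen:
                # Smoothed fraction of this class's messages that contained the token.
                score += math.log((self.token_counts[label][tok] + 1) / (n + 2))
            log_scores[label] = score
        # Logistic of the log-odds, clamped so math.exp can't overflow.
        log_odds = log_scores["junk"] - log_scores["good"]
        log_odds = max(-700.0, min(700.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))
```

Mark a few messages each way, then call `junk_probability` on a new one; it comes out above 0.5 when the message’s words mostly showed up in junk. It also shows why marking the good mail matters: without any “not junk” examples, that side of the comparison is just a smoothed guess.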
If you’re anything like me, you’ve noticed that spammers have gotten a lot craftier in recent months; a few spam emails have even slipped into my Gmail inbox, and Gmail has in my experience been nothing short of astounding in its ability to identify spam. So it’s no surprise that Thunderbird isn’t catching everything for me, at least not yet. I mark every spam I get as such, but the filtering relies on your marking the non-spam as well.
Anyway, it’s not hard work to mark all these emails (especially if you can highlight a bunch from a number of trusted senders and mark them “not junk”), but it’s still work, and I’d hate to see it all go to waste if my hard drive crashed, or even if Thunderbird’s development suddenly halted; the data could prove useful elsewhere. And the idea of having that data accessible to me outside of a single program, in raw, browsable form, is really, really appealing.
Through very little Googling I found out that Thunderbird keeps all this training data in a single file, aptly named training.dat. It’s in my “Documents and Settings\Jay\Application Data\Thunderbird\Profiles\2e8vm8m0.default” folder. And apparently, simply putting it in another profile folder migrates all the training you’ve done to that other profile. Amazingly simple.
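If you’d rather script that copy (say, as part of a backup routine), a minimal Python sketch might look like this. It assumes Thunderbird is closed while it runs, in case the program rewrites the file on exit, and the `copy_training_data` helper and the `training.dat.bak` backup name are my own inventions, not anything Thunderbird provides.

```python
import shutil
from pathlib import Path


def copy_training_data(src_profile: Path, dst_profile: Path) -> Path:
    """Copy training.dat from one Thunderbird profile folder into another (hypothetical helper).

    Assumption: Thunderbird is closed while this runs.
    """
    src = src_profile / "training.dat"
    if not src.is_file():
        raise FileNotFoundError(f"no training.dat in {src_profile}")
    dst = dst_profile / "training.dat"
    if dst.exists():
        # Keep the destination's existing training data instead of silently clobbering it.
        shutil.copy2(dst, dst.with_name("training.dat.bak"))
    # copy2 also preserves the file's timestamps.
    shutil.copy2(src, dst)
    return dst
```

Pass it the source profile folder (like the 2e8vm8m0.default one above) and the destination profile folder; it returns the path of the copied file.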
Here’s what the first ten lines of mine look like:
I don’t get it either, and it just goes on like that, with no immediately recognizable structure or indication of what significance these words have, save for some seemingly random paragraph breaks.
BUT, when I Googled what I now knew to be the filename of the training data, I found that Mozilla created a little Java program called the Bayes Junk Tool, which makes this data surprisingly legible, AND exports it as XML, AND lets you edit it arbitrarily!! I couldn’t have asked for more.
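For the curious, here’s a rough Python sketch of reading training.dat directly and dumping it as XML. The layout is an assumption based on my reading of older Mozilla source, not a documented format: a 4-byte signature (FE ED FA CE), big-endian 32-bit counts of good and junk messages, then a token table for each class, where each entry is a 32-bit count, a 32-bit byte length, and the token’s bytes. If that’s right, it might also explain the “seemingly random paragraph breaks”: a count or length of 10 is stored ending in the byte 0x0A, which a text editor shows as a line break. The function names and XML element names are made up and are not the Bayes Junk Tool’s format.

```python
import struct
import xml.etree.ElementTree as ET

MAGIC = b"\xfe\xed\xfa\xce"  # assumed 4-byte file signature


class _Reader:
    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0

    def take(self, n: int) -> bytes:
        if self.pos + n > len(self.data):
            raise ValueError("truncated training.dat")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def u32(self) -> int:
        # Assumed: unsigned 32-bit integers in big-endian (network) byte order.
        return struct.unpack(">I", self.take(4))[0]


def _read_tokens(r: _Reader) -> dict[str, int]:
    tokens = {}
    for _ in range(r.u32()):       # number of entries in this class's table
        count = r.u32()            # the token's stored count for this class
        length = r.u32()           # token length in bytes
        word = r.take(length).decode("utf-8", errors="replace")
        tokens[word] = count
    return tokens


def parse_training_dat(data: bytes) -> dict:
    r = _Reader(data)
    if r.take(4) != MAGIC:
        raise ValueError("unexpected file signature")
    good_messages = r.u32()
    junk_messages = r.u32()
    good = _read_tokens(r)  # assumed: the good (not junk) table comes first
    junk = _read_tokens(r)
    return {"good_messages": good_messages, "junk_messages": junk_messages,
            "good": good, "junk": junk}


def to_xml(parsed: dict) -> str:
    # Hypothetical element names, chosen only for readability.
    root = ET.Element("training", good_messages=str(parsed["good_messages"]),
                      junk_messages=str(parsed["junk_messages"]))
    for kind in ("good", "junk"):
        table = ET.SubElement(root, kind)
        for word, count in sorted(parsed[kind].items()):
            ET.SubElement(table, "token", count=str(count)).text = word
    return ET.tostring(root, encoding="unicode")
```

Something like `print(to_xml(parse_training_dat(open("training.dat", "rb").read())))` would print the whole thing; if the signature check fails, the layout assumption is wrong for your version of Thunderbird.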
Truthfully, I’m a little disappointed in the relatively rudimentary Bayesian approach. I thought for sure this training.dat file would be riddled with regular expressions, teaching Thunderbird that “v1agra” is the same thing as “\/|a gra.” Although that’s probably too subtle even for regular expressions. I can dream, can’t I?
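For what it’s worth, here’s the kind of thing I was imagining: a toy Python regex that treats “v1agra” and “\/|a gra” as the same word. The look-alike character classes are my own guesses, and the fact that it takes a hand-written pattern for a single word is probably a hint as to why the filter sticks to plain tokens.

```python
import re

# Hypothetical look-alike classes for each letter of "viagra"; a real list would need many more.
LOOKALIKES = [
    r"(?:v|\\/)",   # v, or a backslash-slash pair drawn as a "V"
    r"[i1l|!]",     # i, or digits/symbols that resemble it
    r"[a@4]",
    r"[g9]",
    r"r",
    r"[a@4]",
]
# Allow spaces, dots, underscores, or hyphens between letters ("v.i.a g r a").
SEPARATOR = r"[\s._-]*"
VIAGRA = re.compile(SEPARATOR.join(LOOKALIKES), re.IGNORECASE)


def looks_like_viagra(text: str) -> bool:
    return VIAGRA.search(text) is not None
```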
None of this undercuts the value of MozBackup, which bundles settings, cookies, extensions, cached files, and more into a single backup file.