Kid is an Adult.zip

Just got a private message on Last.fm from intuitionorphan, subject “I hope this makes sense to you,” body “I have an earnest desire to change the world,” with a MediaFire link to a file named “Kid is an Adult.zip.” I’m certain the file contains something bad, but I’m curious which bad thing, or at least what kind, it is. None of those phrases turns up any relevant results on Google.

Happy Birthday, Forum SpamBots!

In dealing with spambots as the administrator of a couple of phpBB forums, I’ve noticed that most of them, when registering, give a birthdate of March 28, 1983. I thought there might be some explanation for this consistency, like the January 1, 1970 dating of phantom phpBB posts (that date is the Unix epoch, so presumably a timestamp of zero), but a Google search for “march 28 1983” only turns up threads on various forums mentioning the coincidence, without offering any explanation.

I did find out that these spambots are likely the product of XRumer, a spamming tool, so my guess is that March 28, 1983 is the default value of a configurable birthdate, but I’m not really interested in installing it to find out.

Data Lust

I love Mozilla Thunderbird, not least because it’s a Mozilla-branded product, but largely because of its adaptive junk mail filter. You can mark every email you get as “junk” or as “not junk,” and from both kinds of marking Thunderbird learns (through Bayesian filtering) how to identify spam.
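For anyone wondering what “learning from both kinds of marking” looks like, here is a minimal sketch of a naive Bayes-style filter. It is a toy for illustration, not Thunderbird’s actual algorithm: the class name, the tokenizer, and the add-one smoothing are all my own assumptions.

```python
import math
import re
from collections import Counter


class TinyBayesFilter:
    """Toy junk filter; an illustration, not Thunderbird's real implementation."""

    def __init__(self):
        # How many trained messages of each kind contained each token.
        self.token_counts = {"junk": Counter(), "good": Counter()}
        self.message_counts = {"junk": 0, "good": 0}

    @staticmethod
    def tokenize(text):
        # Hypothetical tokenizer: lowercase runs of letters, digits, apostrophes.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        # label is "junk" or "good", like the two buttons in the mail client.
        self.message_counts[label] += 1
        self.token_counts[label].update(set(self.tokenize(text)))

    def junk_probability(self, text):
        junk_total = self.message_counts["junk"]
        good_total = self.message_counts["good"]
        # Start from the prior odds, then add one log-likelihood ratio per token.
        log_odds = math.log((junk_total + 1) / (good_total + 1))
        for token in set(self.tokenize(text)):
            # Add-one smoothing keeps unseen tokens from zeroing everything out.
            p_junk = (self.token_counts["junk"][token] + 1) / (junk_total + 2)
            p_good = (self.token_counts["good"][token] + 1) / (good_total + 2)
            log_odds += math.log(p_junk / p_good)
        return 1 / (1 + math.exp(-log_odds))
```

Train it with a few calls like `f.train("buy cheap pills now", "junk")` and `f.train("lunch tomorrow at noon", "good")`, then ask `f.junk_probability("cheap pills")`. Without any “good” examples, every token it has seen counts only as evidence of junk, which is why marking the non-spam matters.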

If you’re anything like me, you’ve noticed that spammers have gotten a lot craftier in recent months; I’ve even had a few spam emails slip into my Gmail inbox, when Gmail has in my experience been nothing short of astounding in its ability to identify spam. Which is to say, Thunderbird isn’t catching everything for me, at least not yet. I mark every spam I get as such, but the filtering relies on your marking the non-spam as well.

Anyway, it’s not hard work to mark all these emails (especially if you can highlight a bunch from a number of trusted senders and mark “not spam”), but it’s still work, and I’d hate to see it all go to waste if my hard drive crashed, or even if Thunderbird’s development suddenly halted; the data could prove useful elsewhere. And the idea of having that data accessible to me in raw, browsable form, outside of a practical implementation within a single program, is really, really appealing.

With very little Googling I found out that Thunderbird keeps all this training data in a single file, named, aptly, training.dat. On my machine it’s in the “Documents and Settings\Jay\Application Data\Thunderbird\Profiles\2e8vm8m0.default” folder; yours will be in your own profile folder. And apparently, simply putting it in another profile folder migrates all the training you’ve done to that other profile. Amazingly simple.
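Since the whole backup comes down to copying one file, it can be sketched in a few lines of Python. The function name and both folder arguments are hypothetical; point them at your own profile folder and wherever you keep backups. I’m also assuming it’s wise to close Thunderbird first so the file isn’t being written mid-copy.

```python
import shutil
from pathlib import Path


def backup_training_data(profile_dir, backup_dir):
    """Copy training.dat from a Thunderbird profile folder into backup_dir."""
    source = Path(profile_dir) / "training.dat"
    if not source.is_file():
        raise FileNotFoundError(f"no training.dat in {profile_dir}")
    destination_dir = Path(backup_dir)
    destination_dir.mkdir(parents=True, exist_ok=True)
    # copy2 copies the contents and also tries to keep the modification time.
    return Path(shutil.copy2(source, destination_dir / "training.dat"))
```

Migrating to another profile would be the same call with that profile’s folder as the second argument.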

Here’s what the first ten lines of mine look like:

þíúÎ
justifies,
meaningful
sublicense
propelling direct
flyer-ing,
herbaliseratt
aggression
(surprise,
inflatable

I don’t get it either, and it just goes on like that, with no immediately recognizable structure or indication of what significance these words have, save for some seemingly random paragraph breaks.
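One clue is that first line: “þíúÎ” is what the four bytes FE ED FA CE look like when shown as Latin-1 text, which suggests a binary file with a magic number rather than a word list. Here is a rough sketch of a reader. Everything past the magic number is an assumption on my part (big-endian 32-bit counts, a good-token table followed by a junk-token table, each token stored as count, length, bytes) and may not match every Thunderbird version; the function names are made up.

```python
import struct

MAGIC = bytes.fromhex("feedface")  # shows up as "þíúÎ" when decoded as Latin-1


def read_u32(data, offset):
    # Assumption: integers are unsigned 32-bit, big-endian.
    (value,) = struct.unpack_from(">I", data, offset)
    return value, offset + 4


def read_token_table(data, offset):
    # Assumed layout: table size, then (count, length, bytes) for each token.
    table = {}
    size, offset = read_u32(data, offset)
    for _ in range(size):
        count, offset = read_u32(data, offset)
        length, offset = read_u32(data, offset)
        word = data[offset:offset + length].decode("utf-8", errors="replace")
        offset += length
        table[word] = count
    return table, offset


def parse_training_data(data):
    if data[:4] != MAGIC:
        raise ValueError("missing FE ED FA CE header")
    # Assumed order: good message count, junk message count, then two tables.
    good_messages, offset = read_u32(data, 4)
    junk_messages, offset = read_u32(data, offset)
    good_tokens, offset = read_token_table(data, offset)
    junk_tokens, offset = read_token_table(data, offset)
    return {
        "good_messages": good_messages,
        "junk_messages": junk_messages,
        "good_tokens": good_tokens,
        "junk_tokens": junk_tokens,
    }
```

If the layout guess holds, `parse_training_data(open("training.dat", "rb").read())` would return the message counts plus a word-to-count dictionary for each kind of mail, and the “random paragraph breaks” would just be count bytes that happen to display as newlines.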

BUT, when I Googled what I now knew to be the filename of the training data, I found that Mozilla created a little Java program called the Bayes Junk Tool, which makes this data surprisingly legible, AND exportable as XML, AND allows you to edit this data arbitrarily!! I couldn’t have asked for more.

Truthfully, I’m a little disappointed in the relatively rudimentary Bayesian approach. I thought for sure this training.dat file would be riddled with regular expressions, teaching Thunderbird that “v1agar” is the same thing as “\/|a gra.” Although that’s probably too subtle even for regular expressions. I can dream, can’t I?

None of this is to undercut the invaluability of MozBackup, which keeps settings, cookies, extensions, cached files, and more within a single backup file.