Month: February 2007

Data Lust

I love Mozilla Thunderbird, partly because it’s a Mozilla-branded product, but largely because of its adaptive junk mail filter. For every email you get, you can mark it as “junk” or “not junk,” and from both kinds of marking Thunderbird learns (through Bayesian filtering) how to identify spam.
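For the curious, here is roughly what "learning through Bayesian filtering" can look like. This is a toy sketch in Python, not Thunderbird's actual algorithm: the class name, the whitespace tokenizer, and the smoothing constants are all assumptions made for illustration. It counts how many junk and non-junk messages each word has appeared in, then combines those counts into a junk probability.

```python
import math
from collections import Counter


class TinyBayesFilter:
    """Toy word-count Bayesian classifier (hypothetical, not Thunderbird's code)."""

    def __init__(self):
        self.junk_counts = Counter()  # word -> number of junk messages containing it
        self.good_counts = Counter()  # word -> number of non-junk messages containing it
        self.junk_msgs = 0
        self.good_msgs = 0

    @staticmethod
    def tokens(text):
        # Assumption: split on whitespace and ignore case.
        return set(text.lower().split())

    def train(self, text, is_junk):
        if is_junk:
            self.junk_counts.update(self.tokens(text))
            self.junk_msgs += 1
        else:
            self.good_counts.update(self.tokens(text))
            self.good_msgs += 1

    def junk_probability(self, text):
        # Work in log-odds so many small factors do not underflow.
        log_odds = math.log((self.junk_msgs + 1) / (self.good_msgs + 1))
        for tok in self.tokens(text):
            # Add-one smoothing keeps unseen words from producing zeros.
            p_junk = (self.junk_counts[tok] + 1) / (self.junk_msgs + 2)
            p_good = (self.good_counts[tok] + 1) / (self.good_msgs + 2)
            log_odds += math.log(p_junk / p_good)
        # Convert log-odds to a probability without overflowing exp().
        if log_odds >= 0:
            return 1 / (1 + math.exp(-log_odds))
        e = math.exp(log_odds)
        return e / (1 + e)
```

Note that both `train(..., True)` and `train(..., False)` feed the score, which is why marking the non-spam matters as much as marking the spam.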

If you’re anything like me you’ve noticed that spammers are getting a lot craftier in recent months; I’ve even had a few spam emails slip into my Gmail inbox, when Gmail has in my experience been nothing short of astounding in its ability to identify spam. Which is to say, Thunderbird isn’t catching everything for me, at least not yet. I mark every spam I get as such, but the filtering relies on your marking the non-spam as well.

Anyway, it’s not hard work to mark all these emails (especially if you can highlight a bunch from a number of trusted senders and mark “not spam”), but it’s still work, and I’d hate to see it all go to waste if my hard drive crashed, or even if Thunderbird’s development suddenly halted — the data could prove useful elsewhere. And the idea of even having that data accessible to me outside of a practical implementation within a single program — in raw, browsable form — is really, really appealing.

Through very little Googling I found out that Thunderbird keeps all this training data in a single file, named, aptly, training.dat. It’s in your “Documents and Settings\Jay\Application Data\Thunderbird\Profiles\2e8vm8m0.default” folder. And apparently, simply putting it in another profile folder migrates all the training you’ve done to that other profile. Amazingly simple.
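Since the whole thing is one file, backing it up is just a file copy. A minimal sketch in Python, assuming you pass in your own profile folder; the function name and the backup location are made up for illustration, and it is probably safest to run it while Thunderbird is closed.

```python
import shutil
from pathlib import Path


def backup_training_data(profile_dir, backup_dir):
    """Copy training.dat from a Thunderbird profile folder into backup_dir.

    Hypothetical helper: both arguments are paths you supply yourself.
    """
    source = Path(profile_dir) / "training.dat"
    if not source.is_file():
        raise FileNotFoundError(f"no training.dat in {profile_dir}")
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves timestamps, handy for telling backups apart.
    return Path(shutil.copy2(source, target_dir / "training.dat"))
```

Copying the file the other way, into a different profile folder, is the migration described above.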

Here’s what the first ten lines of mine look like:

þíúÎ
justifies,
meaningful
sublicense
propelling direct
flyer-ing,
herbaliseratt
aggression
(surprise,
inflatable

I don’t get it either, and it just goes on like that, with no immediately recognizable structure or indication of what significance these words have, save for some seemingly random paragraph breaks.
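One small clue is hiding in that first line. If the text editor was showing raw bytes as Latin-1 characters (an assumption), then "þíúÎ" is not a word at all but four bytes of binary header. A quick Python sketch to turn the displayed characters back into hex, plus a hypothetical helper for peeking at the start of the file itself:

```python
def displayed_text_to_hex(text):
    """Re-encode characters as Latin-1 bytes and show them as hex."""
    return text.encode("latin-1").hex()


def read_header_hex(path, length=4):
    """Return the first `length` bytes of a file as hex (hypothetical helper)."""
    with open(path, "rb") as handle:
        return handle.read(length).hex()


print(displayed_text_to_hex("þíúÎ"))  # prints feedface
```

That would also explain the lack of visible structure: whatever sits between the words is probably binary data that a text editor cannot display sensibly.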

BUT, when I Googled what I now knew to be the filename of the training data, I found that Mozilla created a little Java program called the Bayes Junk Tool, which makes this data surprisingly legible, AND exportable as XML, AND allows you to edit this data arbitrarily!! I couldn’t have asked for more.

Truthfully, I’m a little disappointed in the relatively rudimentary Bayesian approach. I thought for sure this training.dat file would be riddled with regular expressions, teaching Thunderbird that “v1agar” is the same thing as “\/|a gra.” Although that’s probably too subtle even for regular expressions. I can dream, can’t I?
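To make the dream concrete, here is a rough Python sketch of the kind of matching imagined above. It is entirely hypothetical and has nothing to do with how Thunderbird works: a handful of made-up substitution rules undo the character disguises, and a crude same-letters comparison forgives the scrambled order.

```python
import re

# Assumed substitution rules, applied in order; extend to taste.
SUBSTITUTIONS = [
    (r"\\/", "v"),  # backslash + slash drawn as a "v"
    (r"\|", "i"),   # vertical bar standing in for "i"
    ("1", "i"),
    ("@", "a"),
    ("0", "o"),
]


def normalize(token):
    """Undo the disguises, then drop anything that is not a letter."""
    out = token.lower()
    for pattern, letter in SUBSTITUTIONS:
        out = re.sub(pattern, letter, out)
    return re.sub(r"[^a-z]", "", out)


def looks_like(token, target):
    """True if the normalized token uses exactly the letters of target."""
    return sorted(normalize(token)) == sorted(target)


print(looks_like("v1agar", "viagra"))     # prints True
print(looks_like("\\/|a gra", "viagra"))  # prints True
print(looks_like("vinegar", "viagra"))    # prints False
```

The regular expressions only handle the character swaps; the letter-order comparison is what catches "v1agar," and it is blunt enough to match any anagram, which is one reason counting words is the more practical approach.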

None of this is to undercut the invaluability of MozBackup, which keeps settings, cookies, extensions, cached files, and more within a single backup file.


“Rubies” Illustration

Destroyer - "Rubies"

Okay so the first one could be “Quiet, Ruby, someone’s coming…oh it’s just your precious American underground,” though a subway is kind of a goofy way of depicting that. The second one is probably the “cheap dancers.” Third is “Blessed doctor, cut me open.” Then there’s “Proud Mary said as she lit the fuse,” though I’m not sure what a fire hydrant has to do with anything. And finally of course “Priest says, ‘…I can’t bear her raven tresses caught up in a breeze like that.'”

Via Streethawk LiveJournal Community.


Ten Jars

In a position to invent my own responsibilities, and to realign them on whims, I don’t get very far. Even if I remain productive, it’s a fractured, diffuse, directionless kind of productivity, composed of many tiny islands, sealed in vacuums, free of context or import. Meanwhile, everything that I’m not doing screams with an urgency that what I am doing can never match. Being fucked with by the sparkly allure of things in my periphery, even the most worthwhile sparkly things, undermines all the effort.

Of course there’s always something arbitrary to how you choose to spend your time. Which is probably what a lot of people mean when they say they work well under pressure — it’s not the threat of a deadline that fosters productivity, it’s the conviction with which you act, knowing fully that this is exactly the correct investment of your time and energy. For the moment, you’re free of that responsibility, of choosing what to do, left only with the busy-work of doing it. And that’s a huge relief.

And the problem with having goals as ill-defined as mine is that there are no looming deadlines, only a vague understanding that this is going somewhere, eventually, assuming out of necessity that none of it is in vain. And worse, the gratification, the payoff, is not just delayed; that would make things a lot easier. It’s more than delayed, it’s practically invisible, the result of infinitesimal accumulations that never accelerate or burst with finality, but just collect like sediment, like that big jar of sub-quarter coins. And nobody would ever dream of working for that jar, much less ten jars. When you’re emptying your pockets at the end of the day, which jar do you choose?
