Aug. 20th, 2003

mmcirvin: (Default)
This thing purports to tell if text was written by a boy or a girl. It has been applied to all the hip kids' blogs.

Now, the authors of the original paper would probably object that it was calibrated on fiction, not blogs and Usenet posts. But even granting that disclaimer, and going by the tallied results that pop up when you submit your evaluation of its guess, I think we can safely say that, at least on Web text samples, it is doing no better than chance. In my own tests, it identified Andy as female and Claudia as male, and the results on my own writing are split about fifty-fifty, with one Usenet post that it actually identified as androgyne. However, it is really good at correctly identifying samantha2074 as female, for some reason.

Pasting in a bunch of blog entries with time stamps probably gives it a female bias, since it seems to regard numbers as feminine for some insane reason.

Further thought: You know, it might be interesting to try to train a Bayesian spam-blocker algorithm to tell males from females. It would probably do better.
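
For the curious, here is roughly what I have in mind: a minimal sketch of a spam-filter-style naive Bayes text classifier with add-one smoothing. Everything in it is my own assumption for illustration, including the `NaiveBayes` class, the `tokenize` helper, and the placeholder labels "A" and "B". The training strings are nonsense words invented only to exercise the code; a real attempt would need a large pile of text with known authorship, and I make no claim about how well it would actually do.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase word tokens; digits are kept as tokens too.
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Hypothetical two-or-more-class word-count classifier (illustration only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of training texts
        self.vocab = set()                       # every token seen in training

    def train(self, text, label):
        tokens = tokenize(text)
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def log_scores(self, text):
        total_docs = sum(self.doc_counts.values())
        tokens = tokenize(text)
        scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            # Start from the log prior for this label.
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                # Add-one (Laplace) smoothing so an unseen word
                # does not zero out a whole class.
                score += math.log((counts[tok] + 1) / (total + len(self.vocab)))
            scores[label] = score
        return scores

    def classify(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Placeholder training data, invented purely to exercise the code.
clf = NaiveBayes()
clf.train("alpha beta beta gamma", "A")
clf.train("beta beta delta", "A")
clf.train("epsilon zeta zeta eta", "B")
clf.train("zeta theta epsilon", "B")

print(clf.classify("beta gamma beta"))
print(clf.classify("zeta epsilon"))
```

The point of doing it this way is that the classifier learns whatever word frequencies happen to separate the training samples, instead of relying on a fixed list of weights calibrated on some other kind of writing.
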
mmcirvin: (Default)
zxcv,hkljwef,m.;asldkfqweopri'zxczl;cxvas
