Aug. 20th, 2003

mmcirvin: (Default)
This thing purports to tell if text was written by a boy or a girl. It has been applied to all the hip kids' blogs.

Now, the authors of the original paper would probably object that it was calibrated with fiction, not blogs and Usenet posts. But even granting that disclaimer, based on the tallied results that pop up when you submit your evaluation of its guess, I think we can safely say that, at least with regard to Web text samples, it is doing no better than chance. In my own tests, it identified Andy as female and Claudia as male, and the results on my own writing are split about fifty-fifty, with one Usenet post that it actually identified as androgynous. However, it is really good at correctly identifying samantha2074 as female, for some reason.

Pasting in a bunch of blog entries with time stamps probably gives it a female bias, since it seems to regard numbers as feminine for some insane reason.

Further thought: You know, it might be interesting to try to train a Bayesian spam-blocker algorithm to tell males from females. It would probably do better.
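
For the curious, here is roughly what I have in mind, as a minimal sketch and nothing more. It is a plain word-count naive Bayes classifier of the sort spam filters use, with add-one smoothing. The class name, the tokenizer, and the placeholder labels are all my own inventions for illustration, not anything from the original paper or any real spam filter, and I have not trained it on any actual writing samples.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens; digits are kept as tokens too.
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Hypothetical toy classifier: multinomial naive Bayes over word counts."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of tokens seen in training
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()           # every token seen under any label

    def train(self, text, label):
        tokens = tokenize(text)
        self.word_counts.setdefault(label, Counter()).update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def classify(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus summed log likelihoods, with add-one smoothing
            # so an unseen word does not zero out the whole score.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

You would call train() once per labeled sample and then classify() on new text. Whether it would actually beat chance on blog posts is exactly the open question.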
mmcirvin: (Default)
zxcv,hkljwef,m.;asldkfqweopri'zxczl;cxvas
