This doesn't work
Aug. 20th, 2003 12:53 am
This thing purports to tell if text was written by a boy or a girl. It has been applied to all the hip kids' blogs.
Now, the authors of the original paper would probably object that it was calibrated with fiction, not blogs and Usenet posts. But given that disclaimer, based on the tallied results that pop up when you submit your evaluation of its guess, I think we can safely say that, at least with regard to Web text samples, it is doing no better than chance. In my own tests, it identified Andy as female and Claudia as male, and the results on my own writing are split about fifty-fifty, with one Usenet post that it actually identified as androgyne. However, it is really good at correctly identifying samantha2074 as female, for some reason.
Pasting in a bunch of blog entries with time stamps probably gives it a female bias, since it seems to regard numbers as feminine for some insane reason.
Further thought: You know, it might be interesting to try to train a Bayesian spam-blocker algorithm to tell males from females. It would probably do better.
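For the curious, here is a rough sketch of what that experiment might look like: a bare-bones naive Bayes text classifier of the kind spam filters use, in plain Python. Everything in it is my own assumption for illustration: the `NaiveBayes` class, the `tokenize` helper, the placeholder labels, and the nonsense training strings. It is not what the original tool does, and you would need a large pile of real, labeled writing samples before the output meant anything.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out word and number tokens."""
    return re.findall(r"[a-z']+|\d+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token frequencies
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()           # every token seen during training

    def train(self, text, label):
        """Add one labeled writing sample to the model."""
        self.doc_counts[label] += 1
        counts = self.word_counts.setdefault(label, Counter())
        for tok in tokenize(text):
            counts[tok] += 1
            self.vocab.add(tok)

    def log_scores(self, text):
        """Return log P(label) + sum of log P(token | label) per label."""
        total_docs = sum(self.doc_counts.values())
        tokens = tokenize(text)
        scores = {}
        for label, counts in self.word_counts.items():
            score = math.log(self.doc_counts[label] / total_docs)
            # Add-one smoothing keeps unseen tokens from zeroing the score.
            denom = sum(counts.values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            scores[label] = score
        return scores

    def classify(self, text):
        """Return the label with the highest log score."""
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy demo with made-up nonsense text and placeholder labels.
clf = NaiveBayes()
clf.train("foo bar baz", "label_a")
clf.train("qux quux corge", "label_b")
print(clf.classify("foo bar"))
```

Since the tokenizer keeps numbers as tokens, a model like this would at least learn from the training data whether time stamps lean one way or the other, rather than having that baked in.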