This doesn't work
Aug. 20th, 2003 12:53 am
This thing purports to tell if text was written by a boy or a girl. It has been applied to all the hip kids' blogs.
Now, the authors of the original paper would probably object that it was calibrated with fiction, not blogs and Usenet posts. But given that disclaimer, based on the tallied results that pop up when you submit your evaluation of its guess, I think we can safely say that, at least with regard to Web text samples, it is doing no better than chance. In my own tests, it identified Andy as female and Claudia as male, and the results on my own writing are split about fifty-fifty, with one Usenet post that it actually identified as androgyne. However, it is really good at correctly identifying samantha2074 as female, for some reason.
Pasting in a bunch of blog entries with time stamps probably gives it a female bias, since it seems to regard numbers as feminine for some insane reason.
Further thought: You know, it might be interesting to try to train a Bayesian spam-blocker algorithm to tell males from females. It would probably do better.
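For the curious, here is a rough sketch of what that experiment might look like in Python: a tiny naive Bayes text classifier of the kind spam filters use. Everything named here (`tokenize`, `NaiveBayes`, the labels "A" and "B", and the toy fruit samples) is made up for illustration. A real attempt would need a pile of text samples hand-labeled by the author's gender, and I make no promise that it would actually beat the tool.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training samples
        self.vocab = set()           # every word seen during training

    def train(self, text, label):
        """Add one labeled sample to the model."""
        words = tokenize(text)
        self.word_counts.setdefault(label, Counter()).update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def log_scores(self, text):
        """Return an unnormalized log-probability for each label."""
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # Start from the prior: how common this label was in training.
            score = math.log(self.doc_counts[label] / total_docs)
            for w in words:
                # Smoothed likelihood, so unseen words never give log(0).
                score += math.log(
                    (counts[w] + 1) / (total_words + len(self.vocab))
                )
            scores[label] = score
        return scores

    def classify(self, text):
        """Return the label with the highest score."""
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy placeholder data, not real writing samples.
clf = NaiveBayes()
clf.train("apple banana apple", "A")
clf.train("cherry durian cherry", "B")
print(clf.classify("apple apple banana"))  # prints "A"
```

Swap the fruit for labeled blog entries and Usenet posts and you would have the experiment, more or less. Working in log-probabilities keeps long samples from underflowing to zero.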
no subject
Date: 2003-08-20 01:03 pm (UTC)
no subject
Date: 2003-08-21 01:29 am (UTC)
No it doesn't. Are you sure you're reading it right? The male symbol has the suggestive pointy arrow, and the female one has an upside-down cross, for obvious reasons. Who said that? I've checked your results a bunch of times, and I'm now pleased to announce that Claudia and I are the genders I thought we were prior to this morning, to 87% certainty.
no subject
Date: 2003-08-21 01:35 am (UTC)
no subject
Date: 2003-08-20 05:01 pm (UTC)
no subject
Date: 2003-08-21 02:44 pm (UTC)
I was vaguely interested to know that it thought my troll of wstd.general was pretty feminine.
I was also vaguely interested to note that using 'male' words increases your score and using 'female' words decreases it. Ah, well ...
no subject
Date: 2003-08-22 12:56 am (UTC)
no subject
Date: 2003-08-21 03:25 pm (UTC)
Pfft.
no subject
Date: 2003-08-22 01:21 am (UTC)
Your original message: MALE
My followup: FEMALE
grumblepants's followup: FEMALE
astrange's followup: FEMALE
sunburn's followup: MALE
no subject
Date: 2003-08-22 10:17 pm (UTC)
no subject
Date: 2003-08-23 04:31 pm (UTC)
(By the way, I always wondered what was up with Book Blog, because I could never see the actual blog content. It turns out that you can see it in IE and Safari, but not in anything Gecko-based. Most likely there's some coding irregularity on the site-- I wonder what it is. The W3C validators choke on it pretty fast.)