weather | : | sunny/cloudy | |
outside | : | 14°C | |
mood | : | amused | |
I'll give hutta's coding ability the benefit of the doubt here. Let's say he didn't screw up something major, like putting a '>' where a '<' was supposed to be or vice versa (which it almost looks like he has; if he just reversed the last comparison operator in his code and called everyone "female", he'd be statistically more accurate than not) =)
Gender Genie's interpretation of the Koppel-Argamon paper is also an unknown. The site does say that it uses "a simplified version of an algorithm developed by Moshe Koppel, Bar-Ilan University in Israel, and Shlomo Argamon, Illinois Institute of Technology..."
The soundness of the paper itself is also in question. However widely accepted the research is, the conclusions drawn from its test cases can be too general, too specific or dead wrong. Natural Language Processing is a very, very difficult thing to get right. I haven't looked at the paper yet, I've only just printed it out, so my ass and my elbow could be looking a bit similar right now =)
I've found that informal writing is also very prone to throwing off these kinds of algorithms. For example, in a blog, both men and women tend to abbreviate, especially with numbers: they type digits instead of spelling them out. Well, if you're assuming that men use digits and women spell numbers out, everyone will look male to you. On top of that, blogs and journals usually carry datestamps and timestamps, which are mostly digits too. Gender Genie claims to have a different algorithm that takes care of this case, but they're still trying to predict humans, so I still have my doubts.
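To make the number problem concrete, here's a toy sketch of a keyword-weight scorer. Everything in it is my own assumption for illustration: the weights, the `<num>` feature, and the `strip_stamps` helper are invented, and none of it is taken from Gender Genie, hutta's code, or the Koppel-Argamon paper. It only shows how a few digits from a datestamp can swamp the rest of an entry.

```python
import re

# Hypothetical weights, invented for illustration only. They are NOT the
# values used by Gender Genie or by the Koppel-Argamon paper.
MALE_WEIGHTS = {"the": 2, "a": 2, "<num>": 5}
FEMALE_WEIGHTS = {"with": 3, "she": 3, "and": 2}

# A token is either a run of digits or a run of letters/apostrophes.
TOKEN = re.compile(r"\d+|[a-z']+")


def score(text):
    """Return (male_score, female_score) for a piece of text."""
    male = female = 0
    for tok in TOKEN.findall(text.lower()):
        # Every run of digits counts as the same "<num>" feature.
        key = "<num>" if tok.isdigit() else tok
        male += MALE_WEIGHTS.get(key, 0)
        female += FEMALE_WEIGHTS.get(key, 0)
    return male, female


def guess(text):
    """Pick whichever side scored higher (ties go to 'female')."""
    male, female = score(text)
    return "male" if male > female else "female"


def strip_stamps(text):
    """Drop datestamps like 2003-09-14 and timestamps like 10:42."""
    return re.sub(r"\b\d{4}-\d{2}-\d{2}\b|\b\d{1,2}:\d{2}\b", " ", text)


entry = "2003-09-14 10:42 went shopping with mom and she bought shoes"
print(guess(entry))                # the stamp alone adds five "<num>" hits
print(guess(strip_stamps(entry)))  # same words, stamp removed
```

With these made-up weights the verdict flips once the stamp is stripped, which is the kind of skew I mean. Whether the real algorithm is this sensitive is something I can't say without reading the paper.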
Nonetheless, props to hutta for the implementation of an intriguing concept =)