The Bride of the First House (bride) wrote,

Language Processing for Gender Identification

weather: sunny/cloudy
outside: 14°C
mood: amused
Bookblog's Gender Genie algorithm, as implemented by hutta, pegs me as male. Ha, you'd think with a name like "bride", there would be no mistake =) But from the looks of it, it's fairly inaccurate, especially with women. You'll also notice that, statistically, the share it got right vs. wrong (35%/58%) looks almost the same as the male/female split (36%/63%) from the LJ Statistics page. This means his tool could just be saying that everyone is male, in which case it's correct only insofar as it mirrors the population. =)

I'm giving hutta's coding ability the benefit of the doubt here. Let's say he didn't screw up something major, like putting a '>' where a '<' was supposed to be or vice versa (which it almost looks like he has; if he just reversed the last comparison operator in his code and called everyone "female", he'd be statistically more accurate than not) =)
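A rough Python sketch of that arithmetic, not hutta's actual code: if a tool gives everyone the same label, the share it gets right is simply that label's share of the population. The function name `constant_guess_accuracy` is made up for illustration, and the only numbers used are the 36%/63% split quoted above (the remainder is left out rather than guessed at).

```python
# Hypothetical illustration: how often a tool is right if it gives
# everyone the same label. Not taken from hutta's implementation.

def constant_guess_accuracy(population_share: dict[str, float], guess: str) -> float:
    """A constant guess is right exactly as often as that label occurs."""
    return population_share.get(guess, 0.0)

# Male/female split quoted from the LJ Statistics page (36%/63%).
lj_split = {"male": 0.36, "female": 0.63}

always_male = constant_guess_accuracy(lj_split, "male")
always_female = constant_guess_accuracy(lj_split, "female")

print(f"always 'male':   right {always_male:.0%} of the time")
print(f"always 'female': right {always_female:.0%} of the time")
```

So a 35% hit rate is about what "everyone is male" would score on this population, and flipping the constant guess to "female" would roughly match the 63% share instead.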

Gender Genie's interpretation of the Koppel-Argamon paper is also an unknown. It does say that they use "a simplified version of an algorithm developed by Moshe Koppel, Bar-Ilan University in Israel, and Shlomo Argamon, Illinois Institute of Technology..."

There's also the soundness of the paper itself that's in question. However widely accepted research is, the conclusions drawn from its test cases can be too general, too specific or dead wrong. Natural Language Processing is a very, very difficult thing to get right. I haven't looked at the paper yet (I just printed it out), so my ass and my elbow could be looking a bit similar right now =)

I've found that informal writing is also very prone to throwing off these kinds of algorithms. For example, in a blog, both men and women tend to abbreviate, especially with numbers. Well, if you're assuming that men write numbers as digits and women spell them out, everyone will look male to you. As well, blogs and journals usually have datestamps and timestamps on them, which are mostly digits. Gender Genie claims to have a different algorithm that takes care of this case, but they're still trying to predict humans, so I still have my doubts.
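To make the timestamp worry concrete, here is a small Python sketch. The "digits vs. spelled-out numbers" feature is purely hypothetical, following the "if you're assuming" scenario above; it is not taken from Gender Genie or the Koppel-Argamon paper, and the `digit_share` function, word list and sample entry are all made up.

```python
import re

# Hypothetical feature: of all the numbers mentioned in a text, what share
# is written as digits rather than spelled out? (Assumption for illustration
# only; not Gender Genie's or Koppel-Argamon's actual feature set.)
NUMBER_WORDS = {
    "zero", "one", "two", "three", "four", "five", "six",
    "seven", "eight", "nine", "ten", "eleven", "twelve",
}

def digit_share(text: str) -> float:
    """Share of number mentions that are digit tokens (0.0 if no numbers)."""
    tokens = re.findall(r"[a-z]+|\d+", text.lower())
    digits = sum(1 for t in tokens if t.isdigit())
    words = sum(1 for t in tokens if t in NUMBER_WORDS)
    total = digits + words
    return digits / total if total else 0.0

# Made-up entry, scored with and without a made-up date/time stamp in front.
entry = "I bought two books and three pens."
stamped = "2003-09-12 10:45 " + entry

print(f"entry alone:    {digit_share(entry):.2f}")
print(f"with timestamp: {digit_share(stamped):.2f}")
```

The entry itself spells out every number, but once the stamp is included most of the "numbers" are digits, so a feature like this would mostly be measuring the journal software and not the writer.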

Nonetheless, props to hutta for the implementation of an intriguing concept =)

Tags: quizzes