Monday, April 1, 2013

An article of interest

This article reveals something that might strike fear in our anonymously blogging hearts: a new program that uses algorithms to match anonymous writing samples and styles with the original author. The work evolved out of the "Gender Guesser" or "Gender Genie" project, created in 2003 by a team of researchers from the Illinois Institute of Technology and Bar-Ilan University in Israel.


DHS uses technology to unmask Anonymous hacktivists
by Adrian Lee

Two years ago, the Department of Homeland Security created a priority tasking: how to tie Internet writing to Anonymous hacktivists in a way that would stand up in a court of law. According to one DHS official who was not authorized to comment publicly, the project grew out of the "Gender Guesser" program created by a group of researchers in 2003.

"We figured if an algorithm could guess the gender of a writer, then it could eventually match the style of an anonymously written text to the actual author."

Using writing samples from hundreds of known authors and volunteers within the DHS, researchers at the Illinois Institute of Technology and Bar-Ilan University in Israel reunited and began working on a program that would expand upon their previous work.

Although the algorithm is in its early stages, it matches writing samples of at least 500 words to another sample of known writing correctly at least 85% of the time.

Dr. Adam Levi, lead researcher, explains the high correlation: "People can't hide who they are. They unconsciously give away elements that are unique to their writing: turns of phrase, word order, and even punctuation. All of these elements are unconscious, much like we found when working on the 'Gender Guesser' program a decade ago."

Another official at DHS hopes that the algorithm will begin to work on smaller samples.

"We hope to eventually be able to match extremely small writing samples, such as those found on Twitter, to assist law enforcement in apprehending and prosecuting those who hide behind anonymity on the web."

The rest of the article is here


  1. Nice! I have to admit, when I get anonymous reviewer comments, I spend some time trying to guess the reviewers' gender.

  2. The Gender Guesser is a real algorithm. Feel free to check it out. It's always fun.

  3. I test very strongly in the wrong gender. I was fascinated.

    1. Of course I've had years and years of training learning not to write like a girl.

    2. Merely, I come out 80% male... any program that systematically mislabels professional women doesn't work well.

    3. Isn't there a whole lot of work showing that all those markers of "gender" in language are actually markers of subordination? If you speak with authority, you "sound like a man".

  4. Hold it; what about the April 1 joke stuff? I thought this was going to be a fake exercise site until this evening!

  5. It's all over, baby. The page is always 12 hours in the future.

  6. I'm guessing that at least two three-letter agencies want to know who Strelnikov really is.

  7. So... run your posts through Google Translate a couple of times before converting them back into English and you end up with something that is "sanitized".

  8. Gender Guesser and related software are awesome at guessing the identities of poor writers. It's totally easy to game the gender one; it's based on some really pernicious stereotypes about gender and writing (e.g., women use more personal pronouns). I tested it out on a range of my own writing samples. All the personal pieces it labelled as female, all the academic pieces as male. Groundbreaking. Scholarly writing is produced by penises. Give me a fucking break.