
LJ Readability Statistics Feature

[weather|cloudy][water|4 by 1439h]

xinit was saying that the LJ Random feature needed some idiot filters, so that we can hit worthwhile journals.

I said that it probably wouldn't be difficult to integrate a Readability Statistics module into the LJ updates to get something like a Flesch Reading Ease and Flesch-Kincaid Grade Level. Then, in my Random search, I would specify that I only want to see journals with an average Reading Ease of 60-95 AND a Grade Level of 4.0-8.0.
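
A rough sketch of what that search filter might look like, assuming every journal already carries an average Reading Ease and Grade Level. The JournalStats record and the matches_random_filter name are made up for illustration; nothing like them exists in LJ as far as I know.

```python
from dataclasses import dataclass


@dataclass
class JournalStats:
    # Hypothetical per-journal averages; these fields are assumptions.
    user: str
    avg_reading_ease: float
    avg_grade_level: float


def matches_random_filter(journal, ease=(60.0, 95.0), grade=(4.0, 8.0)):
    """True when both averages fall inside the requested ranges (inclusive)."""
    return (ease[0] <= journal.avg_reading_ease <= ease[1]
            and grade[0] <= journal.avg_grade_level <= grade[1])


journals = [
    JournalStats("alice", 80.0, 6.0),   # inside both ranges
    JournalStats("bob", 35.0, 13.5),    # Reading Ease too low, Grade Level too high
    JournalStats("carol", 98.0, 2.0),   # outside both ranges
]
candidates = [j.user for j in journals if matches_random_filter(j)]
print(candidates)  # ['alice']
```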

The Flesch Reading Ease is a 100-point scale, where 100 is the "easy to read" end. It's based on two averages: the number of words per sentence and the number of syllables per word. The Flesch-Kincaid Grade Level maps onto U.S. school grade levels and is computed from the same two statistics as the FRE.

Straight from the horse's mouth:

The formula for the Flesch Reading Ease score is:

206.835 - (1.015 x ASL) - (84.6 x ASW)

The formula for the Flesch-Kincaid Grade Level score is:

(0.39 x ASL) + (11.8 x ASW) - 15.59


ASL = average sentence length (the number of words divided by the number of sentences)

ASW = average number of syllables per word (the number of syllables divided by the number of words)
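
To make the two formulas above concrete, here is a minimal sketch in Python. The sentence splitter and the syllable counter are crude stand-ins of my own (counting vowel runs, dropping a trailing silent "e"), so the numbers will not match Word's exactly; the function names are hypothetical too.

```python
import re


def count_syllables(word):
    """Very rough estimate: count runs of vowels, drop a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text):
    """Return (reading_ease, grade_level) for a piece of text."""
    # Assumption: a sentence ends at any run of '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("need at least one sentence and one word")
    asl = len(words) / len(sentences)                           # average sentence length
    asw = sum(count_syllables(w) for w in words) / len(words)   # syllables per word
    reading_ease = 206.835 - (1.015 * asl) - (84.6 * asw)
    grade_level = (0.39 * asl) + (11.8 * asw) - 15.59
    return reading_ease, grade_level


ease, grade = readability("The cat sat on the mat. The dog ran.")
print(round(ease, 2), round(grade, 2))
```

Note that nothing clamps the results: very short, simple text can score above 100 on Reading Ease or below 0 on Grade Level.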

They're not very sophisticated algorithms (from an implementation standpoint), but I question the "magic numbers" in there. From experience, they're very error-prone in cases where people write like they were kicked in the head by a mule at birth.

My spelling and grammar in my own journal entries are actually quite atrocious. I average a Flesch Reading Ease of 80 (anywhere from 60-90, but more entries closer to 90), and an average F-K Grade Level of 6 (anything from Gr. 2 to 8, but most in the 5-6 level). Still, you can make generalizations like: if the Reading Ease is low and Grade Level is low then, more than likely, it's an Idiot Entry. HOWEVER, some of my entries came out with very similar stats (the shorter ones where I'm trying to make things very clear =)

If I could somehow get a count of the number of times the Spelling & Grammar Checker had to stop and prompt the user for input, that would add a third dimension of accuracy =) RE and GL being the same, if the Number of Prompts is low, it's an intelligent entry. If the Number of Prompts is high as well, it should flip the Idiot Flag. =)
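
As a sketch only, the three-dimensional check could be written like this. It follows the rule of thumb exactly as stated in this entry (low Reading Ease and low Grade Level, then the prompt count as the tie-breaker). All three cutoffs are numbers I picked for illustration, and the idiot_flag name is hypothetical.

```python
def idiot_flag(reading_ease, grade_level, prompts,
               low_ease=60.0, low_grade=4.0, max_prompts=5):
    """Flag an entry only when the stats look bad AND the checker prompted a lot.

    low_ease, low_grade and max_prompts are placeholder thresholds (assumptions).
    """
    stats_look_bad = reading_ease < low_ease and grade_level < low_grade
    return stats_look_bad and prompts > max_prompts


# Same RE and GL, different number of prompts:
print(idiot_flag(50.0, 3.0, prompts=1))   # False: few prompts, intelligent entry
print(idiot_flag(50.0, 3.0, prompts=12))  # True: many prompts flip the flag
```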

Microsoft's NLP group is actually pretty l33t. With that group, the shortcomings aren't really Microsoft's fault; natural language processing is itself a difficult task, even when the text follows set grammar rules strictly.

This entry, for example, has a Flesch Reading Ease of 58.9, a Flesch-Kincaid Grade Level of 10.5 and prompts me twice for changes (both of which are incorrect interpretations of the grammar rule in question). It has a lot of trouble with prepositional phrases. It can't quite tell the difference between the passive voice and an active, present progressive tense. It's telling me there are things wrong with my entry that, I swear, are correct. That IS how you spell "l33t". And I'm sure there are all sorts of things wrong with my entry that it's not picking up.

*shrug* Food for thought.

Flesch Reading Ease  58.9
Flesch-Kincaid Grade Level  10.5
Number of Prompts  2



( 10 comments — Leave a comment )
Feb. 27th, 2002 05:14 pm (UTC)
No major worries on sampling... perhaps you could have it randomly sample the last 100 entries, seeking entries of at least 100 words in length. I'd hate to have someone punished and treated like an idiot based solely on the last post they've made. Though, that could have the added benefit of eliminating a whole bunch of people who post nothing but quiz results.

Maybe checkboxes to allow / disallow people who posted images in the past 100 entries, people whose last 100 entries average more than / less than 1000 words in length, etc. It'd be great to filter on any number of things, with the Flesch stuff only being a guideline.

Sample size: (number of recent posts to sample)
Minimum length of posts: (0 to disable)
Maximum length of posts: (0 to disable)
Minimum readability index: (0 to disable)
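
A minimal sketch of how those four form fields might drive the check, reading "sample" as simply the most recent N posts and treating 0 as "disabled". The journal_passes name and the score callback (any function that turns text into a readability number) are assumptions for illustration.

```python
from statistics import mean


def journal_passes(entries, score, sample_size=100, min_words=100,
                   max_words=0, min_readability=0.0):
    """entries: list of post texts, oldest first. score: text -> readability number."""
    recent = entries[-sample_size:] if sample_size > 0 else list(entries)
    # Keep only posts within the length limits (0 disables a limit).
    eligible = [
        text for text in recent
        if (min_words == 0 or len(text.split()) >= min_words)
        and (max_words == 0 or len(text.split()) <= max_words)
    ]
    if not eligible:
        return False  # nothing long enough to judge, e.g. only quiz results
    average = mean(score(text) for text in eligible)
    return min_readability == 0 or average >= min_readability


posts = ["short quiz result", "word " * 120, "word " * 150]
print(journal_passes(posts, score=lambda text: 70.0, min_readability=60))  # True
```

Averaging over the sample means one bad post does not sink a journal, and a journal with only short posts never qualifies.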
Feb. 27th, 2002 05:15 pm (UTC)
Odd... form no workie in certain views of that post... lovely. Works in "reply" mode though.
Feb. 27th, 2002 07:07 pm (UTC)
Where the heck have you been hiding all day?
Feb. 27th, 2002 07:12 pm (UTC)
I've been right here... =) I posted this at 11 PST and I even commented in your journal today at ~15 PST.
Feb. 27th, 2002 07:28 pm (UTC)
Yes, but where is the deep and insightful commentary?
Feb. 27th, 2002 07:31 pm (UTC)
I don't do deep and insightful anymore. Shallow and stupid are in. Wudya think? Doesn't this entry just make me look like a drunk and confused sorority girl?
Feb. 27th, 2002 07:35 pm (UTC)
Keep talking, I'm interested now. :)
(Deleted comment)
Jun. 5th, 2002 07:36 pm (UTC)
Re: what a wonderful idea!
Heh, did you see my follow up to that? =)
May. 4th, 2003 11:21 am (UTC)
What about picture journals?
Automatically discounted?
May. 4th, 2003 01:40 pm (UTC)
Oh, I didn't think that far into it =) I'd imagine picture journals would be in a class of their own. =)


The Bride of the First House
