Thursday, November 4, 2010

Starting to get some dissertation results...

Apologies for the long delay between posts. Stock excuse: "Dissertation... blah blah blah..."

Actually, I'm starting to get some nifty results from my dissertation. I've spent a long summer writing surveys and software, and in the next few weeks I hope to have something to show for it. Exhibit A: a word cloud for an automated classifier of political content.


Orange words are positively associated with political content, and blue words are negatively associated. The size of a word denotes the strength of the association: it corresponds to the absolute value of that word's coefficient (its beta) in a logistic regression with "political-ness" as the dependent variable. The layout of the words is chosen by a computer algorithm to conserve space; it doesn't carry any information.
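For the curious, here is a rough sketch of how a coefficient could be turned into a color and a font size. The function name, the example coefficients, the size range, and the linear scaling are all made up for illustration; Wordle does its own sizing, so treat this as one plausible mapping and not what actually produced the picture.

```python
def word_style(coefficients, min_size=10.0, max_size=60.0):
    """Map each word's coefficient to a (color, font_size) pair.

    Sign picks the color, absolute value picks the size.
    The linear scaling between min_size and max_size is an assumption.
    """
    # Largest absolute coefficient; fall back to 1.0 to avoid dividing by zero.
    largest = max((abs(b) for b in coefficients.values()), default=0.0) or 1.0
    styles = {}
    for word, beta in coefficients.items():
        # A coefficient of exactly zero lands in "blue" here, an arbitrary choice.
        color = "orange" if beta > 0 else "blue"
        size = min_size + (max_size - min_size) * abs(beta) / largest
        styles[word] = (color, size)
    return styles


# Hypothetical coefficients, not results from the dissertation.
example = {"senate": 2.0, "election": 1.0, "recipe": -1.5, "weather": -0.5}
styles = word_style(example)
for word, (color, size) in styles.items():
    print(word, color, size)
```

Note that "recipe" would come out bigger than "election" even though it is a blue word: size tracks the magnitude of the coefficient, not its direction.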

I used Wordle for the layout. The classifier runs regularized logistic regression using the scikits.learn package for Python. The training data comes from a team of undergraduate research assistants.
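To make the idea concrete, here is a toy version of the classifier. It is not the dissertation code: instead of scikits.learn it hand-rolls a tiny logistic regression in plain Python so it runs with no dependencies, it assumes an L2 penalty (the real regularizer may differ), and the four "documents" and their labels are invented. The function names and parameter values are likewise just placeholders.

```python
import math


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately naive tokenizer."""
    return text.lower().split()


def fit_logistic(docs, labels, l2=0.1, lr=0.5, epochs=500):
    """Fit L2-penalized logistic regression on binary word-presence features.

    Uses plain batch gradient descent and returns {word: coefficient}.
    """
    vocab = sorted({w for d in docs for w in tokenize(d)})
    index = {w: i for i, w in enumerate(vocab)}

    # One row per document: 1.0 if the word occurs in it, else 0.0.
    rows = []
    for d in docs:
        row = [0.0] * len(vocab)
        for w in tokenize(d):
            row[index[w]] = 1.0
        rows.append(row)

    weights = [0.0] * len(vocab)
    bias = 0.0  # the intercept is left unpenalized
    n = len(docs)
    for _ in range(epochs):
        grad_w = [l2 * w for w in weights]  # gradient of the L2 penalty
        grad_b = 0.0
        for row, y in zip(rows, labels):
            z = bias + sum(w * x for w, x in zip(weights, row))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted P(political)
            err = (p - y) / n
            for j, x in enumerate(row):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return dict(zip(vocab, weights))


# Invented training data: 1 = political, 0 = not political.
docs = [
    "senate vote on tax bill",
    "election campaign vote",
    "chocolate cake recipe",
    "sunny weather this weekend",
]
labels = [1, 1, 0, 0]

coefs = fit_logistic(docs, labels)
# Rank words by strength of association, strongest first.
for word in sorted(coefs, key=lambda w: abs(coefs[w]), reverse=True):
    print(word, round(coefs[word], 3))
```

Words that show up only in political documents get positive coefficients, words that show up only in the others get negative ones, and the penalty keeps any single word from running away with an enormous weight. Those coefficients are what the word cloud visualizes.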
