A Glenn Beck conspiracy generator.
How does this thing work? I'm guessing Mechanical Turk or some mailing list. The phrases don't seem quite formulaic enough for Markov-chain generation or automated Mad Libs.
Politics, lifehacking, data mining, and a dash of the scientific method from an up-and-coming policy wonk.
Monday, February 21, 2011
Thursday, November 4, 2010
Starting to get some dissertation results...
Apologies for the long delay between posts. Stock excuse: "Dissertation... blah blah blah..."
Actually, I'm starting to get some nifty results from my dissertation. I've spent a long summer writing surveys and software, and in the next few weeks I hope to have something to show for it. Exhibit A: a word cloud for an automated classifier of political content.

Orange words are associated with political content, and blue words are negatively associated with it. The size of a word denotes the strength of association: essentially, each word's size corresponds to the absolute value of its coefficient (beta) in a logistic regression with "political-ness" as the dependent variable. The layout of the words is chosen by a computer algorithm to conserve space; it doesn't carry any important information.
I used Wordle for the layout. The classifier runs regularized logistic regression using the scikits.learn package for Python. The training data comes from a team of undergraduate research assistants.
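For anyone curious how coefficients turn into a word cloud, here is a minimal sketch of the idea. It is not the dissertation code: the real classifier uses scikits.learn, while this version fits an L2-regularized logistic regression with plain gradient descent so that it runs with nothing but the standard library. The toy documents, the function names (`fit_logistic`, `cloud_entries`), and the penalty and learning-rate values are all made up for illustration.

```python
import math
from collections import Counter

# Toy labeled documents, invented for illustration: 1 = political, 0 = not.
DOCS = [
    ("senate votes on the budget bill", 1),
    ("congress debates the election bill", 1),
    ("president signs the tax bill", 1),
    ("senate election results tonight", 1),
    ("recipe for chocolate cake tonight", 0),
    ("the team wins the football game", 0),
    ("new recipe for pasta", 0),
    ("football game results tonight", 0),
]


def fit_logistic(docs, l2=0.1, lr=0.5, epochs=500):
    """Fit L2-regularized logistic regression on bag-of-words counts
    by batch gradient descent. Returns (weights dict, intercept)."""
    vocab = sorted({w for text, _ in docs for w in text.split()})
    weights = {w: 0.0 for w in vocab}
    bias = 0.0  # the intercept is left unpenalized
    rows = [(Counter(text.split()), y) for text, y in docs]
    n = len(rows)
    for _ in range(epochs):
        # Start each gradient with the L2 penalty term, l2 * beta.
        grad = {w: l2 * weights[w] for w in vocab}
        grad_b = 0.0
        for counts, y in rows:
            z = bias + sum(weights[w] * c for w, c in counts.items())
            p = 1.0 / (1.0 + math.exp(-z))  # predicted P(political)
            err = p - y
            for w, c in counts.items():
                grad[w] += err * c / n
            grad_b += err / n
        for w in vocab:
            weights[w] -= lr * grad[w]
        bias -= lr * grad_b
    return weights, bias


def cloud_entries(weights, top=10):
    """Return (word, color, size) tuples: color from the sign of beta,
    size from its absolute value, largest first."""
    ranked = sorted(weights.items(), key=lambda kv: -abs(kv[1]))[:top]
    return [(w, "orange" if b > 0 else "blue", abs(b)) for w, b in ranked]


weights, bias = fit_logistic(DOCS)
for word, color, size in cloud_entries(weights):
    print(f"{word:10s} {color:6s} {size:.3f}")
```

The (word, size) pairs are what you would hand to a layout tool; the color just records which side of zero the coefficient landed on.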