Tuesday, June 29, 2010

How to catch a dirty pollster


Just a couple hours ago, Markos Moulitsas (you know, the Daily Kos) announced that he's suing his former polling group, Research 2000. Evidently, they faked a lot of their data in the last 19 months, and didn't even have the decency to fake it well.

The statistical case against R2000 is made here. Nate Silver chimes in here.

The punchline:
"We do not know exactly how the weekly R2K results were created, but we are confident they could not accurately describe random polls."
What's interesting to me about the situation is how clear-cut the evidence is. Mark Grebner, Michael Weissman, and Jonathan Weissman (GWW) conduct three tests to show that the numbers are cooked. Each statistical test gives results that are devastatingly improbable.

  • Probability of odd-even pairs between men and women: 776 heads on 778 tosses of a fair coin, or about 1 in 10^231.
  • Probability of consistency of results in small samples: ~1 in 10^16.
  • Probability of missing zeroes in a time trend: ~1 in 10^16.
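
For anyone who wants to sanity-check the coin-toss figure in the first bullet, here is a minimal sketch. It is my own back-of-the-envelope version, not GWW's actual code: it assumes a fair coin and counts the one-sided tail (776 or more heads out of 778), and the function name is made up for illustration. The exact exponent depends on how you define the tail, but the result lands within a few orders of magnitude of the figure quoted above, which is to say, absurdly small either way.

```python
from math import comb, log10


def log10_upper_tail(n: int, k: int) -> float:
    """Return log10 of P(X >= k) for X ~ Binomial(n, 1/2).

    Hypothetical helper for illustration. The count of favorable
    outcomes is an exact integer; we work in logs because 2**-778
    is too small to represent as an ordinary float.
    """
    favorable = sum(comb(n, i) for i in range(k, n + 1))
    return log10(favorable) - n * log10(2)


exponent = log10_upper_tail(778, 776)
print(f"P(at least 776 heads in 778 tosses) is about 10^{exponent:.1f}")
```

Working in log space is the only real design choice here: the raw probability underflows to zero in floating point, so the exponent is the thing to look at.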

Statistically, there's almost no wiggle room here. Until the R2000 folks come forward with some really compelling counterevidence, I'm convinced they're frauds.

An interesting feature of the last two patterns is that they hint at overeagerness to give results. Patterns in the data are *too* clear, and the detailed crosstabs supplied by R2000 made it easier to catch them. If they had been willing to dump some random noise into the equations -- blurring the results -- it would have been much harder to build a conclusive case against them.

This jibes with my experience doing survey research for clients: nobody wants to pay for null results. Even when your data collection is legitimate, there's always pressure to twist and squeeze the data to wring out "findings" that aren't really there. Three cheers for transparency and the scientific method.

Monday, June 28, 2010

Reasons to study political blogging


I'm working like crazy on my dissertation prospectus. Data work, lit reviews, etc. To escape from early research purgatory, I plan to blog parts of the prospectus as I write them.

I'll kick off today with introductory definitions and motivation. Feedback is much appreciated. Beware of dry, academic writing!

What is a blog?
Paraphrasing Wikipedia, a blog is a website containing regular entries ("posts") of commentary, links, or other material such as photos or video. On most blogs, posts are displayed in reverse-chronological order -- the most recent post appears first. Although most blogs are maintained by individuals, some are run by small groups, and blogs speaking on behalf of corporations, churches, newspapers, political campaigns, etc. are increasingly common. Many blogs focus on a specific topic, ranging from broad to narrow: entertainment, cooking, astronomy, the Detroit Tigers, cold fusion. For my dissertation, I plan to focus on political blogs.

Why study political blogs?
Here are five reasons to study political blogs.
  1. Blogs are public-facing. Lots of people read them, including politicians and journalists. The extent to which blogs are replacing mainstream media is an open question, but it's certain that blogs have come to play an important role in public discourse, with real impact on politics.
  2. Bloggers span a wide variety of opinions. The blogosphere embraces everyone from conservative wingnuts to liberal moonbats to political moderates. Some political bloggers are politically omnivorous, writing about anything political. Others focus on specific issues and topics: foreign policy, Congress, feminism, etc.
  3. Bloggers include both experts and amateurs. Dividing the same pie in a different direction, many A-list bloggers (e.g. Andrew Sullivan, Arianna Huffington, Glenn Reynolds, Michelle Malkin) clearly qualify as political elites: they are experts, immersed in politics, well-informed and well-connected. Other political bloggers are more obscure and casual -- closer to the average Joes who make up the "mass public."
  4. Blogs are updated frequently. This has two nice consequences. First, frequent posts allow us to replay bloggers' reactions to events as they unfold. Second, frequent posts mean we have a lot of posts to work with.
  5. Blogs are archived publicly. Unlike most forms of political speech and action, blogging leaves a permanent data trail.
The combination of these attributes creates a kind of perfect storm for social science. Understanding the flow of opinions and ideas has always been difficult for social scientists, because most of our data have come from surveys.

Friday, June 18, 2010

AI for Jeopardy: IBM's Watson

In case you haven't heard, IBM is creating a supercomputer to play Jeopardy. The idea is to build AI that can answer factual questions in regular, spoken English. The system, named "Watson" after two longtime IBM presidents, has been in development for a couple years now, and is starting to beat strong-ish human competitors. It's slated to play against past Jeopardy grandmasters sometime this fall.

Here's a short promotional video by IBM.

Here's an extended article discussing the challenge and the technology in NYTimes.

Related: from xkcd.

Monday, June 14, 2010

Scientific revolutions ~ Disruptive technologies

These two Wikipedia pages are really interesting to read side by side: The Structure of Scientific Revolutions and Disruptive Technology. The first is Thomas Kuhn's 50-year-old-but-still-revolutionary theory of how scientific theories advance. The other is Clayton Christensen's more recent theory of why many businesses fail to respond to innovation. Despite the fact that Kuhn was a physicist and scientific historian and Christensen is a business school professor, the two have a lot in common.



Here's a list of important concepts from both theories, with roughly equivalent concepts paired. I'll let you look them up on Wikipedia on your own.

Kuhn / Christensen
Scientific revolutions ~ Market disruption
Paradigm ~ Value network
Paradigm shift ~ Market disruption
Coherence ~ Corporate culture
Normal science ~ Sustaining innovation
Anomalies ~ New-market disruption
New theory ~ Disruptive innovation
??? ~ Low-end disruption
??? ~ Up-market/down-market
Incommensurability ~ ???
Pre-paradigm phase ~ ???
Normal phase ~ Market growth
Revolutionary phase ~ Market disruption

One thing that strikes me as potentially interesting is the places where the two theories do *not* overlap. As a businessperson, Christensen is more interested in the development of markets and the flow of revenue. Kuhn is more interested in the change in theories over time. It strikes me that each approach may have something to offer the other.

* In science and academia, what does it mean to be "down-market?" Which departments today are incubating the revolutionary theories of tomorrow?
* What does the idea of incommensurability imply for business practice? Anecdotally, disruptive companies often have business models and cultures that are dramatically different from established companies. Should that change the way we think about entrepreneurship and venture capital?

Tuesday, June 8, 2010

Zinger or below the belt? WSJ argues that "liberals and Democrats" are economically unenlightened

An editorial in today's WSJ (Are You Smarter Than a Fifth Grader?) argues that "self-identified liberals and Democrats do badly on questions of basic economics."

Written by a George Mason economist, the article is admirably transparent in its reasoning. The analysis turns on a battery of Econ 101-style questions on a Zogby poll (e.g. "Restrictions on housing development make housing less affordable."). It turns out that self-identified Republicans and libertarians score substantially better on this quiz than Democrats. Conclusion: the left doesn't understand, or is unwilling to accept, the fundamental economic tradeoffs that exist in any society. In the author's words, the left is "economically unenlightened."

Usually when I see this kind of thing in the WSJ, I'm inclined to ignore it as partisan sniping. In this case, they lay out their methodology thoroughly enough to invite inspection. And against my will, I find myself agreeing, because I don't see anything wrong with the analysis. Here's my reasoning.

The first thing to check is the quality of evidence. In order to score respondents' answers, the researchers had to designate right and wrong answers to each question on the quiz. What questions, exactly, were asked? Did the scoring reflect objective truth, or was there libertarian dogma in the way things were framed?

Here are the 8 questions.

Question | "Unenlightened" answer | Validity
1. Restrictions on housing development make housing less affordable | Disagree | High
2. Mandatory licensing of professional services increases the prices of those services | Disagree | High
3. Rent control leads to housing shortages | Disagree | High
4. Free trade leads to unemployment | Agree | Low
5. Minimum wage laws raise unemployment | Disagree | Medium
6. Overall, the standard of living is higher today than it was 30 years ago | Disagree | High
7. A company with the largest market share is a monopoly | Agree | Medium
8. Third World workers working for American companies overseas are being exploited | Agree | Low


On the whole, these questions strike me as having high validity. They measure what they intend to measure. A respondent who answers these questions correctly probably does have a better understanding of the likely consequences of economic policies. And therefore, it looks like a substantial part of the left's constituency is unwilling to come to grips with hard choices.

I'm not sure I want to believe that. Does anybody see a way out of this conclusion?



Notes on specific questions:
Questions 1 through 5 focus on fundamental tradeoffs in price, quantity supplied, and market intervention. The first three are well-grounded in evidence. On the fourth, the economists' answer (free trade does not cause unemployment) holds -- in the long run. The fifth is contested, but (having read up on the subject for a final debate in a policy analysis class) I think the preponderance of evidence supports the conclusion that minimum wage laws raise unemployment. Bottom line: both theory and evidence strongly suggest that the tradeoffs described in these questions are real forces in society.

Question six is a simple factual question about recent economic history. Question seven is a vocab question.

Question eight is more values-based. A typical economist will tell you that Third World workers aren't being exploited, because they voluntarily choose to accept and continue in those jobs. Companies aren't exploiting people; they're giving them new opportunities. The counterargument is that (some) workers are led into sweatshop jobs under false pretenses, and held there against their will. This is exploitation. Additionally, one could argue that it is "exploitation" in a moral sense for a company to pay its workers only Third World wages plus a fraction when it could pay more.

Saturday, June 5, 2010

"Embrace the wonk"

Great article from the Columbia Journalism Review on the standoffish relationship between political scientists and journalists. Both purport to do the same thing: explain politics. But each has a very different MO.

Here's my take, really just a rehashing of CJR:

The political scientist's criticism of pop journalism: the media are full of overreactions to the day's soundbites and polls. A couple of anecdotes from self-serving sources do not constitute analysis. Dig deeper. It's about structure, not just personality.

The beat journalist's criticism of polisci: ivory tower types rely too much on stats and models. The findings are inaccessible, and often unsurprising. Cautious explanations for political events typically arrive years too late to have any meaningful impact. Get your finger on the pulse.

Friday, June 4, 2010

Hat tip Sam


http://www.penny-arcade.com/comic/2010/6/4/

On netbooks and utility computing

Talking with Matt and Dave last night, I said something along the lines of "this netbook is the last computer I expect to buy." In retrospect, I didn't mean it. I plan to buy plenty more gadgets in the future.

Here's what I *did* mean: this is the last laptop I expect to buy for its processor. As cloud computing becomes ubiquitous, virtually all processing will be done remotely. Your laptop -- and your desktop, probably -- will just be a terminal used to access the grid. With factory-like server grids supplying processing cycles at utility prices, you won't have any reason to supply your own computing. The time is coming rapidly.

In other words, my netbook isn't a computer. It's a terminal. Owning a computer is like owning a spinning wheel, plow, or mill -- not worth the time and money it takes to use one. Buying a low-end netbook is a statement about technological integration. I don't need to own the machinery for processing my own data, in the same way that I don't need to own the machinery to make my own paper or build my own car.

Tuesday, June 1, 2010

Summer plans

This summer I'm working on my prospectus, but there are several dependencies. Developing methods, software, and lines of argument for automated content analysis is going to be particularly involved.



Here's my chart for keeping myself organized, with sections filled in to the extent that they are done.