
Thursday, May 12, 2011

Computational social scientists: a draft directory and basic survey results

Last week, some of us* at Michigan's Center for Complex Systems circulated a survey of computational social scientists -- trying to find out who self-identifies as a compSocSci person and what they study, so that people in the field can be in touch with each other.

We had just under 100 responses, from people at many different institutions, working in a wide variety of areas. Here are some early results.

First, the obligatory word cloud. This isn't particularly scientific, but it illustrates the concepts that people find important in this space. Not surprisingly, we had a strong showing from network people and agent-based modelers.

[Word cloud of survey responses]

We also asked about broad areas where people had formal training and were currently working. The two are pretty similar, so I'll just show the graph on training.

[Graph: areas of formal training]

More results and a revised version of the directory will be forthcoming in a couple of weeks. Please let us know if you have any questions. We hope these will be useful resources for the community.

Click here to take the survey. We'll keep it open for another couple weeks, so that responses can continue to trickle in.

Click here for the directory in pdf format. (To avoid spam, this doesn't include email addresses. Email me if you want a copy that includes emails.)

* Scott Page, Dan Katz, and I

Wednesday, May 11, 2011

R code to remove the second line of a Qualtrics .csv

I love Qualtrics, but its data export does this obnoxious thing. Instead of exporting a regular .csv file, it exports a .csv with two header rows. The first contains short variable names (e.g., Q1, Q2.2, Q3_TEXT) and the second contains question labels ("How old are you?", "What is your email address?").

I keep having to figure out how to tell R how to deal with this messiness. It's not complicated, but I have to look up the read.csv documentation every time.

No more. Here's my code:


# Read the data, skipping both header rows
DF <- read.csv("my_file.csv", skip = 2, header = FALSE)
# Read the file again to pick up the short variable names from the first row
DF2 <- read.csv("my_file.csv")
names(DF) <- names(DF2)
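
If you do this a lot, the two reads wrap naturally into a small helper. This is just a convenience sketch of my own -- the function name and the nrows = 1 shortcut (which reads only the header row plus one data row instead of the whole file) are my additions, not anything Qualtrics provides:

read_qualtrics <- function(path) {
  # Data rows start on line 3; skip both header rows
  dat <- read.csv(path, skip = 2, header = FALSE)
  # Re-read just the first row to recover the short variable names
  names(dat) <- names(read.csv(path, nrows = 1))
  dat
}

DF <- read_qualtrics("my_file.csv")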

Tuesday, May 10, 2011

I'm number one on Amazon Mechanical Turk!

I just maxed out my credit card to get a whole bunch of work done on mTurk. For the moment, I'm the number one requester on the site!


Screen shot: [mTurk top-requesters list]

Early results look pretty good. A few turkers cheat, but I've been pleased with the good-faith effort most people seem to put into their work.

PS - I'm curious about what Randolph Stevenson (in the number 3 slot) is doing...

Tuesday, June 29, 2010

How to catch a dirty pollster


Just a couple of hours ago, Markos Moulitsas (you know, of Daily Kos) announced that he's suing his former polling firm, Research 2000. Evidently, they faked a lot of their data over the last 19 months, and didn't even have the decency to fake it well.

The statistical case against R2000 is made here. Nate Silver chimes in here.

The punchline:
"We do not know exactly how the weekly R2K results were created, but we are confident they could not accurately describe random polls."
What's interesting to me about the situation is how clear-cut the evidence is. Mark Grebner, Michael Weissman, and Jonathan Weissman (GWW) conduct three tests to show that the numbers are cooked. Each statistical test gives results that are devastatingly improbable.

  • Probability of odd-even pairs between men and women: 776 heads on 778 tosses of a fair coin, or about 1 in 10^231.
  • Probability of consistency of results in small samples: ~1 in 10^16.
  • Probability of missing zeroes in a time trend: ~1 in 10^16.
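
To get a feel for how small the first number is, the binomial tail is easy to check in R. This is just my own back-of-the-envelope version -- GWW's exact figure depends on the details of their test -- computing the chance of at least 776 parity matches in 778 tosses of a fair coin:

# Upper-tail probability of >= 776 heads in 778 fair-coin tosses,
# computed on the log scale to keep full precision
log10_p <- pbinom(775, size = 778, prob = 0.5,
                  lower.tail = FALSE, log.p = TRUE) / log(10)
log10_p   # about -229, i.e. a probability on the order of 10^-229

Either way you slice it, the number is astronomically small.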

Statistically, there's almost no wiggle room here. Until the R2000 folks come forward with some really compelling counterevidence, I'm convinced they're frauds.

An interesting feature of the last two patterns is that they hint at overeagerness to give results. Patterns in the data are *too* clear, and the detailed crosstabs supplied by R2000 made it easier to catch them. If they had been willing to dump some random noise into the equations -- blurring the results -- it would have been much harder to build a conclusive case against them.
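
To see how much cover even a little noise would have bought them, here's a toy simulation of the parity artifact, with entirely made-up numbers: force men's and women's percentages to share odd/even parity, then jitter each by up to a point and check again.

set.seed(1)
n     <- 778
men   <- sample(30:70, n, replace = TRUE)
women <- men + 2 * sample(-5:5, n, replace = TRUE)  # same parity by construction
mean(men %% 2 == women %% 2)                        # 1.0 -- the telltale pattern

add_noise <- function(x) x + sample(-1:1, length(x), replace = TRUE)
mean(add_noise(men) %% 2 == add_noise(women) %% 2)  # ~0.56 -- close to chance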

This jibes with my experience doing survey research for clients: nobody wants to pay for null results. Even when your data collection is legitimate, there's always pressure to twist and squeeze the data to wring out "findings" that aren't really there. Three cheers for transparency and the scientific method.