Tuesday, May 17, 2011

Slides from JITP: The Future of Computational Social Science

I'm at JITP's conference on the future of computational social science this week. Really interesting gathering of social scientists (and a smattering of CS people) interested in computational social science. I'll blog more thoughts on the conference once it's over (tonight).

For now, I'll post the one resource I've already put together: my conference slides.

Thursday, May 12, 2011

Computational social scientists: a draft directory and basic survey results

Last week, some of us* at Michigan's Center for Complex Systems circulated a survey of computational social scientists -- trying to find out who self-identifies as a compSocSci person and what they study, so that they can be in touch with each other.

We had just under 100 responses, from people at many different institutions, working in a wide variety of areas. Here are some early results.

First, the obligatory word cloud. This isn't particularly scientific, but it illustrates the concepts that people find important in this space. Not surprisingly, we had a strong showing from network people and agent-based modelers.




We also asked about broad areas where people had formal training and were currently working. The two are pretty similar, so I'll just show the graph on training.


More results and a revised version of the directory will be forthcoming in a couple of weeks. Please let us know if you have any questions. We hope these will be useful resources for the community.

Click here to take the survey. We'll keep it open for another couple weeks, so that responses can continue to trickle in.

Click here for the directory in pdf format. (To avoid spam, this doesn't include email addresses. Email me if you want a copy that includes emails.)

* Scott Page, Dan Katz, and I

Wednesday, May 11, 2011

R code to remove the second line of a Qualtrics .csv

I love Qualtrics, but its data export does this obnoxious thing. Instead of exporting a regular .csv file, it exports a .csv with two header rows. The first one contains short variable names (e.g. Q1, Q2.2, Q3_TEXT) and the second one contains labels ("How old are you?", "What is your email address?").

I keep having to work out how to make R deal with this messiness. It's not complicated, but I have to look up the read.csv documentation every time.

No more. Here's my code:


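# Read the data, skipping both header rows (names and labels)
DF <- read.csv("my_file.csv", skip = 2, header = FALSE)
# Read the file again, this time using the first row as the header
DF2 <- read.csv("my_file.csv")
# Copy the short variable names (Q1, Q2.2, ...) onto the data
names(DF) <- names(DF2)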
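
If you'd rather not retype this each time, here is a minimal sketch of the same idea wrapped in a function. It also keeps the question labels from the second row instead of throwing them away. The function name read_qualtrics and the "labels" attribute are hypothetical, made up for illustration; neither is part of Qualtrics or base R. It assumes the export has exactly two header rows and at least one row of data.

```r
# Hypothetical helper: read a Qualtrics export that has two header rows.
read_qualtrics <- function(path) {
  # Rows 1-2: short variable names, then question labels (read as text).
  header <- read.csv(path, header = FALSE, nrows = 2,
                     colClasses = "character", stringsAsFactors = FALSE)
  # Everything after the two header rows is data.
  DF <- read.csv(path, skip = 2, header = FALSE, stringsAsFactors = FALSE)
  # make.names() cleans the names the way read.csv does by default.
  names(DF) <- make.names(unlist(header[1, ], use.names = FALSE),
                          unique = TRUE)
  # Store the labels as an attribute, keyed by variable name
  # (an assumed convention, not something read.csv provides).
  attr(DF, "labels") <- setNames(unlist(header[2, ], use.names = FALSE),
                                 names(DF))
  DF
}
```

Usage would look like DF <- read_qualtrics("my_file.csv"), after which attr(DF, "labels") gives the question text for each column.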

Tuesday, May 10, 2011

I'm number one on Amazon turk!

I just maxed out my credit card to get a whole bunch of work done on mTurk. For the moment, I'm the number one requester on the site!


Screenshot:

Early results look pretty good. A few turkers cheat, but I've been pleased with the good-faith effort most people seem to put into their work.

PS - I'm curious about what Randolph Stevenson (in the number 3 slot) is doing...

Thursday, May 5, 2011

Working paper: An automated snowball census of the political web

Here's my paper for the JITP Future of Computational Social Science conference in a couple weeks. This paper describes my process for using SnowCrawl and a highly trained text classifier to search out political web sites -- pretty much all of them -- on the web.

Final census results are available here. I'm planning to run another iteration of this census before too long. I welcome comments and suggestions.