Wednesday, May 11, 2011

R code to remove the second line of a Qualtrics .csv

I love Qualtrics, but its data export does this obnoxious thing. Instead of exporting a regular .csv file, it exports a .csv with two header rows. The first one contains short variable names (e.g. Q1, Q2.2, Q3_TEXT) and the second one contains the question labels ("How old are you?", "What is your email address?").

I keep having to work out how to make R deal with this mess. It's not complicated, but I have to look up the read.csv documentation every time.

No more. Here's my code:


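# Read only the data rows, skipping both header rows
DF <- read.csv("my_file.csv", skip = 2, header = FALSE)
# Read the file again with the default header handling to get the short variable names from row 1
DF2 <- read.csv("my_file.csv")
names(DF) <- names(DF2)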
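
If you do this a lot, the same two-pass idea can be wrapped in a small function. What follows is only a sketch: the function name read_two_header_csv and the sample file contents are made up for illustration, and it assumes the file has exactly two header rows with no line breaks inside the label text. It uses nrows = 1 on the names pass so the whole file isn't parsed twice.

```r
# Hypothetical helper (not part of base R): read a CSV whose first row holds
# short variable names and whose second row holds question labels.
read_two_header_csv <- function(path) {
  # Pass 1: read the header plus a single row, and keep only the names
  var_names <- names(read.csv(path, nrows = 1))
  # Pass 2: skip both header rows and read the responses without a header
  dat <- read.csv(path, skip = 2, header = FALSE)
  names(dat) <- var_names
  dat
}

# Made-up sample file in the two-header layout described above
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Q1,Q2",
  "\"How old are you?\",\"What is your email address?\"",
  "34,a@example.com",
  "27,b@example.com"
), tmp)

DF <- read_two_header_csv(tmp)
names(DF)  # the short names taken from row 1
str(DF)    # Q1 is read as a number because the label row never reaches it
```

A side benefit of skipping the label row at read time, rather than reading everything and deleting row 1 afterwards, is that R guesses the column types from the real data only, so numeric answers stay numeric.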

4 comments:

Anonymous said...

Thank you! I was having the exact issue with Qualtrics and R. This is exactly what I was looking for.

Matt said...

Quite helpful, thanks! I was surprised read.csv didn't have a built-in option for skipping the second line - this seems common enough, although more sites are providing data formatted for machine-reading.

Anonymous said...

Why not just:

DF <- read.csv("mydata.csv", skip=1, header=T)

?

Anonymous said...

DF <- read.csv("mydata.csv", skip=1, header=T)

... skips the first line (actual variable names) and uses the variable descriptions as var names.