I'm sold on the argument that Krippendorff's alpha is the way to go for most reliability calculations. Unfortunately, it's been hard to find code to compute it. Here are some of the best links I've run across.
The NLTK package for Python has code for computing alpha. It looks like it does the basic nominal calculation; I don't know if or how it copes with missing data.
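For what it's worth, Krippendorff's own formulation handles missing data through the coincidence matrix: a coder who skipped a unit simply contributes no value there, and a unit left with fewer than two values can't be paired and drops out. Below is a minimal from-scratch sketch of nominal alpha along those lines. It's my own code, not NLTK's, and I haven't checked how NLTK does it. The input layout (one list per unit, None for a missing rating) and the name nominal_alpha are assumptions, and the small 4-coder, 12-unit data set is only illustrative.

```python
from collections import Counter
from itertools import permutations


def nominal_alpha(units):
    """Nominal Krippendorff's alpha.

    units: one list per unit (item), holding each coder's value, or None if
    that coder skipped the unit. Missing values are left out; units with
    fewer than two values cannot be paired and contribute nothing.
    """
    o = Counter()  # coincidence matrix: o[(c, k)] = weighted count of c-k pairs
    for unit in units:
        values = [v for v in unit if v is not None]
        m = len(values)
        # each ordered pair of values from two different coders adds 1/(m - 1);
        # permutations() yields nothing when m < 2, so lone values drop out
        for c, k in permutations(values, 2):
            o[(c, k)] += 1.0 / (m - 1)

    n_c = Counter()  # marginal total per category
    for (c, _k), w in o.items():
        n_c[c] += w
    n = sum(n_c.values())  # number of pairable values

    observed = sum(w for (c, k), w in o.items() if c != k)
    # sum over c != k of n_c * n_k, computed as n^2 minus the diagonal
    expected = n * n - sum(v * v for v in n_c.values())
    if expected == 0:
        raise ValueError("alpha is undefined: all pairable values are identical")
    # 1 - D_o / D_e simplifies to this for the 0/1 nominal difference
    return 1.0 - (n - 1) * observed / expected


# Columns are coders A, B, C, D; None = unit not coded by that coder.
data = [
    [1, 1, None, 1], [2, 2, 3, 2], [3, 3, 3, 3], [3, 3, 3, 3],
    [2, 2, 2, 2], [1, 2, 3, 4], [4, 4, 4, 4], [1, 1, 2, 1],
    [2, 2, 2, 2], [None, 5, 5, 5], [None, None, 1, 1], [None, 3, None, None],
]
print(round(nominal_alpha(data), 3))  # → 0.743
```

Note that the last unit, coded by only one person, has no effect on the result; it's discarded rather than counted as agreement or disagreement.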
The concord package in R does nominal, ordinal, interval, and ratio versions of alpha. It may no longer be maintained, but it works.
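The four versions of alpha share one formula, alpha = 1 - D_o / D_e (observed over expected disagreement), and differ only in the squared difference function delta(c, k) used to weight each disagreement. Here's a rough Python sketch, not a port of concord, with a pluggable delta for nominal, ordinal, interval, and ratio data. It assumes values are numbers (ordinal categories coded as sortable ranks) and uses a hypothetical layout of one list per unit with None for missing ratings; the function names are mine.

```python
from collections import Counter
from itertools import permutations


def coincidences(units):
    """Coincidence matrix from units (lists of values, None = missing)."""
    o = Counter()
    for unit in units:
        values = [v for v in unit if v is not None]
        for c, k in permutations(values, 2):  # empty when fewer than 2 values
            o[(c, k)] += 1.0 / (len(values) - 1)
    return o


def difference_function(metric, n_c):
    """Squared difference delta(c, k) for the four standard metrics."""
    if metric == "nominal":
        return lambda c, k: 0.0 if c == k else 1.0
    if metric == "interval":
        return lambda c, k: float(c - k) ** 2
    if metric == "ratio":
        return lambda c, k: 0.0 if c == k else ((c - k) / (c + k)) ** 2
    if metric == "ordinal":
        cats = sorted(n_c)

        def delta(c, k):
            lo, hi = min(c, k), max(c, k)
            # marginal mass from c through k, minus half of each endpoint's total
            between = sum(n_c[g] for g in cats if lo <= g <= hi)
            return (between - (n_c[c] + n_c[k]) / 2) ** 2

        return delta
    raise ValueError(f"unknown metric: {metric!r}")


def krippendorff_alpha(units, metric="nominal"):
    o = coincidences(units)
    n_c = Counter()
    for (c, _k), w in o.items():
        n_c[c] += w
    n = sum(n_c.values())
    if n < 2:
        raise ValueError("need at least two pairable values")
    delta = difference_function(metric, n_c)
    d_observed = sum(w * delta(c, k) for (c, k), w in o.items()) / n
    d_expected = sum(n_c[c] * n_c[k] * delta(c, k)
                     for c in n_c for k in n_c) / (n * (n - 1))
    if d_expected == 0:
        raise ValueError("alpha is undefined: no expected disagreement")
    return 1.0 - d_observed / d_expected


# Columns are coders A, B, C, D; None = unit not coded by that coder.
data = [
    [1, 1, None, 1], [2, 2, 3, 2], [3, 3, 3, 3], [3, 3, 3, 3],
    [2, 2, 2, 2], [1, 2, 3, 4], [4, 4, 4, 4], [1, 1, 2, 1],
    [2, 2, 2, 2], [None, 5, 5, 5], [None, None, 1, 1], [None, 3, None, None],
]
for metric in ("nominal", "ordinal", "interval", "ratio"):
    print(metric, round(krippendorff_alpha(data, metric), 3))
# nominal 0.743 / ordinal 0.815 / interval 0.849 / ratio 0.797
```

Only delta changes between metrics; the ordinal version needs the category totals n_c, which is why difference_function takes them as an argument.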
Here's a nice page of resources by computational linguists Artstein and Poesio. Unfortunately, what it mainly shows is that there aren't very good resources out there. Their review article is very good: the best treatment of reliability I've seen in the NLP community so far.
Deen Freelon has some links to reliability calculators and resources, including two nice online reliability calculators: Recal-OIR and Recal-3.
Then there's Krippendorff's oddly formatted, information-sparse web page. He invents the best measure for calculating reliability, then keeps a lid on it. Fewer animated bowtie dogs and more software, please!
Matthew Lombard has a nice page on reliability statistics and on the importance of reliability in content analysis generally.
Beg: Does anybody know how to compute K's alpha for a single coder?
I have data coded by several coders and need to know who's doing a good job, reliability-wise, and who's not. At a pinch, I'd be willing to use a different reliability statistic, or even an IRT model. It just needs to be statistically defensible and reasonably easy to code.
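One rough workaround, which is a heuristic and not a single-coder alpha: recompute alpha with each coder left out in turn. A coder whose removal pushes alpha up noticeably is dragging agreement down. The sketch below does this for nominal data; the dict-of-rows layout and the names nominal_alpha and leave_one_out_alpha are my own assumptions. Caveat: with only a few coders, dropping one also removes pairable values, so small differences shouldn't be over-read.

```python
from collections import Counter
from itertools import permutations


def nominal_alpha(units):
    """Nominal Krippendorff's alpha; units are lists of values, None = missing."""
    o = Counter()  # coincidence matrix
    for unit in units:
        values = [v for v in unit if v is not None]
        for c, k in permutations(values, 2):  # skipped when fewer than 2 values
            o[(c, k)] += 1.0 / (len(values) - 1)
    n_c = Counter()
    for (c, _k), w in o.items():
        n_c[c] += w
    n = sum(n_c.values())
    observed = sum(w for (c, k), w in o.items() if c != k)
    expected = n * n - sum(v * v for v in n_c.values())
    if expected == 0:
        raise ValueError("alpha is undefined for this subset of coders")
    return 1.0 - (n - 1) * observed / expected


def leave_one_out_alpha(ratings):
    """ratings: dict mapping coder -> list of values, one per unit (None = missing).

    Returns {coder: alpha computed with that coder's ratings removed}.
    """
    coders = list(ratings)
    n_units = len(ratings[coders[0]])

    def columns(subset):
        # regroup the chosen coders' rows into one list per unit
        return [[ratings[c][u] for c in subset] for u in range(n_units)]

    return {drop: nominal_alpha(columns([c for c in coders if c != drop]))
            for drop in coders}


ratings = {
    "A": [1, 2, 3, 3, 2, 1, 4, 1, 2, None, None, None],
    "B": [1, 2, 3, 3, 2, 2, 4, 1, 2, 5, None, 3],
    "C": [None, 3, 3, 3, 2, 3, 4, 2, 2, 5, 1, None],
    "D": [1, 2, 3, 3, 2, 4, 4, 1, 2, 5, 1, None],
}
overall = nominal_alpha([[ratings[c][u] for c in ratings] for u in range(12)])
print("all coders", round(overall, 3))  # → 0.743
for coder, a in leave_one_out_alpha(ratings).items():
    print("without", coder, round(a, 3))
# without A 0.715 / without B 0.704 / without C 0.868 / without D 0.675
```

Here dropping C raises alpha from 0.743 to 0.868 while dropping anyone else lowers it, which would flag C for a closer look. It says nothing about whether C is wrong or simply interprets the coding scheme differently.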