Friday, July 23, 2010

Mutiny, information technology, and technocracy

    Responding to David Brooks on the recent upsurge in technocracy and its risks. (This post was originally part of a discussion thread with friends, but I got into it enough to decide to put it up here.)

    I'm convinced technology is increasing the marginal truthfulness of many progressive claims. I don't believe it fundamentally changes the relationship among individuals, economies, government, and other social institutions.

    The example that comes to mind is mutiny. Mutiny was a huge worry for captains and navies for hundreds of years, right up until the invention of the radio. See the Wikipedia article on mutiny for interesting reading, and watch Mutiny on the Bounty and The Caine Mutiny (starring Humphrey Bogart!) for good fictional accounts of the psychology and institutions that shape mutiny.

    Radio was a game changer. Since the invention and adoption of radio, mutiny has been almost unheard of, especially among large naval powers. (The Vietnam-era SS Columbia Eagle incident is the exception that proves the rule.) Shipboard radios tighten the link between the captain's authority and the worldwide chain of command, making escape extremely unlikely for mutineers. However, desertion, disobedience, cowardice, incompetence, corruption, theft, etc. are still problems for ships and navies.

    As I see it, mutiny was already a marginal activity -- very risky for the mutineers, with a low probability of success -- and radio pushed the marginal success rate much lower. But mutiny is just one act. From the perspective of naval efficiency, radio changed the balance of power, but it didn't fix the underlying social problems of enforcing discipline and coordinating action. Radio changed the social structure of ships, but those changes didn't fundamentally alter the problems that navies face.

    Information technology is doing a similar thing today. It lowers the cost of storing, transmitting, aggregating, and manipulating data. Where lower transaction costs can solve social problems, the progressives (and I'm one of them, cautiously) are right to be optimistic.

    But many kinds of information have been cheap for a long time. Socially, we haven't solved the problems of greed, lying, bureaucratic turf wars, bullying, corruption, graft, incompetence... When *those* are the real causes, changes in information technology can't be expected to help nearly as much. We need to invent better institutions first.

Monday, July 19, 2010

Top secret America: Simultaneously sinister and incompetent

The Washington Post just put out an investigative report on "Top Secret America" -- federal departments and agencies with top secret security clearance.

Here's the trailer. Yes, a 1:47 Flash video trailer for a report put out by a newspaper. It has a Bourne Ultimatum feel to it, right down to the percussion-heavy digital soundtrack and gritty urban imagery. Anybody want to talk about the blurring lines between news and entertainment?

The Post claims that a fourth, secret branch of government has opened up since 9/11. They make this new branch out to be simultaneously sinister and incompetent (like Vogons). Watch the trailer and see what they're hinting at.

Some counterclaims: government is still learning how to make use of recent (i.e., within the last 30 years) advances in information technology. (Take this article in the Economist on the alleged obsolescence of the census, and data.gov, as examples.)
  • Since the technology is new, it makes sense that new agencies would develop to handle the load.
  • Information technology is useful, and the government has a lot of data to process, so it makes sense that there would be a lot of people involved.
  • Since business tends to be more nimble about technology adoption, it makes sense that a lot of the work would be outsourced to private firms, at least initially.
  • Since the process is unfamiliar, it makes sense that there would be some inefficiency.
  • Since we want the system to be robust to failure, it makes sense that there would be some redundancy.
I agree with the Post that "we want to get this right," but let's back off the hype and the alarmism a little bit.

Tuesday, July 13, 2010

A jaundiced formula for spinning educational research into something that sounds interesting


Here's a jaundiced formula for spinning educational research into something that sounds interesting. Most researchers and reporters seem to follow this formula pretty closely*.

1. Sample a bunch of kids in category A, and a bunch of kids in category B.

Ex: A kids have computers in the home; B kids don't
Ex: A kids are white; B kids are nonwhite
Ex: A kids go to charter schools; B kids don't

2. For each group, measure some dependent variable, Y, that we care about.

Ex: grades, SAT scores, dropout rates, college attendance, college completion, long term impacts on wages, quality of life, etc.

3. Compare Y means for group A and group B.
3a. If the means differ and the A versus B debate is contested, side with group A.
3b. If the means don't differ and many people support one option, take the opposite stance. (Ex: "Charter schools don't outperform non-charter schools")
3c. If neither of those options works, continue on to step 4.

4. Introduce a demographic variable X (probably gender or SES) as a control or interaction term in your regression analysis. It will probably be significant. Claim that A or B is "widening the racial achievement gap," or "narrowing the gender gap," etc., as appropriate.
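
To show just how mechanical this is, here's a minimal sketch of steps 3 and 4 in Python with pandas and statsmodels. The data file and variable names (score, group, ses) are hypothetical placeholders, not from any particular study:

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical dataset: one row per kid, with an outcome and categories.
    df = pd.read_csv('kids.csv')  # columns: score, group ('A'/'B'), ses

    # Step 3: compare Y means for group A and group B.
    print(df.groupby('group')['score'].mean())

    # Step 4: toss in a demographic control/interaction and look for stars.
    model = smf.ols('score ~ group * ses', data=df).fit()
    print(model.summary())

That's the whole recipe -- a dozen lines, a p-value, and a press release.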

Papers following this formula will frequently be publishable and newsworthy. (You can verify this, case by case, with the studies cited in that NYTimes article.) They will rarely make a substantive contribution to the science and policy of education. Awful. Awful. Awful.

Why? Because this approach is superficial. The scientific method is supposed to help us understand root causes, with an eye to making people better off. But that depends on starting with categorizations that are meaningfully tied to causal pathways. The distinctions we make have to matter.

In a great many educational studies, the categories used to split kids are cheap and easy to observe. Therefore, they make for easy studies and quick stereotypes. They feed political conflict about how to divide pies. But they don't matter in any deep, structural way.

Example: Does having a computer in the house make a kid smarter or dumber? It depends on how the computer is used. If the computer is in the attic, wrapped in plastic, the effect of computer ownership on grades, SAT scores, or whatever will be pretty close to zero. If the computer is only used to play games, the effect probably won't be positive, and if games crowd out homework, the effect will be negative. No real surprises there. And that's about as far as these studies usually go. "Computers not a magic bullet. Next!"

This is more or less the state of knowledge with respect to school funding, busing, charter schools, etc. We know that one blunt policy intervention after another does not work miracles. We haven't really gotten under the hood of what makes the complex social system of education work. It's like coming up with a theory of how airplanes fly based on the colors they're painted. ("White airplanes travel slower than airplanes painted camouflage colors, but tail markings have little effect on air speed.") You may be able to explain more than nothing, but you certainly haven't grasped the forces that make the system work.

To say the same thing in different words, scientists are supposed to ask "why?" Studies that say "kids in group A are more Y than kids in group B" don't answer the why question. They are descriptive, not causal. Without a deeper causal understanding of why schools work or don't work, I don't think we're ever going to stop chasing fads and really make things better.


*This is an epistemological critique of just about every quantitative article on education. In general, I'm supportive of the increasing influence of economic/econometric analysis in education policy, but this is one area where we quants may be making things worse, not better. Hat tip to Matt for sending the article that reminded me how much the failings of this literature frustrate me.

Thursday, July 1, 2010

Reliability

I'm sold on the argument that Krippendorff's alpha is the way to go for most reliability calculations. Unfortunately, it's been hard to find code to calculate it. Here are some of the best links I've run across.
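
For reference, the statistic itself is compact:

    alpha = 1 - D_o / D_e

where D_o is the observed disagreement among coders and D_e is the disagreement expected by chance. An alpha of 1 means perfect reliability; 0 means agreement no better than chance. The real work is in the difference function, which is what gives alpha its nominal, ordinal, interval, and ratio variants.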

The NLTK package for python has code for computing alpha. It looks like this does basic nominal calculation; I don't know if/how it copes with missing data.
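
Here's a minimal sketch of how the NLTK version gets invoked; the coder, item, and label values are made up, and this uses the default nominal (binary) distance:

    from nltk.metrics.agreement import AnnotationTask

    # Each record is a (coder, item, label) triple.
    data = [
        ('c1', 'i1', 'pos'), ('c2', 'i1', 'pos'),
        ('c1', 'i2', 'neg'), ('c2', 'i2', 'pos'),
        ('c1', 'i3', 'neg'), ('c2', 'i3', 'neg'),
    ]
    task = AnnotationTask(data=data)
    print(task.alpha())  # Krippendorff's alpha, nominal by default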

The concord package in R does nominal, ordinal, interval, and ratio versions of alpha. It looks like this might not be maintained anymore, but it works.

Here's a nice page of resources by computational linguists Artstein and Poesio. Unfortunately, what they show is mainly that there aren't very good resources out there. Their review article is very good -- the best treatment of reliability I've seen in the NLP community so far.

Deen Freelon has some links to reliability calculators and resources, including two nice online reliability calculators: Recal-OIR and Recal-3.

Krippendorff's oddly-formatted, information-sparse web page. He invents the best measure for calculating reliability, then keeps a lid on it. Fewer animated bowtie dogs, and more software, please!

Matthew Lombard has a nice page on reliability statistics and the importance of reliability in content analysis in general.

Beg: Does anybody know how to compute K's alpha for a single coder?

I have data coded by several coders and need to know who's doing a good job, reliability-wise, and who's not. At a pinch, I'd be willing to use a different reliability statistic, or even an IRT model. It just needs to be statistically defensible and reasonably easy to code.
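
In the meantime, the best workaround I can think of is a crude leave-one-out heuristic (my own hack, not a standard single-coder statistic): recompute alpha with each coder dropped, and flag coders whose removal raises alpha. A sketch, building on NLTK's AnnotationTask:

    from nltk.metrics.agreement import AnnotationTask

    def coder_impact(data):
        """data: (coder, item, label) triples; needs 3+ coders so that
        dropping one still leaves pairs to agree or disagree."""
        overall = AnnotationTask(data=data).alpha()
        impact = {}
        for coder in set(c for c, _, _ in data):
            subset = [t for t in data if t[0] != coder]
            impact[coder] = AnnotationTask(data=subset).alpha() - overall
        # Positive impact: alpha goes up without this coder -- a bad sign.
        return overall, impact

If anybody knows a more principled approach, I'm all ears.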