Showing posts with label scientific method. Show all posts

Saturday, April 16, 2011

What does it take to be a data scientist?

Conway on what it takes to be a data scientist (@ ZIA, ht: Mike B).


The full article is here. It's short and sweet, and offers a nice counterpoint to some of the claims made by people with a more computer-science-centric view of the world. Turns out that modeling assumptions (i.e. math and statistics) and theory (i.e. substantive expertise) matter. You ignore them at your own risk.

PS: The title makes it sound like this is about U.S. intelligence, but almost all the points in the article apply to business and academia as well.

Monday, February 28, 2011

What examples for successful prediction do we have in social science?

A question that's been bugging me: as social scientists, what can we predict reasonably well?

Following Kuhn's definition of a scientific paradigm, I'm focusing on:
1) social phenomena
2) that we can predict with enough accuracy to be useful
3) using technical skills that require special training.

I've found surprisingly few examples that satisfy all of these criteria. Only three, in fact.
  1. Aptitude testing
    Prediction: How well will a person perform in school, a given job, the Army, etc.?
    Technical skill: Psychometrics

  2. Microeconomics
    Prediction: What effect will various market interventions have on the price, quantity supplied, etc. for a specific good?
    Technical skill: Producer, consumer, and game theory

  3. Election polling
    Prediction: Who will win in a given election?
    Technical skills: Survey design, sampling theory
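
To make the sampling-theory skill behind example 3 concrete, here's a minimal sketch of the classic margin-of-error calculation for a poll. The function name and the numbers are illustrative, not from any actual poll:

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion p observed
    in a simple random sample of size n (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)

# A hypothetical poll of 1,000 respondents showing a candidate at 52%:
moe = margin_of_error(0.52, 1000)
print(f"+/- {moe:.1%}")  # roughly +/- 3.1 points
```

Quadrupling the sample size only halves the margin of error, which is why polls rarely go much beyond a thousand or so respondents.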

Can you think of any others? I've got to be missing some.

What areas *should* we be able to predict? We have all kinds of new tools as social scientists. It seems like we should be ready to tackle some new challenges.

Tuesday, July 13, 2010

A jaundiced formula for spinning educational research into something that sounds interesting


Here's a jaundiced formula for spinning educational research into something that sounds interesting. Most researchers and reporters seem to follow this formula pretty closely*.

1. Sample a bunch of kids in category A, and a bunch of kids in category B.

Ex: A kids have computers in the home; B kids don't
Ex: A kids are white; B kids are nonwhite
Ex: A kids go to charter schools; B kids don't

2. For each group, measure some dependent variable, Y, that we care about.

Ex: grades, SAT scores, dropout rates, college attendance, college completion, long term impacts on wages, quality of life, etc.

3. Compare Y means for group A and group B.
3a. If the means differ and the A-versus-B debate is contested, side with group A.
3b. If the means don't differ and many people support one option, take the opposite stance. (Ex: "Charter schools don't outperform non-charter schools")
3c. If neither of those options works, continue on to step 4.

4. Introduce a demographic variable X (probably gender or SES) as a control or interaction term in your regression analysis. It will probably be significant. Claim that A or B is "widening the racial achievement gap," or "narrowing the gender gap," etc., as appropriate.
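
Steps 1-4 boil down to very little actual analysis. Here is a minimal sketch with simulated data; every variable name, effect size, and number below is invented for illustration, not taken from any real study:

```python
import numpy as np

rng = np.random.default_rng(0)

# Steps 1-2: simulated data -- a group indicator (A = 1, B = 0), an
# outcome Y, and a demographic variable X. All values are made up.
n = 500
group = rng.integers(0, 2, n)           # category A vs. category B
x = rng.integers(0, 2, n)               # demographic control, e.g. gender
y = 50 + 2 * group + 5 * x + rng.normal(0, 10, n)

# Step 3: compare group means.
diff = y[group == 1].mean() - y[group == 0].mean()
print(f"mean(A) - mean(B) = {diff:.2f}")

# Step 4: regression with a demographic control and an interaction
# term, fit by ordinary least squares.
X = np.column_stack([np.ones(n), group, x, group * x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("coefficients [intercept, group, x, group*x]:", np.round(beta, 2))
```

Note that nothing in this sketch touches a causal mechanism: it describes a difference between two observed categories and nothing more, which is exactly the critique that follows.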

Papers following this formula will frequently be publishable and newsworthy. (You can verify this, case by case, with the studies cited in that NYTimes article.) They will rarely make a substantive contribution to the science and policy of education. Awful. Awful. Awful.

Why? Because this approach is superficial. The scientific method is supposed to help us understand root causes, with an eye to making people better off. But that depends on starting with categorizations that are meaningfully tied to causal pathways. The distinctions we make have to matter.

In a great many educational studies, the categories used to split kids are cheap and easy to observe. Therefore, they make for easy studies and quick stereotypes. They feed political conflict about how to divide pies. But they don't matter in any deep, structural way.

Example: Does having a computer in the house make a kid smarter or dumber? It depends on how the computer is used. If the computer is in the attic, wrapped in plastic, the effect of computer ownership on grades, SAT scores, or whatever will be pretty close to zero. If the computer is only used to play games, the effect probably won't be positive, and if games crowd out homework, the effect will be negative. No real surprises there. And that's about as far as these studies usually go. "Computers not a magic bullet. Next!"

This is more or less the state of knowledge with respect to school funding, busing, charter schools, etc. We know that one blunt policy intervention after another does not work miracles. We haven't really gotten under the hood of what makes the complex social system of education work. It's like coming up with a theory of how airplanes fly based on the colors they're painted. ("White airplanes travel slower than airplanes painted camouflage colors, but tail markings have little effect on air speed.") You may be able to explain more than nothing, but you certainly haven't grasped the forces that make the system work.

To say the same thing in different words, scientists are supposed to ask "why?" Studies that say "kids in group A are more Y than kids in group B" don't answer the why question. They are descriptive, not causal. Without a deeper causal understanding of why schools work or don't work, I don't think we're ever going to stop chasing fads and really make things better.


*This is an epistemological critique of just about every quantitative article on education. In general, I'm supportive of the increasing influence of economic/econometric analysis in education policy, but this is one area where we quants may be making things worse, not better. Hat tip to Matt for sending the article that reminded me how much the failings of this literature frustrate me.

Monday, June 14, 2010

Scientific revolutions ~ Disruptive technologies

These two Wikipedia pages are really interesting to read side by side: The Structure of Scientific Revolutions and Disruptive Technology. The first is Thomas Kuhn's 50-year-old-but-still-revolutionary theory of how scientific theories advance. The other is Clayton Christensen's more recent theory of why many businesses fail to respond to innovation. Despite the fact that Kuhn was a physicist and scientific historian and Christensen is a business school professor, the two have a lot in common.



Here's a list of important concepts from both theories. I've paired roughly equivalent concepts; I'll let you look them up on Wikipedia on your own.

Kuhn / Christensen
Scientific revolutions ~ Market disruption
Paradigm ~ Value network
Paradigm shift ~ Market disruption
Coherence ~ Corporate culture
Normal science ~ Sustaining innovation
Anomalies ~ New-market disruption
New theory ~ Disruptive innovation
??? ~ Low-end disruption
??? ~ Up-market/down-market
Incommensurability ~ ???
Pre-paradigm phase ~ ???
Normal phase ~ Market growth
Revolutionary phase ~ Market disruption

One thing that strikes me as potentially interesting is the places where the two theories do *not* overlap. As a businessperson, Christensen is more interested in the development of markets and the flow of revenue. Kuhn is more interested in the change in theories over time. Each approach may have something to offer the other.

* In science and academia, what does it mean to be "down-market?" Which departments today are incubating the revolutionary theories of tomorrow?
* What does the idea of incommensurability imply for business practice? Anecdotally, disruptive companies often have business models and cultures that are dramatically different from established companies. Should that change the way we think about entrepreneurship and venture capital?