Monday, July 25, 2011

Brand sentiment showdown...

From FlowingData. I may use this in my ICPSR class on computational social science next week. What do you think?



Brand sentiment showdown: "

[Chart: daily sentiment scores for Netflix, Hulu, and Redbox]

There are many brands on Twitter that exist to uphold an image of the company they represent. As consumers, we can communicate with these accounts, voicing praise or displeasure (usually the latter). Using a simple sentiment classifier [1], I scored feelings towards major brands from 0 (horrible) to 100 (excellent) once a day for five days.


The chart above, for example, shows scores for Netflix, Hulu, and Redbox. Netflix had the lowest scores, whereas Redbox had the highest. I suspect Netflix started low with people still upset over the price hike, but it got better over the next couple of days. Then on Saturday, there was a score drop, which I'm guessing was from their downtime for most of Saturday. Hulu and Redbox, on the other hand, held steadier scores.


As for auto brands, Toyota clearly had the lowest scores. However, Lexus, which is actually Toyota's luxury vehicle division, had the highest scores, ranging from the high 90s to 100.



How about the major mobile phone companies, AT&T, Verizon, and Sprint? Verizon scored better initially, but had lower scores during the weekend. Not sure what was going on with Sprint.



Between Twitter and Facebook, there was obviously some bias, but Twitter fared slightly better. Twitter scored lower than I expected, but it probably has to do with bug reports directed towards @twitter.



Is Domino's Pizza good now? Papa John's stayed fairly steady while Pizza Hut scores were sub-par.



Finally, as a sanity check, I compared airlines as Breen did in his tutorial. Results were similar, with JetBlue and Southwest clearly in the positive and the others bringing up the rear.



Any of these scores seem surprising to you?





  1. Jeffrey Breen provides an easy-to-follow tutorial on Twitter sentiment in R. The scoring system is pretty basic. All you do is load tweets with a given search phrase, and then find all the 'good' words and 'bad' words. Good words give +1, and bad words give -1. Then a tweet is classified good or bad based on the total. To get a final score, only tweets with a total of +2 or more or -2 or less are counted. The final score is the number of positive tweets among these divided by the total number of 'extreme' tweets, scaled to 0 to 100. Obviously this won't pick up on sarcasm, but the scoring still seems to do a decent job. I wouldn't make any important business decisions based on these results though.
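For readers who want to see the footnote's scoring scheme spelled out, here is a rough Python sketch. It is not Breen's R code: the word lists are tiny placeholders, the names `tweet_total` and `brand_score` are made up for illustration, and it assumes the 0 to 100 score is the share of strongly positive tweets among the 'extreme' ones, which is what a 0 (horrible) to 100 (excellent) scale implies.

```python
import re

# Placeholder word lists (an assumption for this sketch); a real run
# would load a full opinion lexicon instead.
GOOD_WORDS = {"good", "great", "love", "awesome", "excellent"}
BAD_WORDS = {"bad", "awful", "hate", "terrible", "worst"}


def tweet_total(text):
    """Return +1 for every good word and -1 for every bad word in a tweet."""
    words = re.findall(r"[a-z']+", text.lower())
    good = sum(w in GOOD_WORDS for w in words)
    bad = sum(w in BAD_WORDS for w in words)
    return good - bad


def brand_score(tweets, threshold=2):
    """Score a brand from 0 to 100 using only the 'extreme' tweets.

    A tweet is extreme when its total is >= threshold or <= -threshold.
    Returns None when no tweet is extreme, since the ratio is undefined.
    """
    totals = [tweet_total(t) for t in tweets]
    very_positive = sum(t >= threshold for t in totals)
    very_negative = sum(t <= -threshold for t in totals)
    extreme = very_positive + very_negative
    if extreme == 0:
        return None
    return round(100 * very_positive / extreme)


sample = [
    "I love this, great service",   # total +2, counted as positive
    "awful and terrible",           # total -2, counted as negative
    "worst, I hate it",             # total -2, counted as negative
    "good movie",                   # total +1, ignored
    "nothing special",              # total 0, ignored
]
print(brand_score(sample))
```

Note how the two mild tweets drop out entirely: with a threshold of 2, only strongly worded tweets move the score, which is one reason a brand's number can swing a lot from day to day.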







"

Friday, July 15, 2011

Automated snowball census

After a lot of work, I pushed the new version of the paper to SSRN today. Getting ready to pack it up, send it off for review, and move on to the next thing.

Tuesday, July 12, 2011

How are new media reshaping politics? Take 2

Yesterday I posted on Hindman's "missing middle" hypothesis. Kudos to my good friend Ben Peters for some great, thought-provoking responses. Today, I'm going to push forward and respond to another great thinker in this area: Yochai Benkler, with his theory of the "networked public sphere."

Benkler and the Networked Public Sphere


On the other side of the debate, Yochai Benkler is an Internet optimist. He argues that many-to-many communication will invigorate the public sphere, leading to broader intake of ideas, better discussion, and ultimately better governance. Benkler is very critical of the media oligopoly of the mid-20th century, which he says was heavily influenced by money and ideology, and exercised outsized control over public access to information. According to his account, the current proliferation of online information sources is certainly better than being dependent on a handful of corporate broadcasters, even if it still falls short of utopia.

This picture of the public sphere is appealing and not entirely untrue. I want to believe it. However, Benkler fails to take into account important and well-established facts about the American political system.

First, most citizens in the U.S. are poorly equipped to deal with political information. Converse's half-century-old finding that as many as 90 percent of Americans are "innocent of ideology" (i.e. they have no idea what "liberal" and "conservative" mean) has been replicated and extended many times. Most voters don't know how government works, they don't know how it's supposed to work, and they don't care to find out. True, partisan cues, endorsements, and heuristics can sometimes bring voters up to speed enough to fill out a ballot, but these heuristic strategies cannot inform most citizens for participation in the public sphere the way Benkler imagines. We must distinguish between the handful of citizens who are motivated and equipped to reason about politics, and the majority who are not so prepared or inclined. Benkler's optimism really only extends as far as the electorate is capable of reasoning about democracy.

Second, Benkler ignores the structure of government and policymaking. He treats "government" as a unitary actor, and makes only passing reference to elections and political parties. Benkler is painting with a broad brush, so perhaps he can be forgiven for ignoring the institutional details of representation and government in American politics. However, those details are likely to matter, deeply.

Consider: primary responsibility for lawmaking in the U.S. falls to elected legislators. These legislators are influenced not only by the ebb and flow of ideas in public debate, but also by their ability to win in zero-sum, partisan elections. Proliferation of information sources may affect public debate for the better, but it also affects the electoral pressures faced by public officials. We have strong reason to believe that access to additional channels, selective exposure, and ideological pandering are leading to increased polarization in the electorate. What if this polarizing electoral effect dominates the enriching discursive effect that Benkler outlines?

I'm sympathetic to the idea of a networked public sphere. As I said earlier, I really want it to be true. But Benkler's picture ignores key institutions in American politics, like elections and parties, so I have a hard time placing much faith in his predictions. We need to think carefully about the interplay of partisanship, ignorance, and representative government with technologies that allow cheap, many-to-many communication.

How are new media reshaping politics? Take 1

I've been studying political blogging for a couple years now, and I'm getting ready to bring it all together into a dissertation. That means it's time to move past statistics and data, and start thinking in terms of Big Ideas.

As I see it, the pressing question is "How are new media (including blogs) reshaping American politics?" This is a big question -- one that certainly matters outside of academia. But that won't stop me from writing about it in a dry, academic way. :) To my mind, Matthew Hindman, Yochai Benkler, and Cass Sunstein have put forward the three leading, competing theories for answering this question. This week, I'm going to make a first attempt at responding to and synthesizing their ideas.

Feedback and constructive criticism are very welcome.

Hindman and the Missing Middle

Matthew Hindman is an Internet pessimist. In his book, The Myth of Digital Democracy, he argues that the web has exacerbated the "rich get richer" tendencies of media markets, leading to greater inequality. To back up his assertion, he shows that links and traffic to web pages follow a power law distribution. He also interviews top 40 bloggers and claims that they are overwhelmingly white, male, high-income, and educated. His analysis suggests that the people with big audiences online are no different from those offline. Hindman labels this dramatic inequality between popular and unpopular sites "the missing middle."

However, Hindman's line of attack has two important weaknesses. First, he has no counterfactual. The distribution of online audiences is dramatically unequal, but the same is (and was) probably true offline as well. Certainly, Barack Obama, Michele Bachmann, and Thomas Friedman have daily audiences that are orders of magnitude larger than mine or yours. The same was true of their counterparts before the Internet. Audiences online are distributed unequally, but are they more unequal than those that existed offline, before the Internet? Hindman does not answer this question, and I suspect the answer is no.

Second, Hindman ignores the potential for indirect influence. The Drudge Report is one of the most heavily trafficked blogs* on the Web, but Drudge himself writes almost no content. Instead, the site features links to stories elsewhere on the Internet. How then do we think about Drudge's influence? He inserts no new ideas into public debate, but exercises some ability to influence which ideas get attention. By linking to other authors' stories, Drudge allows those authors to exercise indirect influence on his readers.

Drudge is an extreme case of the common online practice of linking. Linking intrinsically gives others indirect influence. It is not unique to the online world (think of citations, endorsements, recommendations), but it is probably more common there. Network theory shows us that, all else being equal, more re-linking leads to a more egalitarian distribution of indirect influence. By focusing only on direct readership, Hindman misses this possibility.
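To make the re-linking intuition concrete, here is a toy Python sketch. Everything in it is an assumption for illustration, not Hindman's data or a result from the network literature: four hypothetical sites with made-up audiences, a single `link_share` parameter for the fraction of reader attention each site passes along through links, and links spread evenly over all other sites. Inequality is summarized with a Gini coefficient.

```python
def gini(values):
    """Gini coefficient: 0 is perfectly equal, values near 1 are highly unequal."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    # Standard rank-weighted formula on the sorted values.
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


def influence(audiences, link_share):
    """Combine direct and indirect influence for each site.

    Each site keeps (1 - link_share) of its own audience's attention and
    receives an equal slice of the attention every other site passes along.
    Total attention is conserved; it is only redistributed.
    """
    n = len(audiences)
    total = sum(audiences)
    return [
        (1 - link_share) * a + link_share * (total - a) / (n - 1)
        for a in audiences
    ]


# Made-up, highly skewed direct audiences for four hypothetical sites.
audiences = [1000, 100, 10, 1]

for share in (0.0, 0.2, 0.5):
    print(share, round(gini(influence(audiences, share)), 3))
```

In this toy model the Gini coefficient falls as `link_share` rises, which is the "all else being equal" claim in miniature. Real linking is far from even (popular sites tend to link to other popular sites), so this shows only the direction of the effect under a generous assumption, not its size.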

The bottom line: Hindman is the skeptic in this debate, arguing that the Internet means business as usual for participation, voice, and influence. He's only right as long as we assume that 1) offline participation is not also unequal, and 2) only direct influence (i.e. readership and web traffic) matters.

Monday, July 11, 2011

+Computation: Got an AWS in Education grant!

I just received a generous grant for usage on Amazon's Web Services -- cloud computing, storage space, and bandwidth. This is just in time for a bunch of heavy-duty text crunching I've been planning to do. Thank you, Amazon!