The Lowly Wonk: software


Tuesday, April 12, 2011

Lightweight pdf renderers

I'm finishing up dissertation data collection in the next ~6 weeks, which means I'm going to be spending a lot less time writing code, and a lot more time analyzing data and writing papers. So R and LaTeX are going to be my new best friends.

Taking a good look at my workflow around these packages, I realized that viewing pdfs was really slowing me down. Every time I generate a graph or paper, I have to open up the pdf version and see what it looks like. Adobe's very bulky software takes several seconds to load -- very frustrating when you're playing with margins or table formatting and want to iterate quickly.

So I went out looking for a lightweight pdf viewer. Here's what I found:

http://www.downloadmunkey.net/2008/04/random-monday-foxit-reader-vs-pdf-xchange-viewer-vs-sumatra/
http://portableapps.com/node/17260
http://www.techsupportalert.com/best-free-non-adobe-pdf-reader.htm

Any other advice?

Based on those reviews, I'm going to give PDF-XChange a shot. I'll let you know how it goes.

Tuesday, April 5, 2011

Announcing SnowCrawl!

Announcing the beta release of SnowCrawl, a Python library for directed webcrawls. Nice features include: saved state for backup, support for threading and client-server architecture, lots of flexibility.

The project is open source, hosted at Google Code. More details to follow.

Wednesday, December 1, 2010

Prezi - A quick review

I spent a couple hours playing around with Prezi, an online presentation builder being promoted as an alternative to the slideshow format of PowerPoint, Keynote, etc. Instead of a series of slides, Prezi presentations consist of a series of views over one large image. The format parallels drawing on a whiteboard, instead of clicking through slides on a projector. A good concept, but the execution is a little clunky.

My review: Preparing good presentations is time consuming, for two reasons: 1) it takes some trial and error to figure out the best way to express an idea, verbally and visually, and 2) presentation software is clunky, requiring a lot of fiddling to get things right. In my experience, spending time on (1) is fun and creative; spending time on (2) is frustrating and stressful.

Because of the "whiteboard" metaphor and brand emphasis on good design, I was hoping Prezi would deliver a slick and streamlined user experience. Being free from interface hassles and able to focus on creative expression would be wonderful. Alas, I quickly ran into many GUI annoyances.

The interface for importing images is very clunky. You have to download or save the image to your desktop, then upload. On the plus side, you can batch upload several images at a time.
The whole image is static, which means that you can't mark up images over the course of a presentation. To some extent this makes sense -- dynamic images would mess up the concept of arranging your display in space rather than time. However, it breaks the whiteboard metaphor. When I do whiteboard presentations, I often have an agenda that I revisit, adding checkmarks and lines to relevant content. I can't do that in Prezi.
Rudimentary tools for grouping objects are not available. This one really gets me. You can accomplish the same thing (visually) by putting several objects together in an invisible frame. But every time you want to move the group, it takes several extra clicks to select everything and drag it around. Poor usability.
You can only use a handful of presentation styles. Your only alternative is to hire Prezi staff to build a custom style for $450.

Summary: I would really like a tool that lets me express myself clearly, fast. Prezi offers some advantages for clarity, but not really for speed. Overall, I'm mildly impressed, but not overwhelmed. For the moment, the main benefit of Prezi seems to be novelty.

Saturday, August 14, 2010

Wacom tablet: post-purchase rationalization

So I bought a Wacom Bamboo tablet last week. It's a nice digital pen tablet, with a drawing area about the size of a postcard -- a good size for sketching & gesture-based interfaces. It plugs into a USB port and (after some fiddling with the drivers) works very nicely with Windows, Firefox, Inkscape, etc.

That said, I'm not really sure why I bought the thing. I do a lot of writing and programming, which leads to a lot of typing, but not much clicking and dragging. Sure, the touchpad on my notebook is small, but it's not really $60 small. I blame postpartum sleep deprivation. It might also have something to do with watching this TED talk.

Anyway, in an effort to assuage my post-purchase cognitive dissonance, I've been telling myself that if I can improve the speed and accuracy of my clicks by just a fraction, this tablet thing will easily pay itself off in productivity in the long run. Right?

To bring some proof to that claim, I dug up this Flash-based target practice game. Little targets fly around the screen and you try to click on them: score points for every target you hit; bonus points for consecutive bulls-eyes; miss too many times and the game is over. Great sound effects. This is high brainpower stuff.

I played three rounds in time trial mode, using the touchpad. Mean score 438. Then I played thrice with the tablet. Mean score 685!

To make sure this wasn't just an improvement in my reflexes and strategy (shoot the spring targets at the apex; don't waste a shot on a yo-yo target that might be yanked), I employed an interrupted time series design and played six more rounds. Mean score with the touchpad: 407. Mean score with the tablet: 977!

With that kind of performance improvement, the tablet was clearly worth it. Minority Report, here I come.

Sunday, August 8, 2010

Inkscape

If you haven't tried Inkscape yet, you should. It's an open-source graphics program, like Illustrator, but free. It's very functional, and the built-in documentation is pretty solid. Since it's open source with a good user base, I expect development to come along rapidly. Worth a look.
Here's a link to the Inkscape showcase.

Here's a link to an essential tool in Inkscape: converting bitmaps to vector graphics.

Here's me showing off the result of an hour playing around with bitmaps, fonts, and vector paths: a spoof on the Baby Einstein logo.

Thursday, July 1, 2010

Reliability

I'm sold on the argument that Krippendorff's alpha is the way to go for most reliability calculations. Unfortunately, it's been hard to find code to calculate it. Here are some of the best links I've run across.

The NLTK package for Python has code for computing alpha. It looks like this does basic nominal calculation; I don't know if/how it copes with missing data.

The concord package in R does nominal, ordinal, interval, and ratio versions of alpha. It looks like this might not be maintained anymore, but it works.

Here's a nice page of resources by computational linguists Artstein and Poesio. Unfortunately, what they show is mainly that there aren't very good resources out there. Their review article is very good -- the best treatment of reliability I've seen in the NLP community so far.

Deen Freelon has some links to reliability calculators and resources, including two nice online reliability calculators: ReCal OIR and ReCal3.

Krippendorff's oddly-formatted, information-sparse web page. He invents the best measure for calculating reliability, then keeps a lid on it. Less animated bowtie dogs, and more software, please!

Matthew Lombard has a nice page on reliability statistics and the importance of reliability in content analysis in general.
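Since code is hard to come by, here is a minimal sketch of the nominal version of alpha in plain Python, using the coincidence-matrix formulation. The function name and the input layout (one row per coded unit, one entry per coder, None for a missing code) are assumptions made for illustration; this is not the NLTK or concord implementation, and it does not cover the ordinal, interval, or ratio versions.

```python
import math
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Nominal alpha. `units` is a list of rows, one per coded unit;
    each row holds one value per coder, with None for a missing code."""
    # Coincidence matrix: every ordered pair of codes within a unit,
    # weighted by 1 / (number of codes in that unit - 1).
    coincidences = Counter()
    for ratings in units:
        values = [v for v in ratings if v is not None]
        m = len(values)
        if m < 2:
            continue  # a unit with fewer than two codes cannot be paired
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per category, and the total number of pairable codes.
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    disagreements = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b]
                   for a in totals for b in totals if a != b)
    if expected == 0:
        return math.nan  # only one category (or no pairable data): undefined
    # alpha = 1 - D_o / D_e, with both terms written in raw counts
    return 1.0 - (n - 1) * disagreements / expected


# Toy data: two coders, ten units, binary codes.
coder_1 = [0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
coder_2 = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
alpha = krippendorff_alpha_nominal(list(zip(coder_1, coder_2)))
print(round(alpha, 3))  # 0.095
```

Missing data is handled by dropping the None entries unit by unit, so a unit coded by three of four coders still contributes its three codes.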

Beg: Does anybody know how to compute K's alpha for a single coder?

I have data coded by several coders and need to know who's doing a good job, reliability-wise, and who's not. At a pinch, I'd be willing to use a different reliability statistic, or even an IRT model. It just needs to be statistically defensible and reasonably easy to code.
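One possible stopgap, sketched below, is to score each coder by their average pairwise Cohen's kappa against every other coder. To be clear, this is a different statistic, not a single-coder version of Krippendorff's alpha, and whether it is defensible enough depends on the design. The function names and the data layout (a dict from coder name to a list of codes aligned by unit, None for missing) are hypothetical.

```python
import math
from collections import Counter


def cohen_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders, using only units both of them coded."""
    pairs = [(a, b) for a, b in zip(codes_a, codes_b)
             if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        return math.nan
    observed = sum(a == b for a, b in pairs) / n
    counts_a = Counter(a for a, _ in pairs)
    counts_b = Counter(b for _, b in pairs)
    # Chance agreement from each coder's own marginal distribution.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1:
        return math.nan  # both coders used a single identical category
    return (observed - expected) / (1 - expected)


def mean_kappa_per_coder(codes_by_coder):
    """Average each coder's kappa against every other coder."""
    result = {}
    for name, codes in codes_by_coder.items():
        kappas = [cohen_kappa(codes, other)
                  for other_name, other in codes_by_coder.items()
                  if other_name != name]
        kappas = [k for k in kappas if not math.isnan(k)]
        result[name] = sum(kappas) / len(kappas) if kappas else math.nan
    return result


# Toy data: A and B agree perfectly, C frequently disagrees with both.
codes = {
    "A": [1, 1, 0, 0, 1, 0],
    "B": [1, 1, 0, 0, 1, 0],
    "C": [0, 1, 1, 0, 0, 1],
}
scores = mean_kappa_per_coder(codes)
for name in sorted(scores, key=scores.get):
    print(name, round(scores[name], 3))
```

A coder whose average sits well below the others is a candidate for retraining or a closer look, though a low score can also mean that coder is right and the rest share a bias.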

Friday, May 28, 2010

Software for text mining, esp unsupervised document clustering

A friend in the department recently asked me about software for text mining. Among other things, she was looking for programs that do "unsupervised document clustering." I went through my notes and did some web searching and came up with some promising options.

I haven't worked with any of these directly (unsupervised learning is a step removed from the stuff I do) but I figured the results of the search were worth passing on.

One option close to hand is WordStat on the computer in the [UM political science] bullpen. It supports clustering and is pretty easy to use.

Another option is Justin Grimmer's Galileo package. I don't know if he's made this publicly available yet. Last I heard he was trying to patent and maybe market it. Grimmer is one of Gary King's students; he was on the market this last year. One plus to using Grimmer's work is that he's published in polisci journals, so his methods already have good credibility within the field.

A third option: RapidMiner. I haven't used this, but it's free, well-documented, and fits the bill for what you're trying to do.

Like I said, I haven't worked with any of these directly. Anybody have good/bad experiences with this kind of software?
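For anyone unsure what "unsupervised document clustering" involves, here is a rough sketch in plain Python: turn each document into a TF-IDF vector, then group the vectors with k-means using cosine similarity. This is a generic textbook approach, not a description of how WordStat, Galileo, or RapidMiner work internally. All function names are hypothetical, the tokenizer is a bare whitespace split, and the farthest-first seeding is a simplification chosen to keep the result deterministic.

```python
import math
from collections import Counter


def normalize(vec):
    """Scale a sparse vector (dict of term -> weight) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def tfidf_vectors(docs):
    """One unit-length TF-IDF vector per document, with smoothed IDF."""
    tokenized = [doc.lower().split() for doc in docs]
    doc_freq = Counter(t for tokens in tokenized for t in set(tokens))
    n = len(docs)
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vec = {t: c * (math.log((1 + n) / (1 + doc_freq[t])) + 1)
               for t, c in tf.items()}
        vectors.append(normalize(vec))
    return vectors


def dot(a, b):
    """Dot product of sparse vectors; equals cosine for unit vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans_cosine(vectors, k, max_iter=50):
    """Cluster unit vectors; returns one cluster label per vector."""
    # Farthest-first seeding: start from the first document, then keep
    # adding the document least similar to the centroids chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            min(vectors, key=lambda v: max(dot(v, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda c: dot(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            total = Counter()
            for v, label in zip(vectors, labels):
                if label == c:
                    total.update(v)  # adds weights term by term
            if total:
                centroids[c] = normalize(total)
    return labels


docs = [
    "senate passes budget bill",
    "house debates budget bill amendments",
    "senate committee delays budget vote",
    "striker scores late goal",
    "goalkeeper saves penalty goal",
    "striker misses penalty kick",
]
print(kmeans_cosine(tfidf_vectors(docs), k=2))  # [0, 0, 0, 1, 1, 1]
```

Real corpora need stopword removal, stemming, and some way of choosing k, which is a large part of what the packages above provide.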