Politics, lifehacking, data mining, and a dash of the scientific method from an up-and-coming policy wonk.
Monday, May 31, 2010
Free online image converter
http://www.coolutils.com/Online/Image-Converter/
Used this to add a logo to Rachota on my desktop. Painless.
Friday, May 28, 2010
Removing bloatware from my Toshiba NB305 netbook
I worried that it would take a long time, because Toshiba has a reputation for bundling a lot of useless software. In the end, it wasn't too bad. Here are the steps I followed.
1. Apply pcdecrapifier to remove trial software and demos
PC Decrapifier is free for personal use. It was very easy to download and install. It doesn't do anything you couldn't do on your own, but it streamlines the process a little bit.
I used it to get rid of trial versions, demos, and software I just plain didn't want: the Office 2007 trial, Toshiba's online backup, Bejeweled, etc. There were a few programs I wasn't sure if I wanted (e.g. windows live, Adobe AIR), so I kept them for now. Easy enough to purge later, and I'm not hurting for hard drive space at this point.
2. Use "Remove programs" to get rid of holdouts
PC Decrapifier wasn't able to get rid of everything, so I used "Remove Programs" in the Windows control panel to get rid of the rest. Norton's demo was especially stubborn. Boo Norton.
3. Turn off startup utilities using msconfig
Acting on a suggestion from this forum, I ran msconfig from the ol' DOS prompt. I took the draconian action of disabling everything under the startup tab. My goal was to get rid of the useless utilities that run automatically on startup. (Most of these show up in the lower-right section of the Windows task bar.) Of course, I wasn't sure what else I would disable, so I immediately restarted the laptop to see.
The operation succeeded nicely. In fact, this is the step that made the single biggest difference for me. It didn't affect my wireless, USB functionality, etc. But it did get rid of a bunch of random Toshiba utilities, like the annoying webcam program on the left side of the screen. Most of these things are not worth the processor, memory, and battery life they cost to keep running. I'm sure I'll never miss them.
On the other hand, some of the utilities are probably worth keeping on, like the hard drive shock protection utility (anybody know if that really works?) and maybe the zooming and scrolling functions for the trackpad. The nice thing is that you can always go back and re-enable the utility scripts one at a time. Turning them off through msconfig doesn't remove them from the system. It just leaves them dormant until you want them.
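If you'd rather see what's on the list before disabling everything, here is a minimal sketch in Python. It assumes that most of the entries on msconfig's startup tab come from the "Run" keys in the Windows registry (the Startup folder in the start menu is another source, which this sketch ignores). The function name list_startup_entries is made up for illustration, and the script only reads the registry; it doesn't change anything.

```python
try:
    import winreg  # standard library, but only available on Windows
except ImportError:
    winreg = None

# Assumption: the usual registry location for programs launched at login.
RUN_KEY = r"Software\Microsoft\Windows\CurrentVersion\Run"


def list_startup_entries():
    """Return (name, command) pairs from the per-user and machine-wide Run keys."""
    if winreg is None:
        return []  # not on Windows, nothing to report
    entries = []
    for hive in (winreg.HKEY_CURRENT_USER, winreg.HKEY_LOCAL_MACHINE):
        try:
            key = winreg.OpenKey(hive, RUN_KEY)
        except OSError:
            continue  # key missing or not readable
        with key:
            index = 0
            while True:
                try:
                    name, command, _value_type = winreg.EnumValue(key, index)
                except OSError:
                    break  # no more values under this key
                entries.append((name, command))
                index += 1
    return entries


for name, command in list_startup_entries():
    print(f"{name}: {command}")
```

Anything that shows up here and looks like a Toshiba utility is a candidate for unchecking in msconfig.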
4. Delete shortcuts from the desktop
This is purely aesthetic. If I want my desktop cluttered up with stuff, I'll put it there myself. Toshiba Assist can stay in the start menu where it belongs.
5. Change appearance settings
Finally, I used the control panel to change the system appearance. I went with "optimize for performance" to disable all the silly graphical things Windows does to make computers obsolete faster. I don't need shadows under my cursors, thank you. I did stick with the Windows XP look -- Windows "classic" was ugly to the point of dysfunctionality.
That's it. Took me about an hour. And the system is noticeably snappier and more responsive. Next up: partitioning the hard drive and installing Linux.
Assorted links that were helpful for information-gathering:
http://www.itwriting.com/blog/2325-the-windows-netbook-experience-toshiba-nb300.html
http://laptopforums.toshiba.com/t5/General-Technology/Preinstalled-software-crapware/m-p/93091
http://consumerist.com/2007/12/toshiba-dont-delete-bloatware-if-you-know-whats-good-for-you.html
http://forum.notebookreview.com/toshiba/186602-best-way-rid-toshiba-crapware.html
http://laptoplogic.com/resources/38-pre-installed-programs-you-should-remove-from-your-new-laptop
Software for text mining, esp unsupervised document clustering
I haven't worked with any of these directly (unsupervised learning is a step removed from the stuff I do) but I figured the results of the search were worth passing on.
One option close to hand is WordStat on the computer in the [UM political science] bullpen. It supports clustering and is pretty easy to use.
Another option is Justin Grimmer's Galileo package. I don't know if he's made this publicly available yet. Last I heard he was trying to patent and maybe market it. Grimmer is one of Gary King's students; he was on the market this last year. One plus to using Grimmer's work is that he's published in polisci journals, so his methods already have good credibility within the field.
A third option: RapidMiner. I haven't used this, but it's free, well-documented, and fits the bill for what you're trying to do.
Like I said, I haven't worked with any of these directly. Anybody have good/bad experiences with this kind of software?
Thursday, May 27, 2010
Where do web addresses come from?
I've been thinking about registering a domain name for my text-crunching software. It led me into a search for information on where URLs come from, where they are registered, and so on.
Here are some non-technical highlights from wikipedia's article on domain names.
History
On 15 March 1985, the first commercial Internet domain name (.com) was registered under the name Symbolics.com by Symbolics Inc., a computer systems firm in Cambridge, Massachusetts.
By 1992 fewer than 15,000 dot.com domains were registered.
In December 2009 there were 192 million domain names. A big fraction of them are in the .com TLD, which as of March 15, 2010 had 84 million domain names, including 11.9 million online business and e-commerce sites, 4.3 million entertainment sites, 3.1 million finance related sites, and 1.8 million sports sites.
Domain Name Registration
The right to use a domain name is delegated by domain name registrars which are accredited by the Internet Corporation for Assigned Names and Numbers (ICANN), the organization charged with overseeing the name and number systems of the Internet. In addition to ICANN, each top-level domain (TLD) is maintained and serviced technically by an administrative organization operating a registry. A registry is responsible for maintaining the database of names registered within the TLD it administers. The registry receives registration information from each domain name registrar authorized to assign names in the corresponding TLD and publishes the information using a special service, the whois protocol.
Registries and registrars usually charge an annual fee for the service of delegating a domain name to a user and providing a default set of name servers. Often this transaction is termed a sale or lease of the domain name, and the registrant may sometimes be called an "owner", but no such legal relationship is actually associated with the transaction, only the exclusive right to use the domain name. More correctly, authorized users are known as "registrants" or as "domain holders".
...Domain names are often seen in analogy to real estate in that (1) domain names are foundations on which a website (like a house or commercial building) can be built and (2) the highest "quality" domain names, like sought-after real estate, tend to carry significant value, usually due to their online brand-building potential, use in advertising, search engine optimization, and many other criteria.
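For the curious, the whois service mentioned in the excerpt is simple enough to query by hand. The sketch below assumes the standard whois convention (open a TCP connection to port 43, send the domain name followed by a carriage return and line feed, read until the server hangs up) and assumes whois.verisign-grs.com is the right server to ask about .com names. The function names build_query and whois are my own, not part of any library.

```python
import socket

WHOIS_PORT = 43
# Assumption: the registry whois server for the .com TLD.
COM_WHOIS_SERVER = "whois.verisign-grs.com"


def build_query(domain):
    """Format a domain name as a whois request: the name plus CRLF."""
    return domain.strip().lower().encode("ascii") + b"\r\n"


def whois(domain, server=COM_WHOIS_SERVER, timeout=10):
    """Send one whois query and return the server's plain-text reply."""
    with socket.create_connection((server, WHOIS_PORT), timeout=timeout) as conn:
        conn.sendall(build_query(domain))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

Calling print(whois("symbolics.com")) from a machine with Internet access should return the registry's record for that name, including which registrar holds it. A name that comes back with no match is presumably still available to register.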
Apple passes Microsoft as the world's most valuable tech company
The Times story
"The rapidly rising value attached to Apple by investors also heralds an important cultural shift: Consumer tastes have overtaken the needs of business as the leading force shaping technology."
Friday, May 21, 2010
New netbook: Toshiba NB305
After doing a lot of shopping and reading, I went with Toshiba's NB305 netbook. 1.66GHz processor, 2GB RAM, full size keyboard, 10.1" screen, 3 USB ports, 11 hours of battery. About $350 with the RAM upgrade. Here's the link on amazon.
Thursday, May 20, 2010
Redesigned data.gov site to launch tomorrow
What could possibly make a lowly wonk happier than an enormous pile of government data for download?
Thanks to Matt for the tip.
Wednesday, May 19, 2010
@John RE: Is science running out of questions?
Some of the biggest questions are still unanswered, and they are perhaps bigger than anything we have toyed with since the conception of the scientific method. We've just gotten tired of asking a lot of them and have accepted our ignorance. For instance, what is life? We don't even know in what branch of science this question belongs: social, neuro, biochemical, quantum, or something that transcends all that we can observe. We don't even know where gravity comes from. What about the notorious three-body problem? But besides all of this, I think the size of the discovery is measured by how much you care. I discovered something in my lab earlier this year that made me run home and kiss my wife. It was a big deal to me. We'll see what the peer reviewers have to say about it.
(Video courtesy of youtube and somebody else's copy of Spore creature creator)
I agree -- "What is life?" is a great example of an unanswered question that you don't need to be an Einstein to ask.
Thinking about this a little more, I'm guessing that "What is life?" doesn't fit clean disciplinary boundaries because it will someday be a discipline of its own. It's easy to forget that all the branches of science we take for granted today were discovered at a specific place and a specific time. The modern science of chemistry grew out of the pseudo-science of alchemy, largely thanks to the invention/discovery of the periodic table. Were those rows and columns the only way we could have grouped atoms and elements? Are atoms the only way we could have made sense of matter?
Since these ways of thinking are drilled into us from 3rd-grade on, it is literally "hard to imagine" a different, but equally true framework. And because the periodic table works well for a lot of things, we've given up looking for alternatives. But that doesn't mean that no alternatives exist. See The unreasonable effectiveness of mathematics for a really interesting discussion of this idea.
I'm convinced that the distinctions between scientific fields are invented, not inevitable. The whole truth is a lot more cohesive than our understanding of it.
Tuesday, May 18, 2010
Is science running out of important questions?
Here's the gist of Sam's argument:
Science has already mapped most of the easily observed phenomena in the universe. We can explain why the sky is blue, why the sun rises in the east and sets in the west, why things fall, etc. Although we will never run out of questions to ask using the scientific method, the big ones are already taken. What's left is filling in details.
My counterargument:
The scientific method is a hydra: every question we answer raises several more. In hindsight, the scientific accomplishments of the past look easy, but it was no mean feat for Leonardo to work out the details of portraying perspective and distance, or for Galileo to figure out that gravity is constant once density is taken into account. (We were sightseeing in Italy for this talk.)
I'm still not fully persuaded either way. Both arguments seem at least partly true, and I haven't been able to restate them in terms where they can be adjudicated.
Not only are we not running out of questions, we have more important questions before us today than ever before. For example:
- Why does every language have a grammar (linguistics)?
- Why do wars happen (political science)?
- Why do recessions happen (economics)?
- Where does consciousness come from (psychology and neurobiology)?
- Here are 25 more from the NYTimes
Saturday, May 15, 2010
Covet: netbook
Anyway, I mostly use my laptop for checking email, writing, reading .pdfs, and programming. I'll need linux, and might like to have windows as well -- mainly to be able to read and edit .doc and .docx files. I'll need to do some serious computational processing, but I can almost always log into a remote cluster for the heavy lifting. Portability and battery life are very important.
So I'm thinking a netbook. Anyone have recommendations? Horror stories?
PS: If Moore's law holds, then the processing power of my laptop is about 1% of a laptop today. (See picture.)
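The arithmetic behind that 1% figure can be run backwards to get an age. This is a back-of-the-envelope sketch: it assumes processing power doubles on a fixed schedule, and the two doubling periods tried (18 months and 2 years) are the commonly quoted versions of Moore's law, not numbers from this post. The function name years_behind is made up.

```python
import math


def years_behind(power_ratio, doubling_period_years):
    """Years of steady doubling needed to go from power_ratio up to 1."""
    doublings = math.log2(1.0 / power_ratio)
    return doublings * doubling_period_years


# Assumed doubling periods: 1.5 years and 2 years.
for period in (1.5, 2.0):
    age = years_behind(0.01, period)
    print(f"doubling every {period} years: about {age:.1f} years old")
```

A factor of 100 is a bit under seven doublings, so 1% works out to a machine somewhere between 10 and 13 years old, depending on which doubling period you believe.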
Using rachota on multiple computers
The next problem was to figure out a way to use the program on any of several computers. In a given week, I put in substantial hours on my laptop, home desktop, campus computer labs, and the complex systems cluster. These machines run a smorgasbord of linux and windows, and I only have administrative rights to install software on some of them.
Here are the options I looked at:
1. dropbox: Two friends recommended this storage synchronization program to me. Problem: I would have to install it on every computer I use. This might have been workable, but I didn't want to go through the hassle of going through the department and university IT groups.
2. Deploy online: Rachota is java, right? So I should be able to embed it in a web page and run remotely. Sadly, I'm not much of an applet programmer, so I couldn't make this work.
3. sneakernet: Install Rachota on a USB drive and take it from place to place. This probably would work, and then end suddenly one day with me losing my USB keychain (and all my time tracking logs) in an Internet cafe.
4. ssh with X window: linux has the nifty capability to shell into a remote machine and run a program there, but display it on your working computer. Unfortunately, windows doesn't support it. It also chews up bandwidth and is frustratingly slow over wireless. However, when I'm running linux with a broadband connection, ssh is a nice option.
5. Download-upload: A slightly clunky, but robust solution is to zip rachota and all its logs together, and put the archive in a hidden, downloadable folder. Whenever I start a session, I can download the archive, unzip it, and run the software. When finished, I run a python script (also included in the folder) that zips everything and uploads it to the original location. This approach is a little like the svn approach to version control, but doesn't require any prior installation, other than a zip application and python.
It looks like I'm going to be able to go with a mix of 4 and 5. Thanks for all the suggestions on this little bit of lifehacking!
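For anyone who wants to try option 5, a minimal sketch of the zip-and-restore half might look like the following. It is an illustration, not the actual script: the function names and the file name diary.xml are made up, and the upload and download steps are left out entirely because they depend on where the archive is hosted.

```python
import shutil
import tempfile
from pathlib import Path


def pack(folder, archive_base):
    """Zip everything in folder into archive_base + '.zip'; return the archive path."""
    return shutil.make_archive(str(archive_base), "zip", root_dir=str(folder))


def unpack(archive, destination):
    """Extract a zip archive into destination, creating it if needed."""
    shutil.unpack_archive(str(archive), str(destination), "zip")


def round_trip_demo():
    """Pack and unpack a throwaway folder to show the cycle end to end."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        work = tmp / "rachota"
        work.mkdir()
        # Hypothetical log file standing in for the real time-tracking data.
        (work / "diary.xml").write_text("<week/>")
        archive = pack(work, tmp / "rachota_backup")
        unpack(archive, tmp / "restored")
        return (tmp / "restored" / "diary.xml").read_text()
```

In real use, unpack would run right after downloading the archive at the start of a session, and pack would run just before uploading it again at the end. The archive should be written outside the folder being zipped, as it is here, so it doesn't end up inside itself.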
Thursday, May 13, 2010
An update on time tracking...
http://www.nytimes.com/2010/05/02/magazine/02self-measurement-t.html?pagewanted=all
I definitely fall into the geeky category of people who achieve greater self awareness through quantification. I've saved every gas station receipt, with a notation for mileage, for the last 5 years. And Erin has tracked every credit card purchase we've made since the first year we were married.
Wednesday, May 12, 2010
Software for time management
But even taking that all into account, I put in a lot of hours and often have a very poor sense of where my time is actually going. It's hard to know if I'm getting my priorities right, and how long to budget for repetitive tasks.
So I've decided that it's time to get empirical about time management. I spent a couple hours today looking for time management software.
I went in looking for four things:
1. Free
2. Easy to use
3. Platform independent -- I need to be able to switch between computers running linux and windows
4. Automatic task tracking. I wanted a utility that would track which applications I use and which web pages I visit, and use them to deduce what I'm working on. Timesheet does this.
Here's what I found so far. (I can't tell you exactly how long I spent on this, but it was a couple of hours at least.)
http://slimtimer.com/users/
Thinking I would like the flexibility of a hosted app, I registered with this site. After playing with the interface for 20 minutes, I was not impressed. The whole thing was very clunky and counterintuitive. Rejected.
http://en.wikipedia.org/wiki/
A good list, but focused more on business software. Maybe a little TMI, actually.
Rachota 2.2
This is the solution I'm going with for now. It's free, easy to use, and built as a lightweight Java application -- and therefore platform independent.
Rachota doesn't work automatically -- you have to tell it every time you switch tasks -- but I've decided that that's probably healthy for me. Making a mental note when I switch from one thing to the next will probably help me prioritize better.
Unsolved problem: Rachota stores your data in a handful of .dtd and .xml files. I can email them to myself when I switch machines, but that's a pain. Is there a slicker way I can manage the problem of multiple computers?