Monday, May 31, 2010

Free online image converter

http://www.coolutils.com/Online/Image-Converter/

Used this to add a logo to Rachota on my desktop. Painless.

Friday, May 28, 2010

Removing bloatware from my Toshiba 305 netbook

Today I decided I wanted to do something (anything!) besides AJAX-wrestling in the underground complex systems lab. So I broke out my netbook and worked on removing all the bloatware that came with it.

I worried that it would take a long time, because Toshiba has a reputation for bundling a lot of useless software. In the end, it wasn't too bad. Here are the steps I followed.

1. Use PC Decrapifier to remove trial software and demos

PC Decrapifier is free for personal use and was very easy to download and install. It doesn't do anything you couldn't do on your own, but it streamlines the process a little.

I used it to get rid of trial versions, demos, and software I just plain didn't want: the Office 2007 trial, Toshiba's online backup, Bejeweled, etc. There were a few programs I wasn't sure I wanted (e.g., Windows Live, Adobe AIR), so I kept them for now. They're easy enough to purge later, and I'm not hurting for hard drive space at this point.

2. Use "Remove Programs" to get rid of holdouts
PC Decrapifier couldn't get rid of everything, so I used "Remove Programs" in the Windows Control Panel to clear out the rest. Norton's demo was especially stubborn. Boo, Norton.

3. Turn off startup utilities using msconfig
Acting on a suggestion from this forum, I ran msconfig from the ol' DOS prompt. I took the draconian step of disabling everything under the Startup tab. My goal was to get rid of the useless utilities that run automatically at startup. (Most of these show up in the lower-right corner of the Windows taskbar.) Of course, I wasn't sure what else I might be disabling, so I immediately restarted the laptop to see.

The operation succeeded nicely. In fact, this is the step that made the single biggest difference for me. It didn't affect my wireless, USB functionality, etc. But it did get rid of a bunch of random Toshiba utilities, like the annoying webcam program on the left side of the screen. Most of these things are not worth the processor, memory, and battery life they cost to keep running. I'm sure I'll never miss them.

On the other hand, some of the utilities are probably worth keeping on, like the hard drive shock protection utility (anybody know if that really works?) and maybe the zooming and scrolling functions for the trackpad. The nice thing is that you can always go back and re-enable the utilities one at a time. Turning them off through msconfig doesn't remove them from the system. It just leaves them dormant until you want them.

4. Delete shortcuts from the desktop
This is purely aesthetic. If I want my desktop cluttered up with stuff, I'll put it there myself. Toshiba Assist can stay in the start menu where it belongs.

5. Change appearance settings
Finally, I used the Control Panel to change the system appearance. I went with "optimize for performance" to disable all the silly graphical things Windows does to make computers obsolete faster. I don't need shadows under my cursor, thank you. I did stick with the Windows XP look -- Windows "Classic" was ugly to the point of dysfunction.

That's it. It took me about an hour, and the system is noticeably snappier and more responsive. Next up: partitioning the hard drive and installing Linux.


Assorted links that were helpful for information-gathering:
http://www.itwriting.com/blog/2325-the-windows-netbook-experience-toshiba-nb300.html

http://laptopforums.toshiba.com/t5/General-Technology/Preinstalled-software-crapware/m-p/93091

http://consumerist.com/2007/12/toshiba-dont-delete-bloatware-if-you-know-whats-good-for-you.html

http://forum.notebookreview.com/toshiba/186602-best-way-rid-toshiba-crapware.html

http://laptoplogic.com/resources/38-pre-installed-programs-you-should-remove-from-your-new-laptop

Software for text mining, esp. unsupervised document clustering

A friend in the department recently asked me about software for text mining. Among other things, she was looking for programs that do "unsupervised document clustering." I went through my notes and did some web searching and came up with some promising options.

I haven't worked with any of these directly (unsupervised learning is a step removed from the stuff I do), but I figured the results of the search were worth passing on.

One option close to hand is WordStat on the computer in the [UM political science] bullpen. It supports clustering and is pretty easy to use.

Another option is Justin Grimmer's Galileo package. I don't know if he's made it publicly available yet; last I heard, he was trying to patent and maybe market it. Grimmer is one of Gary King's students; he was on the job market this past year. One plus to using Grimmer's work is that he's published in polisci journals, so his methods already have good credibility within the field.

A third option: RapidMiner. I haven't used this, but it's free, well-documented, and fits the bill for what you're trying to do.
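For anyone wondering what "unsupervised document clustering" actually does under the hood, here is a minimal sketch in Python using only the standard library. It is not how WordStat, Galileo, or RapidMiner work internally; it is plain k-means over bag-of-words vectors, with cosine similarity as the measure of closeness. All function names are hypothetical, and the seeding (the first k documents become the starting centroids) is a simplifying assumption. The point is that nobody labels the documents ahead of time: the groups fall out of word overlap alone.

```python
import math
import re
from collections import Counter

def vectorize(doc):
    """Bag-of-words vector: lowercase word -> count."""
    return Counter(re.findall(r"[a-z]+", doc.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def centroid(vectors):
    """Average of several sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)  # adds counts term by term
    return {t: c / len(vectors) for t, c in total.items()}

def kmeans_docs(docs, k, iterations=20):
    """Assign each document to one of k clusters; no labels needed."""
    vecs = [vectorize(d) for d in docs]
    centroids = [dict(v) for v in vecs[:k]]  # naive seeding: first k docs
    labels = None
    for _ in range(iterations):
        # Assignment step: each doc joins its most similar centroid.
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j]))
                      for v in vecs]
        if new_labels == labels:
            break  # converged: no document changed clusters
        labels = new_labels
        # Update step: recompute each centroid from its current members.
        for j in range(k):
            members = [v for v, lab in zip(vecs, labels) if lab == j]
            if members:
                centroids[j] = centroid(members)
    return labels

docs = [
    "senate votes on the budget bill",
    "stock market prices fall sharply",
    "the senate passes a new bill",
    "market traders watch stock prices",
]
print(kmeans_docs(docs, k=2))  # [0, 1, 0, 1]
```

Real tools typically add stopword removal, TF-IDF weighting, and smarter seeding; this sketch skips all of that to keep the core idea visible.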

Like I said, I haven't worked with any of these directly. Anybody have good/bad experiences with this kind of software?

Thursday, May 27, 2010

Where do web addresses come from?

The system for divvying up web addresses is an interesting one. How do you decide who owns a word? How do you enforce ownership? Web addresses make a fascinating case study in the emergence of property rights.

I've been thinking about registering a domain name for my text-crunching software. That sent me searching for information on where domain names come from, where they are registered, and so on.

Here are some non-technical highlights from Wikipedia's article on domain names.

History

On 15 March 1985, the first commercial Internet domain name (.com) was registered under the name Symbolics.com by Symbolics Inc., a computer systems firm in Cambridge, Massachusetts.

By 1992, fewer than 15,000 .com domains had been registered.

In December 2009 there were 192 million domain names. A big fraction of them are in the .com TLD, which as of March 15, 2010 had 84 million domain names, including 11.9 million online business and e-commerce sites, 4.3 million entertainment sites, 3.1 million finance related sites, and 1.8 million sports sites.

Domain Name Registration

The right to use a domain name is delegated by domain name registrars which are accredited by the Internet Corporation for Assigned Names and Numbers (ICANN), the organization charged with overseeing the name and number systems of the Internet. In addition to ICANN, each top-level domain (TLD) is maintained and serviced technically by an administrative organization operating a registry. A registry is responsible for maintaining the database of names registered within the TLD it administers. The registry receives registration information from each domain name registrar authorized to assign names in the corresponding TLD and publishes the information using a special service, the whois protocol.

Registries and registrars usually charge an annual fee for the service of delegating a domain name to a user and providing a default set of name servers. Often this transaction is termed a sale or lease of the domain name, and the registrant may sometimes be called an "owner", but no such legal relationship is actually associated with the transaction, only the exclusive right to use the domain name. More correctly, authorized users are known as "registrants" or as "domain holders".

...

Domain names are often seen in analogy to real estate in that (1) domain names are foundations on which a website (like a house or commercial building) can be built and (2) the highest "quality" domain names, like sought-after real estate, tend to carry significant value, usually due to their online brand-building potential, use in advertising, search engine optimization, and many other criteria.
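The whois protocol mentioned in the registration excerpt is simple enough to sketch. Under RFC 3912, a client opens a TCP connection to port 43 on a WHOIS server, sends the query as a single line ending in CRLF, and reads until the server closes the connection. Here's a minimal Python version using only the standard library. The default server, whois.verisign-grs.com (the registry WHOIS server for .com), is an assumption on my part; other TLDs use other servers, and the reply format varies from registry to registry.

```python
import socket

def whois_query(domain, server="whois.verisign-grs.com", port=43, timeout=10.0):
    """Send one WHOIS query (RFC 3912) and return the server's raw text reply."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        # The query is a single line terminated by CRLF.
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        # The server marks the end of its reply by closing the connection.
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

Calling `whois_query("symbolics.com")` should return the registry's plain-text record for the domain, assuming the server is reachable from your network.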

Apple passes Microsoft as the world's most valuable tech company

A nicely designed infographic from NYTimes.

The Times story
"The rapidly rising value attached to Apple by investors also heralds an important cultural shift: Consumer tastes have overtaken the needs of business as the leading force shaping technology."

Friday, May 21, 2010

New netbook: Toshiba NB305

I just got my new netbook out of the box. It arrived on Wednesday, but I told myself I wouldn't open it until I'd finished a bunch of coding for my dissertation. The coding's done and now the box is open!

After doing a lot of shopping and reading, I went with Toshiba's NB305 netbook. 1.66GHz processor, 2GB RAM, full-size keyboard, 10.1" screen, 3 USB ports, 11 hours of battery. About $350 with the RAM upgrade. Here's the link on Amazon.

The netbook worked right out of the box. Less than 10 minutes to go through Windows' registration ("No. No. Remind me later. Disable.") and get connected to my home wireless. The new keyboard took another 5 minutes to get used to, and I'm still figuring out the two-touch trackpad. So far, so good. The machine gets high marks for plug-and-play web browsing.

One snag: I want to partition the hard drive and install Linux. Before doing that, I want to create a boot disk to restore the original OS if something goes wrong. Unfortunately, Windows wants ~10GB of space for the disk image. The only storage device I have with that much space is an external hard drive, and I already have a lot of other data stored on it. There's enough room on the external drive, but I don't want to set aside the whole thing as a boot disk.

Is there a way to use only part of an external hard drive as a boot disk?

Thursday, May 20, 2010

Redesigned data.gov site to launch tomorrow

Story is here: http://www.wired.com/epicenter/2010/05/sneak-peek-the-obama-administrations-redesigned-datagov/

What could possibly make a lowly wonk happier than an enormous pile of government data for download?

Thanks to Matt for the tip.

Wednesday, May 19, 2010

@John RE: Is science running out of questions?

John commented on my post yesterday:

Some of the biggest questions are still unanswered, and they are perhaps bigger than anything we have toyed with since the conception of the scientific method. We've just gotten tired of asking a lot of them and have accepted our ignorance. For instance, what is life? We don't even know in what branch of science this question belongs: social, neuro, biochemical, quantum, or something that transcends all that we can observe. We don't even know where gravity comes from. What about the notorious three-body problem? But besides all of this, I think the size of the discovery is measured by how much you care. I discovered something in my lab earlier this year that made me run home and kiss my wife. It was a big deal to me. We'll see what the peer reviewers have to say about it.




(Video courtesy of youtube and somebody else's copy of Spore creature creator)

I agree -- "What is life?" is a great example of an unanswered question that you don't need to be an Einstein to ask.

Thinking about this a little more, I'm guessing that "What is life?" doesn't fit clean disciplinary boundaries because it will someday be a discipline of its own. It's easy to forget that all the branches of science we take for granted today were discovered at a specific place and a specific time. The modern science of chemistry grew out of the pseudo-science of alchemy, largely thanks to the invention/discovery of the periodic table. Were those rows and columns the only way we could have grouped atoms and elements? Are atoms the only way we could have made sense of matter?

Since these ways of thinking are drilled into us from third grade on, it is literally "hard to imagine" a different but equally true framework. And because the periodic table works well for a lot of things, we've stopped looking for alternatives. But that doesn't mean no alternatives exist. See "The Unreasonable Effectiveness of Mathematics" for a really interesting discussion of this idea.

I'm convinced that the distinctions between scientific fields are invented, not inevitable. The whole truth is a lot more cohesive than our understanding of it.

Tuesday, May 18, 2010

Is science running out of important questions?

On vacation last week, on a car trip between Siena and Florence, my brother and I had a great discussion/debate: is science running out of important questions to ask?

Here's the gist of Sam's argument:
Science has already mapped most of the easily observed phenomena in the universe. We can explain why the sky is blue, why the sun rises in the east and sets in the west, why things fall, etc. Although we will never run out of questions to ask using the scientific method, the big ones are already taken. What's left is filling in details.
My counterargument:
The scientific method is a hydra: every question we answer raises several more. In hindsight, the scientific accomplishments of the past look easy, but it was no mean feat for Leonardo to work out the details of portraying perspective and distance, or for Galileo to figure out that falling objects accelerate at the same rate regardless of their weight, once air resistance is taken into account. (We were sightseeing in Italy for this talk.)

Not only are we not running out of questions, we have more important questions before us today than ever before. For example:
  • Why does every language have a grammar (linguistics)?
  • Why do wars happen (political science)?
  • Why do recessions happen (economics)?
  • Where does consciousness come from (psychology and neurobiology)?
  • Here are 25 more from the NYTimes
I'm still not fully persuaded either way. Both arguments seem at least partly true, and I haven't been able to restate them in terms where they can be adjudicated.

Saturday, May 15, 2010

Covet: netbook

I've decided to get a new laptop/netbook in the next couple of weeks. My current machine, a Dell Inspiron 600m, is about to celebrate its 10th birthday. When I got it, wifi was a big deal. It's held up remarkably well, especially once I switched to Ubuntu. But when your system can't handle Pandora and Google Docs at the same time, it's time to move on.

Anyway, I mostly use my laptop for checking email, writing, reading PDFs, and programming. I'll need Linux, and might like to have Windows as well -- mainly to be able to read and edit .doc and .docx files. I'll need to do some serious computational processing, but I can almost always log into a remote cluster for the heavy lifting. Portability and battery life are very important.

So I'm thinking a netbook. Anyone have recommendations? Horror stories?

PS: If Moore's law holds, then the processing power of my laptop is about 1% of a laptop today. (See picture.)
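The 1% figure is back-of-the-envelope exponential arithmetic. A sketch, assuming performance doubles every 18 months (one common reading of Moore's law, which strictly speaking is about transistor counts rather than speed): after t years, an old machine has 2^(-t/1.5) of a new one's power. For t = 10 that is 2^(-6.67), or about 1%. With a two-year doubling period instead, the same 10 years gives 2^(-5), or about 3%. The function name below is hypothetical.

```python
def relative_power(age_years, doubling_years=1.5):
    """Fraction of a new machine's power that an age_years-old machine has,
    assuming power doubles every doubling_years (an idealized Moore's law)."""
    return 2 ** (-age_years / doubling_years)

print(f"{relative_power(10):.2%}")       # 18-month doubling
print(f"{relative_power(10, 2.0):.2%}")  # 2-year doubling
```

The answer is quite sensitive to the assumed doubling period, which is one more reason to read the 1% as a rough order of magnitude.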

Using rachota on multiple computers

Earlier this week, I decided to start tracking my work hours to get a better sense of where my time goes. After playing around with a bunch of options, I settled on Rachota, an open-source time-tracking program written in Java. So far so good.

The next problem was to figure out a way to use the program on any of several computers. In a given week, I put in substantial hours on my laptop, home desktop, campus computer labs, and the complex systems cluster. These machines run a smorgasbord of Linux and Windows, and I only have administrative rights to install software on some of them.

Here are the options I looked at:
1. Dropbox: Two friends recommended this file-synchronization program to me. Problem: I would have to install it on every computer I use. That might have been workable, but I didn't want the hassle of going through the department and university IT groups.

2. Deploy online: Rachota is Java, right? So I should be able to embed it in a web page and run it from anywhere. Sadly, I'm not much of an applet programmer, so I couldn't make this work.

3. Sneakernet: Install Rachota on a USB drive and take it from place to place. This would probably work, right up until the day I lose my USB keychain (and all my time-tracking logs) in an Internet cafe.

4. ssh with X forwarding: Linux has the nifty capability to shell into a remote machine and run a program there, but display it on your working computer. Unfortunately, Windows doesn't support this out of the box. It also chews up bandwidth and is frustratingly slow over wireless. Still, when I'm running Linux on a broadband connection, ssh is a nice option.

5. Download-upload: A slightly clunky but robust solution is to zip Rachota and all its logs together and put the archive in a hidden, downloadable folder. Whenever I start a session, I download the archive, unzip it, and run the software. When I'm finished, I run a Python script (also included in the folder) that zips everything and uploads it to the original location. This approach is a little like the svn approach to version control (check out, work, commit back), but it doesn't require any prior installation other than a zip application and Python.
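A minimal sketch of what a script for option 5 might look like in Python. The packing and unpacking halves use only the standard library (`shutil.make_archive` and `shutil.unpack_archive`). The upload half is an assumption: the post doesn't say where the archive lives, so `upload_ftp` shows one possibility, an FTP account, via `ftplib`. All function names, folder names, and hosts here are hypothetical, and note that plain FTP sends the password unencrypted.

```python
import shutil
from ftplib import FTP
from pathlib import Path

def pack(folder, archive_base):
    """Zip everything in `folder` (program + logs); returns the .zip path."""
    return shutil.make_archive(archive_base, "zip", root_dir=folder)

def unpack(archive_path, dest):
    """Unzip an archive made by pack() into `dest`, creating it if needed."""
    shutil.unpack_archive(archive_path, dest)
    return Path(dest)

def upload_ftp(archive_path, host, user, password, remote_dir="."):
    """Hypothetical upload step: push the archive to an FTP account."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        ftp.cwd(remote_dir)
        with open(archive_path, "rb") as fh:
            ftp.storbinary(f"STOR {Path(archive_path).name}", fh)
```

A session would then be: download and `unpack` at the start, run Rachota, then `pack` and upload at the end. Zipping the whole folder, rather than syncing individual .xml files, keeps the program and its logs in lockstep.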

It looks like I'm going to be able to go with a mix of 4 and 5. Thanks for all the suggestions on this little bit of lifehacking!
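For option 4, the whole trick is one ssh flag: `-X` turns on X11 forwarding, so a program started on the remote machine draws its windows on your local display. A small Python helper that builds and runs that command might look like this. The user name, the host, and the assumption that Rachota launches with `java -jar Rachota.jar` from the remote home directory are all hypothetical; adjust for your setup.

```python
import shlex
import subprocess

def ssh_x_command(user, host, jar="Rachota.jar"):
    """Build an ssh command that runs Rachota remotely but displays it locally.

    -X enables X11 forwarding. The 'java -jar <jar>' launch line is an
    assumption about how Rachota is installed on the remote machine.
    """
    # ssh hands the trailing words to the remote shell, so quote the path.
    return ["ssh", "-X", f"{user}@{host}", "java", "-jar", shlex.quote(jar)]

def run_remote(user, host, jar="Rachota.jar"):
    """Start the session; blocks until the remote program exits."""
    subprocess.run(ssh_x_command(user, host, jar), check=True)
```

If the connection is slow (say, over wireless), adding ssh's `-C` flag to request compression may help somewhat.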

Thursday, May 13, 2010

An update on time tracking...

An interesting read on the data-driven life. Hat tip to Carl for the link.

http://www.nytimes.com/2010/05/02/magazine/02self-measurement-t.html?pagewanted=all

I definitely fall into the geeky category of people who achieve greater self-awareness through quantification. I've saved every gas station receipt, with a notation for mileage, for the last 5 years. And Erin has tracked every credit card purchase we've made since the first year we were married.

Wednesday, May 12, 2010

Software for time management

I find that it's very hard to budget my time as a grad student. Partly, this is the nature of research -- every project should be non-routine (and therefore a little bit unpredictable) in some respect. Partly it's the lack of direct supervision, formal structure, or near-term deadlines.

But even taking that all into account, I put in a lot of hours and often have a very poor sense of where my time is actually going. It's hard to know if I'm getting my priorities right, and how long to budget for repetitive tasks.

So I've decided that it's time to get empirical about time management. I spent a couple hours today looking for time management software.

I went in looking for four things:
1. Free
2. Easy to use
3. Platform independent -- I need to be able to switch between computers running linux and windows
4. Automatic task tracking. I wanted a utility that would track which applications I use and which web pages I visit, and use them to deduce what I'm working on. Timesheet does this.

Here's what I found so far. (I can't tell you exactly how long I spent on this, but it was a couple of hours at least.)

http://slimtimer.com/users/home
Thinking I would like the flexibility of a hosted app, I registered with this site. After playing with the interface for 20 minutes, I was not impressed. The whole thing was very clunky and counterintuitive. Rejected.

http://en.wikipedia.org/wiki/Comparison_of_time_tracking_software
A good list, but focused more on business software. Maybe a little TMI, actually.

Rachota 2.2
This is the solution I'm going with for now. It's free, easy to use, and built as a lightweight Java application -- and therefore platform independent.

Rachota doesn't work automatically -- you have to tell it every time you switch tasks -- but I've decided that that's probably healthy for me. Making a mental note when I switch from one thing to the next will probably help me prioritize better.
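Telling the tracker every time you switch tasks is all the data it needs: each switch marks the end of one interval and the start of the next, and summing the intervals per task gives the breakdown. Here's a minimal sketch of that bookkeeping in Python. It is not Rachota's code or file format; the `tally` function and the list of (time, task) switch events are hypothetical.

```python
from datetime import datetime

def tally(switches, end):
    """Total hours per task.

    switches: chronological list of (start_time, task) pairs; each marks the
    moment you switched to that task. end: when you stopped working.
    """
    totals = {}
    # Each interval runs from one switch to the next (or to `end` for the last).
    stops = [t for t, _ in switches[1:]] + [end]
    for (start, task), stop in zip(switches, stops):
        hours = (stop - start).total_seconds() / 3600
        totals[task] = totals.get(task, 0.0) + hours
    return totals

day = [
    (datetime(2010, 5, 12, 9, 0), "coding"),
    (datetime(2010, 5, 12, 10, 30), "email"),
    (datetime(2010, 5, 12, 11, 0), "coding"),
]
print(tally(day, end=datetime(2010, 5, 12, 12, 0)))
# {'coding': 2.5, 'email': 0.5}
```

Forgetting to log a switch doesn't lose time; it just credits the whole interval to whatever task was running, which is why the habit of noting each switch matters.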

Unsolved problem: Rachota stores your files in a handful of .dtd and .xml files. I can email them to myself when I switch machines, but that's a pain. Is there a slicker way I can manage the problem of multiple computers?