Another Bad Design

July 2nd, 2009

I use iTunes to manage the music on my iPhone. I went to add a bunch of music into my library, and a window popped up saying it was “Adding files.” There was a “stop” button, and a progress bar. This progress bar started off with a blue cursor on the left side, and that blue cursor gradually moved to the right. I presumed that when the blue cursor reached the right edge of the bar the work would be done.

I was wrong. Once it got to the right edge of the bar, the cursor just wrapped around and appeared at the left edge again. I understand the point of having some sort of animated graphic to indicate that work is being done, but why a progress bar? A progress bar should only be used if it shows the extent to which the work has been completed. To indicate … progress. A better design would be to have some kind of spinning wheels or gears, to make it clear that you have no idea how long you’re going to be waiting until you can listen to “I Wanna Sex You Up” by Color Me Badd. If, you know, you wanted to listen to that song.

On Genetics, Programming Languages, and God

June 26th, 2009

The genetic code is the programming language used to make living organisms.  Right now, biologists are attempting to ‘reverse engineer’ this genetic code, in order to determine how it works and to make changes to it. I (in my infinite wisdom) think they’re going about this the wrong way.

To understand why, we must first delve into the languages of computers. When humans program computers, they usually write in “high level languages”, which look like a series of mathematical formulas and instructions.  Here is an example of a simple function to compute the nth Fibonacci Number, written in a high level language called C:

//computes the nth fibonacci number
unsigned int fibonacci(unsigned int n)
{
	unsigned int term1 = 1;
	unsigned int term2 = 1;
        unsigned int result = term2;
 
	unsigned int i = 1;
	while (i < n)
	{
		result = term1 + term2;
		term1 = term2;
		term2 = result;
		i = i + 1;
	}
 
	return result;
}

The Computer cannot understand this program as it is written. It must be translated into instructions that make sense to the computer. This  translation process is called compiling. The compiled program will look different depending upon what computer it is compiled on, but all computers basically operate the same way: they can read from and write to memory locations, and they can add, multiply, divide, and subtract numbers.  Here is an English-language translation of what the compiled Fibonacci function might look like. Provided to make things more clear is a mapping of variable names to their memory locations:

n 0
term1 1
term2 2
result 3
i 4
# set the value in location 1 to 1
# set the value in location 2 to 1
# set the value in location 3 to the value stored in location 1
# set the value in location 4 to the 1
# if the value in location 4 is not less than the value in location 0, go to instruction 11
# set the value in location 3 to the result of the value in 1 plus the value in 2
# set the value in location 1 to the value in location 2
# set the value in location 2 to the value in location 3
# set the value in location 4 to the value in location 4 plus the value 1
# jump to instruction 5
# return the value in location 3

Each of the above instructions would be represented with a single word, which on a top-of-the-line modern computer is 64 bits of information. The human genetic code is also a sequence of simple instructions, interpreted not by a central processing unit, but by intracellular organisms.  Instead of dealing with changing values in memory locations, the genetic code’s instructions are for amino acids used to build proteins.  There are approximately 6 billion instructions in the human genome, making it roughly 10 million times as complex as our simple Fibonacci function.

Notice that the machine language program is  no where near as easy to understand as the C program. If I gave you a machine language program with 100 million lines of code, you’d have a heck of a time trying to figure out how it worked.  Poking around and changing single instructions, then determining what happens when you change those instructions probably wouldn’t be a good start. Unfortunately, as I understand things, that is exactly how biologists are proceeding.  They are stepping through the genetic code piece by piece,  using experiments to try and figure out what different sequences of instructions do.

There is, I think, a better approach. The key to understanding this approach is a simple fact about computer programming: It’s easier (and more fun) to write your own program than it is to read someone else’s program and figure out what it’s doing. Instead of trying to reverse engineer existing DNA code, which evolved over millions of years and is therefore probably extremely convoluted and hard to follow, we’d be better off trying to ‘write’ our own organisms.  Biologists could start by writing DNA ‘programs’ to code simple proteins.  After getting good at this, they could invent a ‘high level language’  for DNA programming (AKA GeneticC) which could be used to speed up the development progress. The next step would be to write a single cellular organism, and then more complicated organisms.

Interestingly enough, this process of building our own organisms could provide a lot of insight into the question of whether there is some being who designed us.  Once we have a better understanding of genetic programming,  we could look at the quality of the code that builds human beings. If it’s clean, neat, and straightforward,  that would be strong evidence of a creator at work. If it’s messy, garbled, hacked together, redundant and nonsensical in parts, we could conclude that the code evolved over time.  Either that, or god is a perl scripter.

What Does This Mean?

May 18th, 2009

Do you know what this means?

htons

No Cheating via Google! Click for the answer … Read the rest of this entry »

Removing SVN Bindings with Python

May 16th, 2009

I had an SVN repository on my website that I was using to store a personal project of mine.  I wanted to move the files into a different SVN repository on another website. Doing this required that I first remove all of the .SVN folders in my project. It’s got a bunch of folders, so going through manually would have been a big pain. Besides, as a self-respecting geek, I’d definitely prefer spending twice as long writing a script to automate a boring process over just doing it manually. My script, which is based upon this script here, generates a batch file that you run to clean the directories out. I would have done it directly from python, but I kept getting ‘this directory could not be found’ exceptions, probably becuase of the spaces in my folder names.

The script is below, for your edification. It only works on windows, but it could easily be modified to run on a UNIX based system. It is presented without any warranty, express or implied, not even that of suitability for a specific purpose.

import os
import sys
 
def findFileGenerator(rootDirectory, acceptanceFunction):
  for aCurrentDirectoryItem in [ os.path.join(rootDirectory, x) for x in os.listdir(rootDirectory) ]:
    if os.path.isdir(aCurrentDirectoryItem):
		if acceptanceFunction(aCurrentDirectoryItem):
			yield aCurrentDirectoryItem
		for aSubdirectoryItem in findFileGenerator(aCurrentDirectoryItem, acceptanceFunction):
			yield aSubdirectoryItem
 
def acceptSVNChildren(theFileOrDir):
	return theFileOrDir.find('.svn') != -1
 
def removeSVN(rootdir):
	"""recursively removes the SVN bindings from the directory specified in the argument"""
 
	for dude in findFileGenerator(rootdir,acceptSVNChildren):
		if os.path.isdir(dude):
			print "rmdir /S /Q", '"'+dude+'"'
 
def main():
	"""Removes the SVN bindings from the commandline dir specified in the argument.
	If no argument is specified, the root directory is used."""
 
	dirToClean = os.getcwd()
	if(len(sys.argv) == 2):
		dirToClean = os.path.join(os.getcwd(),sys.argv[1])
	elif (len(sys.argv) &gt; 2):
		print "Usage: remove_svn.py &lt;dirToClean&gt;"
 
	removeSVN(dirToClean)
 
if __name__ == "__main__" : main()

A Thought on Computation

April 23rd, 2009

I think a lot about P vs NP.  I think it’s  the most interesting problem ever, and my opinion on the subject is pretty well known.  Here’s my latest thought.

Computation is all about state changes; every computer ever built is a finite state machine of some kind.  For those unversed in Computer Science terminology, a finite state machine is an abstraction with a finite number of discrete states.  It starts out in a specific state, and when it receives input from an alphabet of discrete symbols, it changes from one state to another.  For example, a toggling light switch has two states: on and off. When you push the button, the toggle moves to ‘off’ if it was in the ‘on’ state, and it moves to the ‘on’ state if it was in the ‘off’ state.  More complex machines have more states, and more complicated rules for transitioning from state to state.

As I was saying, every computing device ever built is a finite state machine. Fingers change state from extended to contracted.  An abacus changes state when its beads are moved.  The Antikythera mechanism and Difference Engine changed state by movement of gears.  Modern electronic computers change state by modifying electronic signals.  People have proposed optical computers which operate based upon switching light sources on and off.  Note that these state changes are all discrete – gears could be said to move continuously, but they have an integer number of teeth, and therefore an integer number of possible discrete states. Electricity is the flow of electrons, and light consists of discrete photons.  Fingers are, well, fingers.

What about analog computers? Fans of slide rules might argue that they involve an infinite number of states as a result of their continuous nature.  Here I will make a conjecture:  just  as Planck proved that  energy occurs in discrete chunks, we will also come to find that time and space are discrete.  That slide rule may appear to move continuously, but I’m willing to bet that it’s actually moving in discrete jumps which are so tiny we cannot measure them. I am not alone in putting faith in this conjecture.

If computation involves state changes, a logical question is, “For a given problem, how many state changes are necessary to solve it?”  If a problem requires at least ten state changes to solve, and time is composed of discrete units, then there is a lower bound on how quickly that problem could be solved. I realize there are a lot of basic questions that would have to be answered for this theory to prove useful, but my guess is that we will eventually discover that the “minimum number of states” problem is akin to the problem of finding the Kolmogorov complexity of a string; that is to say, it is impossible to devise a general purpose algorithm that always gives the correct answer. (In fact, I think I’m going to try to prove this result myself.)

What does all of this mean for P vs NP?  Even if the ‘minimum number of states’ problem is not solvable by a general purpose algorithm, perhaps someone can construct a proof of the minimum number of state changes for a particular problem.  If someone could do that for an NP-Complete problem, then they could finally determine the answer to P vs NP.  I’m accepting bets that P = NP. Any takers?

On Creationism

April 3rd, 2009

I know this isn’t exactly a popular opinion these days, but I don’t see what the big deal is in teaching children that some scientists disagree with the idea that species evolve into other species over time.  It is a true statement.  As long as there remains one scientist who disagrees (and you can always find one guy who believes just about anything), it will be a true statement.

Here’s where I think the crux of the problem lies: Science does not deal with reality, but Scientists like to believe that it does. Science cannot describe reality because nothing can. Scientists can only construct mathematical models of reality, and test those models in their predictive capabtility by running experiments.  Why can’t science tell us about realitiy? Simply put, there’s no experiment anyone can perform which would tell us that we are not living in a video game, a movie, a book, or some child’s dream.

Evolution is a scientific theory because it makes predictions ranging from the nature of the fossils to the behavior of bacteria.  Evolution, as a theory, is an abstract model of reality that happens to make very accurate and useful predictions.  Creationism, on the other hand, makes no predictions.  It is a theory dealing purely with metaphysical truth, with the nature of the reality in which we live. This difference doesn’t imply that Creationism is somehow  objectively worse than Evolution;  you simply can’t compare the two.  One is a set of statements that attempts to describe an objective reality, and the other is a mathematical model used to make predictions about the nature of the sensations we experience. They both have their places.

My main point here is that Creationist theories cant’ be ‘wrong’ becuase they can never be tested.  Evolutionary theories, however, can be tested, and they have continually passed those tests.  That doesn’t mean evolution is ‘true’ because science can never prove anything true.  If the scientific and educated american communities would stop insisting that Science is Truth, perhaps they’d find less opposition from more faith-oriented americans.

Confessions of a Programmer

February 13th, 2009

I can’t talk about my job too much because everything we do is TOP SECRET, but I thought I’d make a few confessions / thoughts about programming.

  1. I absolutely love writing code. Probably moreso than is healthy.
  2. I spend more time than I should formatting my code so it looks pretty. I really enjoy reading code with syntax highlighting, lots of comments, and good indenting.
  3. I dislike the ternary  "<expression> ? <true-value> : <false-value>" operator. I think it obfuscates code and makes debugging more difficult. That said, I’m guilty of using it myself because I think it looks cool.
  4. I used to consider myself a ‘very good’ programmer, but I’ve learned a lot at my new job, and the more I learn, the more I realize there is to learn. Programming is like knowledge in general that way. I think this phenomenon (beginning programmers overrating their skills) is a great example of the Dunning-Kruger effect at work.
  5. I get a huge rush when I write a lot of code and it works on the first try, especially if the code involved pointer arithmetic or a nontrivial algorithm.
  6. I think programming is a great way to gain self confidence. There’s nothing like completing a project that sounded terrifying when it was first proposed to you.
  7. Python used to be, far and away, my favorite language, but I’m starting to think C# is really cool, largely because I like the .NET libraries and the Reflection capabilities.   If I do go back to using Python on a regular basis, I’ll probably switch to IronPython so I can still use . NET.   I still think Python is better for manipulating strings, largely because of the splicing syntax, but I rarely have to do that. I’ve never extensively used the pickle module in Python, but from what I remember, it was more difficult to use than .NET’s XML Serialization. Admittedly, the XML serialization syntax in .NET is kind of goofy for serializing recursively defined classes like Trees, and it can’t handle a class like a Graph without some carefull planning, but it really is cool to add a few attributes, feed your obejct to a BinaryWriter, and have the data easily persisted to the hard drive or sent across the wire.
  8. I hate the const keyword in C++ but I think it’s necessary for writing good code. The reason I dislike it is that it spreads like a virus: you add the const keyword to one function, and soon you have to add const to all the inputs to that function and then add const to the output, and const soon  your whole  const class, all of its const clients, and some other random const code you wrote is peppered with const keywords.  I like C#’s implementation of the out keyword much, much more.
  9. That is all for now. I have more code to write :)

A Random Fact About Yourself

January 26th, 2009

I was tagged in a note on facebook, titled ‘25 random facts about myself’. You’re supposed to write “25 random things, facts, habits, or goals about you”, and tag 25 people, as well as the author of the note.  What does it mean to write a random fact about yourself?

Let’s figure out how many facts I could choose from. To simplify things,  let’s suppose my life occurs in one-second ‘frames’, and that my life is constant for the duration of that frame. In other words, lets assume time is divided into fixed-length quanta. As of the time of this posting, I am a little over 23.5 years old. That’s approximately 700 million seconds.   If a ’story’ about myself is a sequence of quanta, there are 2700 = 5.3 x 10210 possible stories about myself I could tell. For comparison, there are about calculated to be 1080 atoms in the universe. Even if I increase the size of the quantum to an hour, and limit stories to be descriptions of no more than 20 quanta, we have 7x1078 possible stories. That’s still a lot of stories.

Interestingly enough, if I used python’s random number generator,  the Mersenne Twister algorithm, I could actually implement a program to randomly select one of those stories, because it has a period of  219937-1. I thought about generating a truly random story, using a sweet bit of python code, but it probably wouldn’t be too interesting. It’d be really difficult to write anything about what I was doing at 4:00 on March 3, 1994, other than ‘I was 8 years old.’

On GPS Navigation Unit Mapping Algorithms

January 21st, 2009

I currently live in Chapel Hill, North Carolina. I grew up, however,  in Cincinnati, Ohio and a couple of times a year I make the drive back there.  The best route takes me through West Virginia on I-77 to Charleston, but after that it’s really up in the air as to what the best way to get to Cincinnati is. I’ve experimented with a number of routes, and after consulting my GPS unit, Google Maps, and AAA, I think I’ve come up with the best possible route.  For those interested, it is OH-32 to US-35 to US-34 to I-64 to I-77 to I-74 to US-52 to US-421 to I-40 to NC-54.

Why was I able to find a better route than my GPS unit, into which probably went thousands of man-hours of labor? My guess is that the algorithm used by the GPS unit is not theoretically optimal.  The reason for this is simple: the country is  huge.   There are about 6 million miles of paved highway in the US. Assume we have 10 intersections for every paved mile of road (a very conservation assumption): then we have 60 million intersections. A graph with 60 million vertices has at least 60 million edges, but probably closer to 120 million if it’s connected at all (60 million edges would just be a straight line, which is obviously wrong.)  A very good shortest path algorithm running on a graph with E edges and V vertices takes runs in time roughly proportional to E * log V.  Plugging in 120 million for E and 60 million for V, we get a number of computational operations on the order of 2 billion.
I have a Garmin c340, but I couldn’t find data on its microprocessor. Therefore well assume we are using a Tom-Tom 720, which has a 400 Mhz microprocessor inside. This means it undergoes one clock tick approximately (1 MHz = 1048576 Hz) 400 million times a second.  Each operation probably takes on the order of 100 machine instructions. If we assume perfect pipelining in the microprocessor and 0 cache misses ever,  we get 1 instruction per tick. That’s 200 billion instructions, at 400 Million instructions per second, for 500 seconds processing time on our GPS unit . 8 minutes would be a long time to wait for a route, and that’s with a number of conservative assumptions.  Once you take memory latency into effect, you’re probably looking at 16-20 minutes to find a route.

Your little GPS unit inside your probably employs a heuristic that favors taking interstates over back roads, among others.  It almost definitely also uses the fact that the graph of the roads embedded on the surface of a sphere.  It sure would be interesting to figure out what kind of algorithms they use inside that little thing.  Google Maps is an entirely different story- they have much faster processors and can probably find paths in parallel. What sort of parallel algorithms for finding shortest paths exist? Could you cache frequently used route portions? How would that work ? Just something to think about.

On Martin Luther King Jr.

January 19th, 2009

Today is a Market Holiday, so I don’t have work. For those keeping score at home, I started working at a financial company this fall. It’s called Blue Capital, and it’s a really amazing place to work. They routinely face all kinds of interesting computer science problems that require solutions which are both elegant in theory and efficient in practice. I’m really happy there.

Anyways, back to the task at hand.  Today is Martin Luther King Jr. Day. MLK Jr. was a great civil rights leader who opened the eyes of his fellow countrymen to the injustices suffered by black Americans, particularly in the south.  Was he a great man? I think, undeniably the answer to this question is ‘yes’. However, there is a more interesting question – was he a good man? The answer to that question, I think, is ‘ probably not’.

Unfortunately, I don’t have sources for a lot of this information, which is why I prefaced my answer with ‘probably.’    Firstly, he is alleged by numerous sources to have cheated on his wife. The only real web sources I could find for this point (that weren’t from hate sites) were about.com, and the straight dope, two sites I believe to be (relatively) bias free, at least when it comes to overt racism. A lot of the allegations come from illegal FBI wiretapping of King, and can’t really be trusted, but some come from King’s lifelong friend Ralph Abernathy.

Secondly, King plagiarized significant portions of his doctoral thesis, academic papers, and speeches.

Why does any of this matter? What’s the point of dragging a revered historical figure through the mud?  My reasoning is that I think there’s an important distinction to be made between good men and great men. Great men are nice and well and all; they do change the world significantly, but great men also tend to suffer from narcissism and selfishness. Consider Albert Einstein – brilliant man who undeniably advanced the state of science, but also a jerk who cheated on his wife.

Why the fixation on adultery? Am I some kind of right-wing christian conservative? No, not hardly. A marriage is not just a simple contract between two people made for financial considerations – it’s a lifelong commitment with profound implications for children.  Cheating on your marriage partner is probably one of the worst things you can do to them. It’s an incredibly self-centered thing to do. Not only does it cause significant emotional harm to your spouse, it causes irreparable harm to any children in the marriage.  It is well documented that children do far better in stable family structures with two parents around.  (Personally I don’t think it matters what gender the parents are, as long as they love and are committed to each other, but that’s an entirely different argument.)   A little over two thirds of all black children born in America today are born to unwed mothers.  Those children do not have the same chance at success as children born into stable, two-parent families centered around a committed relationship between two adults. Any man who would cheat on his wife, betraying that bond of trust and hurting the future of his children, is not a good role model.

Is there institutionalized racism in the world today? Absolutely. Did Martin Luther King Junior do a lot of work to fight this racism?  Of course; That’s why he was a great man. However, there’s a solid difference between great men and good men. Great men do extraordinary things with their lives.  They’re good for the world. What we need, however, more than great men, is good men, men who do ordinary things extraordinarily well. Men who get up every morning to go to work, who pay their taxes and give back to their communities in the small ways that they can, and who set excellent examples for their children. Given the choice between being a great man with a flawed personal life, and a good man who never accomplishes anything ‘of note’ in the world at large, I will choose to be a good man any time. I should hope you feel the same way.