Archive for the ‘Computer Science’ Category

On Genetics, Programming Languages, and God

Friday, June 26th, 2009

The genetic code is the programming language used to make living organisms.  Right now, biologists are attempting to ‘reverse engineer’ this genetic code, in order to determine how it works and to make changes to it. I (in my infinite wisdom) think they’re going about this the wrong way.

To understand why, we must first delve into the languages of computers. When humans program computers, they usually write in “high level languages”, which look like a series of mathematical formulas and instructions.  Here is an example of a simple function to compute the nth Fibonacci Number, written in a high level language called C:

//computes the nth fibonacci number
unsigned int fibonacci(unsigned int n)
{
	unsigned int term1 = 1;
	unsigned int term2 = 1;
        unsigned int result = term2;
 
	unsigned int i = 1;
	while (i < n)
	{
		result = term1 + term2;
		term1 = term2;
		term2 = result;
		i = i + 1;
	}
 
	return result;
}

The Computer cannot understand this program as it is written. It must be translated into instructions that make sense to the computer. This  translation process is called compiling. The compiled program will look different depending upon what computer it is compiled on, but all computers basically operate the same way: they can read from and write to memory locations, and they can add, multiply, divide, and subtract numbers.  Here is an English-language translation of what the compiled Fibonacci function might look like. Provided to make things more clear is a mapping of variable names to their memory locations:

n 0
term1 1
term2 2
result 3
i 4
# set the value in location 1 to 1
# set the value in location 2 to 1
# set the value in location 3 to the value stored in location 1
# set the value in location 4 to the 1
# if the value in location 4 is not less than the value in location 0, go to instruction 11
# set the value in location 3 to the result of the value in 1 plus the value in 2
# set the value in location 1 to the value in location 2
# set the value in location 2 to the value in location 3
# set the value in location 4 to the value in location 4 plus the value 1
# jump to instruction 5
# return the value in location 3

Each of the above instructions would be represented with a single word, which on a top-of-the-line modern computer is 64 bits of information. The human genetic code is also a sequence of simple instructions, interpreted not by a central processing unit, but by intracellular organisms.  Instead of dealing with changing values in memory locations, the genetic code’s instructions are for amino acids used to build proteins.  There are approximately 6 billion instructions in the human genome, making it roughly 10 million times as complex as our simple Fibonacci function.

Notice that the machine language program is  no where near as easy to understand as the C program. If I gave you a machine language program with 100 million lines of code, you’d have a heck of a time trying to figure out how it worked.  Poking around and changing single instructions, then determining what happens when you change those instructions probably wouldn’t be a good start. Unfortunately, as I understand things, that is exactly how biologists are proceeding.  They are stepping through the genetic code piece by piece,  using experiments to try and figure out what different sequences of instructions do.

There is, I think, a better approach. The key to understanding this approach is a simple fact about computer programming: It’s easier (and more fun) to write your own program than it is to read someone else’s program and figure out what it’s doing. Instead of trying to reverse engineer existing DNA code, which evolved over millions of years and is therefore probably extremely convoluted and hard to follow, we’d be better off trying to ‘write’ our own organisms.  Biologists could start by writing DNA ‘programs’ to code simple proteins.  After getting good at this, they could invent a ‘high level language’  for DNA programming (AKA GeneticC) which could be used to speed up the development progress. The next step would be to write a single cellular organism, and then more complicated organisms.

Interestingly enough, this process of building our own organisms could provide a lot of insight into the question of whether there is some being who designed us.  Once we have a better understanding of genetic programming,  we could look at the quality of the code that builds human beings. If it’s clean, neat, and straightforward,  that would be strong evidence of a creator at work. If it’s messy, garbled, hacked together, redundant and nonsensical in parts, we could conclude that the code evolved over time.  Either that, or god is a perl scripter.

A Thought on Computation

Thursday, April 23rd, 2009

I think a lot about P vs NP.  I think it’s  the most interesting problem ever, and my opinion on the subject is pretty well known.  Here’s my latest thought.

Computation is all about state changes; every computer ever built is a finite state machine of some kind.  For those unversed in Computer Science terminology, a finite state machine is an abstraction with a finite number of discrete states.  It starts out in a specific state, and when it receives input from an alphabet of discrete symbols, it changes from one state to another.  For example, a toggling light switch has two states: on and off. When you push the button, the toggle moves to ‘off’ if it was in the ‘on’ state, and it moves to the ‘on’ state if it was in the ‘off’ state.  More complex machines have more states, and more complicated rules for transitioning from state to state.

As I was saying, every computing device ever built is a finite state machine. Fingers change state from extended to contracted.  An abacus changes state when its beads are moved.  The Antikythera mechanism and Difference Engine changed state by movement of gears.  Modern electronic computers change state by modifying electronic signals.  People have proposed optical computers which operate based upon switching light sources on and off.  Note that these state changes are all discrete – gears could be said to move continuously, but they have an integer number of teeth, and therefore an integer number of possible discrete states. Electricity is the flow of electrons, and light consists of discrete photons.  Fingers are, well, fingers.

What about analog computers? Fans of slide rules might argue that they involve an infinite number of states as a result of their continuous nature.  Here I will make a conjecture:  just  as Planck proved that  energy occurs in discrete chunks, we will also come to find that time and space are discrete.  That slide rule may appear to move continuously, but I’m willing to bet that it’s actually moving in discrete jumps which are so tiny we cannot measure them. I am not alone in putting faith in this conjecture.

If computation involves state changes, a logical question is, “For a given problem, how many state changes are necessary to solve it?”  If a problem requires at least ten state changes to solve, and time is composed of discrete units, then there is a lower bound on how quickly that problem could be solved. I realize there are a lot of basic questions that would have to be answered for this theory to prove useful, but my guess is that we will eventually discover that the “minimum number of states” problem is akin to the problem of finding the Kolmogorov complexity of a string; that is to say, it is impossible to devise a general purpose algorithm that always gives the correct answer. (In fact, I think I’m going to try to prove this result myself.)

What does all of this mean for P vs NP?  Even if the ‘minimum number of states’ problem is not solvable by a general purpose algorithm, perhaps someone can construct a proof of the minimum number of state changes for a particular problem.  If someone could do that for an NP-Complete problem, then they could finally determine the answer to P vs NP.  I’m accepting bets that P = NP. Any takers?

Confessions of a Programmer

Friday, February 13th, 2009

I can’t talk about my job too much because everything we do is TOP SECRET, but I thought I’d make a few confessions / thoughts about programming.

  1. I absolutely love writing code. Probably moreso than is healthy.
  2. I spend more time than I should formatting my code so it looks pretty. I really enjoy reading code with syntax highlighting, lots of comments, and good indenting.
  3. I dislike the ternary  "<expression> ? <true-value> : <false-value>" operator. I think it obfuscates code and makes debugging more difficult. That said, I’m guilty of using it myself because I think it looks cool.
  4. I used to consider myself a ‘very good’ programmer, but I’ve learned a lot at my new job, and the more I learn, the more I realize there is to learn. Programming is like knowledge in general that way. I think this phenomenon (beginning programmers overrating their skills) is a great example of the Dunning-Kruger effect at work.
  5. I get a huge rush when I write a lot of code and it works on the first try, especially if the code involved pointer arithmetic or a nontrivial algorithm.
  6. I think programming is a great way to gain self confidence. There’s nothing like completing a project that sounded terrifying when it was first proposed to you.
  7. Python used to be, far and away, my favorite language, but I’m starting to think C# is really cool, largely because I like the .NET libraries and the Reflection capabilities.   If I do go back to using Python on a regular basis, I’ll probably switch to IronPython so I can still use . NET.   I still think Python is better for manipulating strings, largely because of the splicing syntax, but I rarely have to do that. I’ve never extensively used the pickle module in Python, but from what I remember, it was more difficult to use than .NET’s XML Serialization. Admittedly, the XML serialization syntax in .NET is kind of goofy for serializing recursively defined classes like Trees, and it can’t handle a class like a Graph without some carefull planning, but it really is cool to add a few attributes, feed your obejct to a BinaryWriter, and have the data easily persisted to the hard drive or sent across the wire.
  8. I hate the const keyword in C++ but I think it’s necessary for writing good code. The reason I dislike it is that it spreads like a virus: you add the const keyword to one function, and soon you have to add const to all the inputs to that function and then add const to the output, and const soon  your whole  const class, all of its const clients, and some other random const code you wrote is peppered with const keywords.  I like C#’s implementation of the out keyword much, much more.
  9. That is all for now. I have more code to write :)

A Random Fact About Yourself

Monday, January 26th, 2009

I was tagged in a note on facebook, titled ‘25 random facts about myself’. You’re supposed to write “25 random things, facts, habits, or goals about you”, and tag 25 people, as well as the author of the note.  What does it mean to write a random fact about yourself?

Let’s figure out how many facts I could choose from. To simplify things,  let’s suppose my life occurs in one-second ‘frames’, and that my life is constant for the duration of that frame. In other words, lets assume time is divided into fixed-length quanta. As of the time of this posting, I am a little over 23.5 years old. That’s approximately 700 million seconds.   If a ’story’ about myself is a sequence of quanta, there are 2700 = 5.3 x 10210 possible stories about myself I could tell. For comparison, there are about calculated to be 1080 atoms in the universe. Even if I increase the size of the quantum to an hour, and limit stories to be descriptions of no more than 20 quanta, we have 7x1078 possible stories. That’s still a lot of stories.

Interestingly enough, if I used python’s random number generator,  the Mersenne Twister algorithm, I could actually implement a program to randomly select one of those stories, because it has a period of  219937-1. I thought about generating a truly random story, using a sweet bit of python code, but it probably wouldn’t be too interesting. It’d be really difficult to write anything about what I was doing at 4:00 on March 3, 1994, other than ‘I was 8 years old.’

On GPS Navigation Unit Mapping Algorithms

Wednesday, January 21st, 2009

I currently live in Chapel Hill, North Carolina. I grew up, however,  in Cincinnati, Ohio and a couple of times a year I make the drive back there.  The best route takes me through West Virginia on I-77 to Charleston, but after that it’s really up in the air as to what the best way to get to Cincinnati is. I’ve experimented with a number of routes, and after consulting my GPS unit, Google Maps, and AAA, I think I’ve come up with the best possible route.  For those interested, it is OH-32 to US-35 to US-34 to I-64 to I-77 to I-74 to US-52 to US-421 to I-40 to NC-54.

Why was I able to find a better route than my GPS unit, into which probably went thousands of man-hours of labor? My guess is that the algorithm used by the GPS unit is not theoretically optimal.  The reason for this is simple: the country is  huge.   There are about 6 million miles of paved highway in the US. Assume we have 10 intersections for every paved mile of road (a very conservation assumption): then we have 60 million intersections. A graph with 60 million vertices has at least 60 million edges, but probably closer to 120 million if it’s connected at all (60 million edges would just be a straight line, which is obviously wrong.)  A very good shortest path algorithm running on a graph with E edges and V vertices takes runs in time roughly proportional to E * log V.  Plugging in 120 million for E and 60 million for V, we get a number of computational operations on the order of 2 billion.
I have a Garmin c340, but I couldn’t find data on its microprocessor. Therefore well assume we are using a Tom-Tom 720, which has a 400 Mhz microprocessor inside. This means it undergoes one clock tick approximately (1 MHz = 1048576 Hz) 400 million times a second.  Each operation probably takes on the order of 100 machine instructions. If we assume perfect pipelining in the microprocessor and 0 cache misses ever,  we get 1 instruction per tick. That’s 200 billion instructions, at 400 Million instructions per second, for 500 seconds processing time on our GPS unit . 8 minutes would be a long time to wait for a route, and that’s with a number of conservative assumptions.  Once you take memory latency into effect, you’re probably looking at 16-20 minutes to find a route.

Your little GPS unit inside your probably employs a heuristic that favors taking interstates over back roads, among others.  It almost definitely also uses the fact that the graph of the roads embedded on the surface of a sphere.  It sure would be interesting to figure out what kind of algorithms they use inside that little thing.  Google Maps is an entirely different story- they have much faster processors and can probably find paths in parallel. What sort of parallel algorithms for finding shortest paths exist? Could you cache frequently used route portions? How would that work ? Just something to think about.

Credo

Thursday, August 14th, 2008

I believe in myself.  I believe that the mind has incredible power over the body, and that eating breakfast every day is good for you.  I believe in small miracles. I believe in the power of free markets to improve the human condition. I believe that human history is a story of progress, and that the future will be better than the past. I believe that family is the single most important thing there is on this planet. I believe in a thing called love, but I don’t really know what it is or how it works. I believe that peace will  come to the world, albeit gradually.  I believe low pocket pairs are rarely worth the trouble they can cause you.  I believe that strawberry jelly is superior to grape jelly, especially vis a vis peanut butter and jelly sandwiches. I believe a first kiss should be as romantic as possible. I believe in following big dreams, no matter how likely you are to obtain them.  I believe in opening the door for others. I believe that not all cultures are ‘equal’ in any real sense, and that the values a culture holds effect that culture’s economies and social freedoms.  I believe in taking responsibility for my actions, and ensuring that others do the same. I believe a man should take his fate into his own hands, whether or not he has the ability to do so. I believe that life is 10% what you’ve been given, and 90% what you make of what you’ve been given.  I believe in preventative maintenance.  I do believe Jones Soda to be absolutely delicious.  I believe in making small differences, especially when they don’t seem to matter at all.  I believe that, somehow, we are all one. I believe that sometimes, the only solution to a problem is violence. I believe that game theory is one of the most underutilized mathematical constructs. I believe that P will eventually be found to encompass all of NP, but that NP-Complete problems will still be effectively intractable. I believe python to be an amazingly useful language. I believe that computer programs are beautiful in of themselves, regardless of what they do.  I believe that the scientific method has been phenomenally successful  in helping us divine the nature of our world. I believe organized religions like modern Christianity have spread good works all over the world. I believe Islam is fundamentally incompatible with a Free, Democratic society. I believe in challenging common wisdom: Can you really boil a frog to death by gradually increasing the heat? I sincerely doubt it. I believe that saying ‘please’ and ‘thank you’ make a small difference in someone’s day. I believe that small differences can make a big difference. I believe that voting is irrational. I believe it’s OK to be irrational every now and them.  I believe the world was created by a phenomenally intelligent entity, as an act of supreme love.  And in the end, I believe that everything will work itself out.

New License Plate

Wednesday, August 1st, 2007

One of the many things I had to take care of after moving to Chapel Hill, North Carolina was to register my car with the state. North Carolina lets you put an ‘equals’ sign on your license plate, so I couldn’t resist.

The tattoo comes next

My Loves

Monday, April 23rd, 2007

If you’ve spent any significant amount of time around me, you’ve probably heard me talk about two things I love: Joe Satriani, guitar virtuoso, and Python, the best programming language ever.  It looks like this guy agrees with me on the superiority of Python:

“…people don’t learn Python because it will get them a job; they learn it because they genuinely like to program and aren’t satisfied with the languages they already know.”

On The Speed of Light and Video Games

Wednesday, April 18th, 2007

When you’re making a computer game with interactive 3D graphics, you need to compute lighting if you want your game to look remotely realistic.  The methods for calculating light on a surface can be either quite simple or very complex.  Simple algorithms act as if light passes through all surfaces after illuminating them, while more complicated algorithms allow light to cast shadows. The ‘best’ form of commonly used lighting is raytracing, which, as the name implies, traces rays from a light source to their destintation. (Actually, raytracing algorithms start from a viewpoint and trace outwards, but that’s not really important.) Raytracing allows for the most realistic shadowing, but it’s the most computationally expensive.

The actual physics behind light aren’t all that complicated, as long as you’re only concerned with how bright objects look and you don’t care about what happens when light goes through tiny holes or passes through irregular media. (Answer: weird stuff.)   The reason programmers put a lot of work into lighting algorithms is that computing the effects of light on a scene can take a lot of time. Real-time lighting algorithms generally rely on tricks that allow programmers to ‘get away’ with not actually computing everything properly.

In my Philosophy of Time course, we recently discussed relativity and its implications as to the meaning of time. While sitting in class yesterday, I remembered an idea that I first had while taking modern physics several years ago.  All computer lighting algorithms of which I am aware treat the speed of light as infinite.  It’s not a bad approximation because from the perspective of your typical human being, it might as well.  Does playing an upperbound on the speed of light allow you to ‘get away’ with anything computationally?

I’m pretty sure the answer is yes.  The speed of light is also theorized to be the speed at which forces are transmitted. In other words, if the sun were to ‘dissappear’ right now, we wouldn’t see the sun dissappear until about 8 minutes after it happened.  The same is true of the sun’s gravitational pull – the earth would continue to move in a circular orbit until about 8 minutes after the sun explodes.

If you had a giant computer simulating the interaction between the earth and the sun, the interaction could be parellelized quite nicely, because any change at the sun won’t be able to immediately effect anything happening on the earth. If the force of gravity were transmitted instantaneously, however, as soon as the sun exploded you’d have to recalculate earth’s orbit.

I want to do more thinking on this subject, and lorenz contraction / expansion (the tendency of objects to change drastically in size as they approach the speed of light) and how it could possibly be used to speed up a computer simulation. If it turns out that lorenz contraction does allow you to compute things faster, it would be more evidence for me that the world is, in fact, a computer simulation.

On Boredom

Tuesday, April 17th, 2007

I’m working on a Lego Robot, using a Handyboard, for my AI class. It’s a lot of fun. My brothers had told me about the robot project they worked on their first year at OSU, but I had no idea how much fun it was. The current assignment is to get the robot to drive around on top of a table, looking for a light source, and avoiding falling off of the edge. In order to avoid having the robot get stuck in a corner, I implemented what can best be described as ‘boredom.’ Every time the robot decides to perform an action, if it’s a ‘boring’ action, the robot’s boredom counter is incremented. Other actions are ‘exciting’, causing the robot to decrement its boredom counter. The counter itself gradually ticks down over time. If it ever gets too high, the robot just stops whatever it was doing and does a complete about face, hoping to find something more exciting to do. Without the ‘boredom’ mechanism, the robot can get stuck in some kind of rut.

I had the idea several years ago that perhaps ‘boredom’ was essential to our survival as a species. Whenever I get bored, I go look for something to do. Sometimes, that leads me to something productive. If we human beings didn’t get bored, we might become content just to have enough resources to get by. My guess is that lots of important ideas were discovered by people who could easily afford to amuse themselves doing whatever, but would just get terribly bored.

My whole life has been a struggle to avoid being bored. I get bored very easily and don’t deal with it well. It’s interesting that a feeling that causes me such frustration could be essential to the survival of the species. I suppose that wouldn’t be the first time, though…

I gave a presentation on my comptuer science research today, and this crazy old guy came up and asked what this research was going to do to solve the humanitarian problems in the world. I told him that wasn’t the focus of my research, and we got into an argument about whether or not the Humanities were infinitely superior to the Sciences.  That’s a topic for another post, however.