Here is a ruby script I quickly whipped up to teach myself genetic programming/evolutionary programming, and also to demonstrate basic principles of biology (please see below to see how it works.. if you’re on my front page, click to read more first). I think I’m going to write something more sophisticated in erlang (I don’t know erlang, but I’m intrigued) after I get further down the road in gp, but here’s what I quickly whipped up. Since writing it, I have started reading A Field Guide to Genetic Programming and already notice some things I should have done a little different. But in some respects, having a simple string of characters change their role in different modes is elegant and I think I prefer that way of doing things, despite shortfalls in statistical bias. Here are some observations I have at this point:
I know the people researching gp are really smart, so there’s probably something I’m missing, but I don’t see how genetic programming is supposed to quickly find solutions. Evolution is as slow as molasses. You need to dig through millions of years of dirt to see actual changes. I can only see gp quickly finding solutions if the process is being run continually, with multiple challenges or “resources”. I was glad to see, after having this thought, mention of multi-objective (5.1.3) and coevolution (5.4) in the book. It makes me think I’m on the right track. I also remember a while back reading in the article Discover had about Avida that when there was only one resource, one species would quickly capitalize on that and then the evolution would stagnate, but having multiple resources would create diversity and even better solutions at attacking that one resource. The ability of organisms to repurpose adaptations is important in evolution and without it, most complexity would not have been possible.
This leads me to another point. It seems gp works best when it works the most like evolution in the real world works. The rate of mutation is low and, despite my initial intuitions that a high mutation rate would lead to a good solution more quickly, I found that it was actually harmful to do so. Even though, in this script, there is a bias towards shorter programs (the longer the chromosomes, the more calories it takes to create just one offspring), when I had a higher mutation rate, like, say, 1 out of 5 copies, the length of the genes would get really, really long as a defense against mutations. This could happen because if the last parenthesis of the mini-lisp program is closed prematurely, the rest of the genes are ignored. Junk dna. In my program, at least, junk dna was a defense against mutation since I would get really short programs that have long tails of junk after them, but decreasing the mutation rate shrunk those tails down to almost insignificant. Weird. Please try downloading this and edit it to see for yourself.
While mutations should be rare, it helps to have variety. I did just substitutions at first and it worked, but it took a while. I also added stutters, insertions and deletions. Once I added stutters, it helped a lot! Even before adding stutters, I found that, when the “challenge” was 6 * x, I would get x + x + x + x + x and something like 7 * x competing with each other for a long time (5x or 7x often becomes the leading candidate after a few generations and it takes forever for any 6x to show up). Adding the stutters seemed to get multiplication through addition happening faster, though.
Download yeasties or copy/paste from below (requires ruby and clisp):