Here is a ruby script I quickly whipped up to teach myself genetic programming/evolutionary programming, and also to demonstrate basic principles of biology (please see below to see how it works.. if you’re on my front page, click to read more first). I think I’m going to write something more sophisticated in erlang (I don’t know erlang, but I’m intrigued) after I get further down the road in gp, but here’s what I quickly whipped up. Since writing it, I have started reading A Field Guide to Genetic Programming and already notice some things I should have done a little different. But in some respects, having a simple string of characters change their role in different modes is elegant and I think I prefer that way of doing things, despite shortfalls in statistical bias. Here are some observations I have at this point:
I know the people researching gp are really smart, so there’s probably something I’m missing, but I don’t see how genetic programming is supposed to quickly find solutions. Evolution is as slow as molasses. You need to dig through millions of years of dirt to see actual changes. I can only see gp quickly finding solutions if the process is being run continually, with multiple challenges or “resources”. I was glad to see, after having this thought, mention of multi-objective (5.1.3) and coevolution (5.4) in the book. It makes me think I’m on the right track. I also remember a while back reading in the article Discover had about Avida that when there was only one resource, one species would quickly capitalize on that and then the evolution would stagnate, but having multiple resources would create diversity and even better solutions at attacking that one resource. The ability of organisms to repurpose adaptations is important in evolution and without it, most complexity would not have been possible.
This leads me to another point. It seems gp works best when it works the most like evolution in the real world works. The rate of mutation is low and, despite my initial intuitions that a high mutation rate would lead to a good solution more quickly, I found that it was actually harmful to do so. Even though, in this script, there is a bias towards shorter programs (the longer the chromosomes, the more calories it takes to create just one offspring), when I had a higher mutation rate, like, say, 1 out of 5 copies, the length of the genes would get really, really long as a defense against mutations. This could happen because if the last parenthesis of the mini-lisp program is closed prematurely, the rest of the genes are ignored. Junk dna. In my program, at least, junk dna was a defense against mutation since I would get really short programs that have long tails of junk after them, but decreasing the mutation rate shrunk those tails down to almost insignificant. Weird. Please try downloading this and edit it to see for yourself.
While mutations should be rare, it helps to have variety. I did just substitutions at first and it worked, but it took a while. I also added stutters, insertions and deletions. Once I added stutters, it helped a lot! Even before adding stutters, I found that, when the “challenge” was 6 * x, I would get x + x + x + x + x and something like 7 * x competing with each other for a long time (5x or 7x often becomes the leading candidate after a few generations and it takes forever for any 6x to show up). Adding the stutters seemed to get multiplication through addition happening faster, though.
Download yeasties or copy/paste from below (requires ruby and clisp):
How it works: (feel free to ignore italicized parenthetical comments if they confuse or distract you.. they can be safely ignored and are there for anyone wanting to play around with the program)
This is a basic, single-objective instance of digital evolution. Organisms only reproduce asexually. As you can see in the code, the “dna” is just a sequence of lower-case letters (feel free to modify that array for aesthetic purposes, but keep in mind that there should be at least 30 elements or so in the array for this to work right). Each organism has its own sequence of dna. For the first generation, it is generated randomly, for any generation after that, it is a copy, occasionally mutated, of its parent’s. This dna is used to generate the “protein” or mini-program that will be run. The mini-program is in in lisp, rather than ruby (it should be quite easy to change that to be generated ruby code and just use eval() on it, though lisp code is easier to generate and we have the advantage of being able to easily add ‘time’ to the command, and factoring in how computationally intensive it is into the equation. I started to set that up, but found that in this program, it’s almost always a split second and the differences have nothing to do with the code in question… just something to consider for larger programs in the future).
When reproducing, the mini-programs must expend calories to do so, in proportion to the length of the chromosomes. Simpler life is cheaper to create, but a) more susceptible to mutation (assuming longer chromosomes also mean more junk dna) and b) less likely to be the solution in cases where the challenge is complex. The amount of calories that is given to an “organism” to reproduce is proportional to its fitness. See next paragraph for how fitness is evaluated. This dna is then used to generate the “body” or “proteins” of the organism, which is simply a small program in lisp (basically two lines, the first simply being the line to feed the program a value). The generation of the program works like this: there are different modes, such as “operator is okay” or “variable or constant only”. Each gene is evaluated in sequence, starting with the operator mode. Depending on what is selected, the mode will change, so ‘a’ might be 1 in terminal mode or + in operator mode. This ensures that only legal programs are being made. If you understand lisp, this’ll make sense (I don’t really understand lisp, but I can fake it): once the last parenthesis is closed, the generation ends and the rest is essentially junk dna. Junk dna can protect against mutation (since a mutation is statistically less likely to affect working code) but it also means offspring are more expensive to produce without any additional working code..
Each mini-program runs with a parameter being fed to it and is evaluated based on what it spits back out. If it evolves to not have x in it, then it will be ignoring that parameter to its peril (unless we set the challenge to something that also doesn’t have x in it, like a constant number, in which case x is unnecessary noise though, at a 0 to 1 range, minor noise at that). For each generation, the x is set to the percentage of the minute (seconds divided by 60) or to that number times something else. This makes the program good for doing regression within a range since you can just multiply this number by the range and add the minimum point. For example, to do a sine function, do the sin of pi times this number (there’s no way in hell this script is going to generate the formula for sin, though I’d love to be proven wrong.. maybe only a few more lisp operators need be added…) The program doesn’t self-terminate unless all the life goes extinct (which happens sometimes with code bloat.. if the genome sizes get too big, usually through stutter, life forms quickly cease to be able to afford but one child.. this is why I’m considering creating a hard limit on the length of stutter). The program’s fitness is determined by how distant its result is algebraically from the correct answer (I’m just using the absolute value of the difference). You can specify how close it must be to be within the baseline. If it’s not that close, then its fitness is negative and it will die without reproducing. The fitness provides the calories it needs to create the following generation.
After the next generation is created, the organisms are sorted by fitness and the least fit are killed until there’s only WorldSize left, WorldSize being what we decide the population to be at the top. Rinse, lather, repeat. You can start this program and step away if you want to have your computer do something useless for a while. Or, better yet, you can probably program it to do something useful, though this isn’t using the best practices at this point and should be looked at as the simplest you can make gp and it still work. Perhaps this represents life in a primitive pre-dna state and gp will only be useful once its as sophisticated and awesome as dna. We don’t have to start at cosmic ooze and wait. We know how the story ends, so we can fast forward…
Overall, the program is mostly self-explanatory, but feel free to leave any comments or ask any questions about it. It’s a good learning experience and it’s certainly giving me some ideas…
</p> <p>Your browser does not support iframes, so just download the code directly <a href=”/code/yeasties.rb” mce_href=”/code/yeasties.rb”><span><span>here</span></span></a>.</p> <p>