In my earlier post I had mentioned about Ant Colony Optimization. Another interesting bio-inspired heuristic is genetic algorithm – GA for short.
Basis of inspiration
The theory of evolution is fairly accepted – I would suggest the interesting book if not so. Along with this the concepts of DNA, combination and mutation too are familiar. From this however the concept of Genetic algorithm was born and has been used in many ways to solve complex problems. John Henry Holland has done much work on this.
The basic steps of GA is as follows
Set Step to 0
Create an initial population
Evaluate the fitness of the population
While there is scope to improve fitness or the stopping criteria has not been reached do
Recombine population instances resulting in offspring
Evaluate the offspring
Select new population from original population and offspring
Increment the Step
Certain terminologies will be useful here. We will also use a simple example to faciliate the understanding process. The problem is simple . Assume that we need to maximise a function
F(x,y)= 2×2-3y3+4xy, where x is in the interval (3,7) and y is in the interval (1,16) and x and Y are not fractions. A simple method from a programming perspective will be as shown below
For x = 3 to 7 step 1
For y = 1 to 16 step 1
Generate values for F(x,y) and store them
Extract highest value of F(x,y).
However let us use the example to explore the GA. The first step is creating the initial population. This involves what is termed as encoding the sequence. In the human DNA the DNA chain is made up of four nucleic acid bases (no pun intended) – ACGT standing for Adenine, Cytosine, Guanine and Thymine.
In computation terms we will use a DNA strand – our population will consist of binary nucleic acid bases i.e. 0 and 1.
These two 4-bit strands can be visualized as follows
In our example one simple way is to observe that a 3-bit string will suffice to represent x and a 4-bit string for y. The value of x there will be (011,100,101,111) and y will have all the four bits utilized except for 0000. For simplicity we can use 4 bits for both x and y. So let us create an initial population.
The best solution is now X=5 and y=6 with a fitness of -478.
We notice that population instance 3 and 4 are offering better results than 1 and 2. So we can now combine, or to use a technical term do a crossover, of two DNA strands as follows.
When the new offspring so generated are evaluated for Fitness we see,
The best solution is now X=4 and y=6 with a fitness of -520.
These steps of generation, evaluation, recombination is done repeatedly until we reach a stopping criterion. The stopping criteria may be
- No change in three successive fitness values
- A fixed number of generation process
- An acceptable level of fitness.
Challenges in using GA
The key challenges lie in encoding the problem in an appropriate genetic pattern and coming up with crossover rules. Crossover may happen at any point. We have chosen at 4-bits but we could as well have tried out the following.
This evaluates to,
The best solution is now X=5 and y=4 with a fitness of -62.
Another challenge is the introduction of mutation. When a crossover is done one may randomly change a bit value from 0 to 1 or vice versa. The method to introduce the effect of mutation is difficult. Yet another challenge is to identify dominant genes and crystallize them. For example it may be observed that better values emerge when the high order bits of x are 1. GA has been used in many engineering applications and problems that have a complex fitness landscape.