A genetic algorithm implementation for Rust. Inspired by the book Genetic Algorithms in Elixir
There are three main elements to this approach:
- The Genotype (the search space)
- The Fitness function (the search goal)
- The strategy (the search strategy)
- Evolve (evolution strategy)
- Permutate (for small search spaces, with a 100% guarantee)
- HillClimb (when search space is convex with little local optima or when crossover is impossible/inefficient)
Terminology:
- Population: a population has
population_size
number of individuals (called chromosomes). - Chromosome: a chromosome has
genes_size
number of genes - Allele: alleles are the possible values of the genes
- Gene: a gene is a combination of position in the chromosome and value of the gene (allele)
- Genes: storage trait of the genes for a chromosome, mostly
Vec<Allele>
, but alternatives possible - Genotype: Knows how to generate, mutate and crossover chromosomes efficiently
- Fitness: knows how to determine the fitness of a chromosome
All multithreading mechanisms are implemented using rayon::iter and std::sync::mpsc.
See docs.rs
use genetic_algorithm::strategy::evolve::prelude::*;
// the search space
let genotype = BinaryGenotype::builder() // boolean alleles
.with_genes_size(100) // 100 genes per chromosome
.build()
.unwrap();
println!("{}", genotype);
// the search goal to optimize towards (maximize or minimize)
#[derive(Clone, Debug)]
pub struct CountTrue;
impl Fitness for CountTrue {
type Genotype = BinaryGenotype; // Genes = Vec<bool>
fn calculate_for_chromosome(
&mut self,
chromosome: &FitnessChromosome<Self>,
_genotype: &FitnessGenotype<Self>
) -> Option<FitnessValue> {
Some(chromosome.genes.iter().filter(|&value| *value).count() as FitnessValue)
}
}
// the search strategy
let evolve = Evolve::builder()
.with_genotype(genotype)
.with_select(SelectElite::new(0.5, 0.02)) // sort the chromosomes by fitness to determine crossover order. Strive to replace 50% of the population with offspring. Allow 2% through the non-generational best chromosomes gate before selection and replacement
.with_crossover(CrossoverUniform::new(0.7, 0.8)) // crossover all individual genes between 2 chromosomes for offspring with 70% parent selection (30% do not produce offspring) and 80% chance of crossover (20% of parents just clone)
.with_mutate(MutateSingleGene::new(0.2)) // mutate offspring for a single gene with a 20% probability per chromosome
.with_fitness(CountTrue) // count the number of true values in the chromosomes
.with_fitness_ordering(FitnessOrdering::Maximize) // optional, default is Maximize, aim towards the most true values
.with_target_population_size(100) // evolve with 100 chromosomes
.with_target_fitness_score(100) // goal is 100 times true in the best chromosome
.with_reporter(EvolveReporterSimple::new(100)) // optional builder step, report every 100 generations
.call();
.unwrap()
println!("{}", evolve);
// it's all about the best genes after all
let (best_genes, best_fitness_score) = evolve.best_genes_and_fitness_score().unwrap();
assert_eq!(best_genes, vec![true; 100]);
assert_eq!(best_fitness_score, 100);
Run with cargo run --example [EXAMPLE_BASENAME] --release
- N-Queens puzzle https://en.wikipedia.org/wiki/Eight_queens_puzzle.
- See examples/evolve_nqueens
- See examples/hill_climb_nqueens
UniqueGenotype<u8>
with a 64x64 chess board setup- custom
NQueensFitness
fitness
- Knapsack problem: https://en.wikipedia.org/wiki/Knapsack_problem
- See examples/evolve_knapsack
- See examples/permutate_knapsack
BinaryGenotype<Item(weight, value)>
each gene encodes presence in the knapsack- custom
KnapsackFitness(&items, weight_limit)
fitness
- Infinite Monkey theorem: https://en.wikipedia.org/wiki/Infinite_monkey_theorem
- See examples/evolve_monkeys
ListGenotype<char>
100 monkeys randomly typing characters in a loop- custom fitness using hamming distance
- Permutation strategy instead of Evolve strategy for small search spaces, with a 100% guarantee
- HillClimb strategy instead of Evolve strategy, when crossover is impossible or inefficient
- Explore vector genes (BinaryGenotype) versus other storage (BitGenotype)
- Explore internal and external multithreading options
- Use superset StrategyBuilder for easier switching in implementation
- Use fitness LRU cache
- See examples/evolve_binary_cache_fitness
- Note: doesn't help performance much in this case... or any case, better fix your population diversity
- Custom Reporting implementation
For the Evolve strategy:
- Reporting: start with EvolveReporterSimple for basic understanding of:
- fitness v. framework overhead
- staleness and population characteristics (cardinality etc.)
- Select: no considerations. All selects are basically some form of in-place sorting of some kind. This is relatively fast compared to the rest of the operations.
- Crossover: the workhorse of internal parts. Crossover touches most genes each generation and clones up to the whole population to produce offspring (depending on selection-rate). It also calculates new genes hashes if enabled on the Genotype, which has a relatively high overhead on the main Evolve loop.
- Mutate: no considerations. It touches genes like crossover does, but should be used sparingly anyway; with low gene counts (<10%) and low probability (5-20%)
- Fitness: can be anything. This fully depends on the user domain. Parallelize
it using
with_par_fitness()
in the Builder. But beware that parallelization has it's own overhead and is not always faster.
GPU acceleration
There are two genotypes where Genes (N) and Population (M) are a stored in single contiguous memory range of Alleles (T) with length N*M on the heap. A pointer to this data can be taken to calculate the whole population at once. These are:
- DynamicMatrixGenotype
- StaticMatrixGenotype
Useful in the following strategies where a whole population is calculated:
- Evolve
- HillClimb-SteepestAscent
Possibly a GPU compatible memory layout still needs to be added. The current implementation just provides all the basic building blocks to implement this. Please open a github issue for further support.
Run tests with cargo test
Use .with_rng_seed_from_u64(0)
builder step to create deterministic tests results.
Implemented using criterion. Run benchmarks with cargo bench
Implemented using criterion and pprof.
Uncomment in Cargo.toml
[profile.release]
debug = 1
Run with cargo run --example profile_evolve_binary --release -- --bench --profile-time 5
Find the flamegraph in: ./target/criterion/profile_evolve_binary/profile/flamegraph.svg
- Target cardinality range for Mutate Dynamic to avoid constant switching (noisy in reporting events)
- Add scaling helper function
- Add simulated annealing strategy
- Add Roulette selection with and without duplicates (with fitness ordering)
- Add OrderOne crossover for UniqueGenotype?
- Order Crossover (OX): Simple and works well for many permutation problems.
- Partially Mapped Crossover (PMX): Preserves more of the parent's structure but is slightly more complex.
- Cycle Crossover (CX): Ensures all genes come from one parent, useful for strict preservation of order.
- Edge Crossover (EX): Preserves adjacency relationships, suitable for Traveling Salesman Problem or similar.
- Add WholeArithmetic crossover for RangeGenotype?
- Add CountTrueWithWork instead of CountTrueWithSleep for better benchmarks?
- Explore more non-Vec genes: PackedSimd?
- Maybe use TinyVec for Population? (it us usually less than 1000 anyway), maybe useful paired with MatrixGenotype, where the chromosomes are lightweight (and Copyable)
- StrategyBuilder, with_par_fitness_threshold, with_permutate_threshold?
- Add target fitness score to Permutate? Seems illogical, but would be symmetrical. Don't know yet
- Add negative selection-rate to encode in-place crossover? But do keep the old extend with best-parents with the pre v0.20 selection-rate behaviour which was crucial for evolve_nqueens
- hill_climb SteepestAscent actually has a population size requirement of neighbouring_population_size + 1, because of the working chromosome. This could overflow StaticMatrixGenotype<T, N, M>, use StaticMatrixGenotype<T, N, { M + 1 }> as workaround