# MAP-Elites
The goal of a search algorithm typically is to return the single highest-performing solution in a search space.
MAP-Elites (Multi-dimensional Archive of Phenotypic Elites) instead aims to provide a set of diverse, high-performing, qualitatively different solutions spread throughout the search space, across user-defined axes of "features". Because MAP-Elites explores more of the search space, it tends to find a better overall solution than state-of-the-art search algorithms.
MAP-Elites is an "illumination algorithm", a superset of optimization algorithms: illumination algorithms illuminate the fitness potential of each region of the feature space rather than just returning optimal solutions or solutions on the Pareto front. This contrasts with other search paradigms such as [[Stochastic Gradient Descent|gradient descent]], [[Gaussian Processes|Bayesian optimization]], and multi-objective optimization algorithms.
> MAP-Elites and other illumination algorithms, in contrast, map the entire feature space to inform users about what is possible and the various tradeoffs between features and performance. Such phenotype-fitness maps, as they are known in biology, are interesting in their own right, and can also be put to practical use. For example, a recent paper showed that the map can provide a repertoire of different, high-performing solutions that can initiate search for new behaviors in case an original behavior no longer works (e.g. if a robot becomes damaged or finds itself in a new environment).
## Good solutions require overcoming local minima
A notorious challenge in black-box optimization is the presence of local optima (also called local minima). Most search algorithms of this class try to follow a path toward the best global solution by relying on the heuristic that random changes to good solutions lead to better solutions.
MAP-Elites is motivated by the view that this traditional "greedy" approach fails on highly deceptive problems (i.e. search spaces with pathologies in their landscape, see [[Challenges of optimizing deep models]]), because such problems require crossing low-performing valleys to reach the global optimum, or even just better local optima.
Many approaches recognize this issue and introduce diversity-promoting techniques. Such diversity-promoting techniques often improve the quality of the solutions produced and the number of different types of solutions explored, but search algorithms still tend to converge to one or a few good solutions early and cease to make further progress.
## Optimize for novelty (diversity) over performance
Another motivation for pursuing a diverse set of high-performing solutions is the paradoxical observation that optimizing directly for performance, especially on deceptive problems, leads to suboptimal results. An alternative that often works better is to optimize for the diversity of the set of good solutions.
> An alternate idea proposed in recent years is to abandon the goal of improving performance altogether, and instead select only for diversity in the feature space (also called the behavior space): This algorithm, called Novelty Search, can perform better than performance-driven search on deceptive problems.
## Fully exploring the search space is critical
Exploring and studying the search space itself is often important.
> While the computer science literature offers plenty of options for dimensionality reduction and visualization of high-dimensional data, such algorithms are “passive” in that they take a fixed data set and search for the best low-dimensional visualization of it. They do not tackle the issue of generating this data set. In other words, they do not explore a high-dimensional space in such a way as to reveal interesting properties about it to a user via a low-dimensional visualization.
Sampling is not enough.
> To identify all the performance peaks in a large search space, we must actively search for them. It is not enough to sample millions of solutions and plot them, for the same reason as random sampling is often not a good optimization algorithm: finding a fitness peak by chance is very unlikely for large search spaces.
## The Algorithm
![[map-elites.png]]
Prerequisites to define a search procedure:
- Choose a performance measure f(x) that evaluates a solution x.
- Then choose the N dimensions of variation that define a feature space of interest to the user. For example:
- > For robot morphologies, one dimension of interest could be how tall the robot is, another could be its weight, a third could be its energy consumption per meter moved, etc.
- > For searching for chess programs, where the performance measure is the win percentage, and the dimensions of variation could be the aggressiveness of play, the speed with which moves are selected, etc.
- Discretization of the feature space is usually dictated by available resources.
- > Each dimension of variation is discretized based on user preference or available computational resources. This granularity could be manually specified or automatically tuned to the available resources, including starting with a coarse discretization and then increasing the granularity as time and computation allow.
- A feature (or behavior) function b(x) must also exist or be defined that, for each x, determines x’s value in each of the N feature dimensions. This can be a very simple or a very complex mapping.
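Putting the prerequisites together, the mapping from a continuous feature vector b(x) to a discrete map cell can be sketched as follows. This is a minimal illustration with a uniform per-axis discretization; the function name and argument names are mine, not from the paper:

```python
def feature_to_cell(b_x, lows, highs, bins):
    """Map a feature vector b(x) to a discrete cell index (a tuple of bin indices).

    b_x, lows, highs: sequences of length N (feature values and per-axis ranges);
    bins: number of cells along each axis. Names here are illustrative.
    """
    cell = []
    for value, lo, hi in zip(b_x, lows, highs):
        # Normalize into [0, 1], scale to a bin index, clamp boundary values.
        frac = (value - lo) / (hi - lo)
        cell.append(min(max(int(frac * bins), 0), bins - 1))
    return tuple(cell)

# Example: a 2-D feature space (e.g. height and weight) with 16 bins per axis.
cell = feature_to_cell([0.5, 2.0], lows=[0.0, 0.0], highs=[1.0, 10.0], bins=16)  # -> (8, 3)
```

The clamp at the end ensures feature values that land exactly on the upper boundary fall into the last cell instead of overflowing the map.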
The algorithm:
- Randomly generate G solutions (genomes) and determine the performance and features of each.
- Place each solution in its corresponding cell; if multiple solutions map to the same cell, retain only the highest-performing one.
- Repeat until termination criteria:
- A cell in the map is randomly chosen and the genome in that cell produces an offspring via mutation and/or crossover.
- The features and performance of that offspring are determined, and the offspring is placed in the cell if the cell is empty or if the offspring is higher-performing than the current occupant of the cell, in which case that occupant is discarded.
The termination criterion is quite flexible:
> The termination criterion can be many things, such as if a set amount of time expires, a fixed amount of computational resources are consumed, or some property of the archive is produced. Examples of the latter could include a certain percentage of the map cells being filled, average fitness in the map reaching a specific level, or n solutions to a problem being discovered.
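The loop above is simple enough to sketch in full. In this minimal Python sketch, `f`, `b`, `random_solution`, and `mutate` are placeholders for the user-supplied pieces (performance measure, feature-to-cell mapping, and variation operators); the names are mine, not from the paper:

```python
import random

def map_elites(f, b, random_solution, mutate, n_init, n_iters):
    """Minimal MAP-Elites sketch. `f` is the performance measure and `b` maps
    a solution to a hashable cell index. Returns the archive of elites."""
    archive = {}  # cell index -> (solution, performance)

    def try_insert(x):
        cell, perf = b(x), f(x)
        # Keep x only if its cell is empty or x beats the current occupant.
        if cell not in archive or perf > archive[cell][1]:
            archive[cell] = (x, perf)

    # 1. Seed the map with randomly generated solutions.
    for _ in range(n_init):
        try_insert(random_solution())

    # 2. Pick a random elite, produce an offspring, attempt re-insertion.
    for _ in range(n_iters):
        parent, _ = random.choice(list(archive.values()))
        try_insert(mutate(parent))

    return archive

# Toy usage: 1-D solutions in [0, 1], 10 feature cells based on x itself.
random.seed(0)
archive = map_elites(
    f=lambda x: -(x - 0.5) ** 2,           # performance peaks at x = 0.5
    b=lambda x: min(int(x * 10), 9),       # discretize x into 10 cells
    random_solution=random.random,
    mutate=lambda x: min(max(x + random.gauss(0, 0.1), 0.0), 1.0),
    n_init=50, n_iters=500,
)
```

Note how exploitation (keeping only the best occupant per cell) and exploration (parents are drawn uniformly from all cells, not just high-performing ones) are both built into the same loop.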
## Extensions and further ideas
- **Hierarchical Feature Space**: Start with a coarse resolution feature space, then subdivide into smaller cells during search after predetermined numbers of evaluations.
- **Parallelization**: MAP-Elites is highly parallelizable at each step: mutations and evaluations can be performed in parallel across different compute nodes.
- Storing more than one genome per feature cell to promote diversity.
- Biasing the choice of which cells produce offspring, e.g. towards cells that have empty adjacent cells, cells near low-performing areas, cells near high-performing areas, etc.
- > In preliminary experiments, such biases did not perform better than the default MAP-Elites.
- **Crossover**:
- > Crossover may be especially effective when restricted to occurring between organisms nearby in the feature space.
- > One could make crossover only occur within a certain radius of an individual or as a probabilistic function of the distance between organisms (leading to overlapping crossover zones), or only within certain regions (more akin to an island model). Note that even with geographically restricted crossover, offspring could still end up in different areas of the feature space than their parents (either due to the effects of mutation or crossover).
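The radius-restricted variant could be sketched as follows, assuming the archive is keyed by cell-index tuples. The function name and the choice of Euclidean distance over cell indices are my own illustration, not from the paper:

```python
import math
import random

def pick_crossover_partner(archive, parent_cell, radius):
    """Pick a mate restricted to elites within `radius` of the parent's cell
    (Euclidean distance over cell indices), sketching geographically
    restricted crossover. Returns None when no elite is close enough."""
    nearby = [cell for cell in archive
              if cell != parent_cell and math.dist(cell, parent_cell) <= radius]
    if not nearby:
        return None  # fall back to mutation-only reproduction
    return archive[random.choice(nearby)]

# With a small radius, only the adjacent elite qualifies as a mate.
archive = {(0, 0): "a", (0, 1): "b", (5, 5): "c"}
mate = pick_crossover_partner(archive, (0, 0), radius=2.0)
```

The probabilistic variant from the quote would replace the hard radius cutoff with a mate-selection probability that decays with distance.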
## Other tips and notes
- If evaluations are fast, high-resolution maps are feasible even for extremely high-dimensional search spaces.
- > The neural network search space is interesting because evaluations are fast, allowing us to draw high-resolution feature-space maps for a high-dimensional search space.
- Mutating genomes nearby in feature space is an effective way to find elites
- > We next investigated the assumption that elites are found by mutating genomes nearby in the feature space, and found that this assumption is largely true.
- Hyperparameters used in the "retina experiments" in the paper:
- starting size of the map: $16 \times 16$
- final size of the map: $512 \times 512$
- batch size: 2,000
- number of iterations: 10,000
- initial batch: 20,000
- resolution change program (4 changes):
- iteration 0: $64 \times 64$
- iteration 1250: $128 \times 128$
- iteration 2500: $256 \times 256$
- iteration 5000: $512 \times 512$
- feature 1: connection cost
- feature 2: network modularity
- performance: percent answers correct on retina problem
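The coarse-to-fine resolution schedule above can be captured as a small lookup. This assumes (as the paper's discretization quote suggests) that the per-axis resolution only ever increases during a run, and that existing elites get re-binned into the finer grid at each change:

```python
# Coarse-to-fine schedule from the retina experiments above
# (iteration at which each per-axis resolution takes effect).
RESOLUTION_SCHEDULE = {0: 64, 1250: 128, 2500: 256, 5000: 512}

def map_resolution(iteration):
    """Per-axis map resolution in effect at a given iteration."""
    # Valid because the resolution only ever increases during a run.
    return max(res for start, res in RESOLUTION_SCHEDULE.items()
               if iteration >= start)
```

For example, `map_resolution(3000)` falls in the 256-per-axis phase of the run.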
### Features/behaviors vs performance metrics
It can be confusing how to think about features, which are supposed to reflect some "behavior" of the solution, versus the performance metrics. Here are some of my realizations:
- Do features need to be "orthogonal"?
- ideally they should capture _meaningfully different_ aspects of the solution space. The goal is to characterize _how_ a solution behaves, not _how well_ it performs.
- in practice, you want features that are as independent as possible, but perfect orthogonality isn't required.
- Can performance objective component be used as a feature dimension?
- Generally no, but it could be OK if it's one feature among other more neutral features.
- The whole point of MAP-Elites is to separate _what_ a solution is (the behavior descriptors / features) from _how good_ it is (the fitness/performance).
- If we use performance measures as features, the map would naturally have its best solutions clustered at one end of that axis, defeating the purpose of uniform exploration.
- You lose the diversity benefit. MAP-Elites is powerful precisely because it forces you to find the _best solution for each niche_, even niches you wouldn't normally explore. If a dimension is just "be worse," there's no interesting diversity being preserved.
- What are some good features?
- They should describe _structural or behavioral properties_ of the solution. These are things where there's no inherently "better" direction, such as:
- Size, weight, complexity, cost
- Behavioral characteristics (e.g., how much a robot leans, how symmetric a design is)
- Design choices (e.g., number of components, material ratios)
- Why does [[AlphaEvolve]] use a performance metric as one of the features, alongside "code diversity" and "code complexity"?
- In classical [[Quality Diversity]] (robotics, etc.), the goal is _the archive itself_ i.e. you want a diverse repertoire of good solutions. Low-performing cells are wasteful.
- In AlphaEvolve, the goal is to find _one single best solution_. The MAP-Elites archive is just a **diversity maintenance mechanism** to prevent the LLM from converging too early.
- Low-performing cells aren't the "answer"; they serve as **stepping stones** and **diverse inspirations** for the LLM to draw from when generating new candidates. "Bad" cells do have value in this way.
- Another idea (my own) is something I like to call "cell-local fitness", which would allow us to use objective components as features.
## Connection to Natural Evolution
MAP-Elites is a step towards [[Open-Endedness]].
> illumination algorithms like MAP-Elites may help evolutionary algorithms move closer to the open-ended evolution seen in the natural world, which produced a tremendous diversity of organisms (within one run).
However, it is limited by the pre-defined static feature space.
> One drawback to MAP-Elites, however, is that it does not allow the addition of new types of cells over time that did not exist in the original feature space. It thus, by definition, cannot exhibit open-ended evolution.