6 Search
Search is the general process of finding an object that satisfies certain properties among a (potentially huge) set of distinct objects. When learning to code, search is probably one of the first problems you will tackle: searching for an item in an array by examining one element at a time is the quintessential example of an algorithm in many introductory programming classes. Throughout the typical Computer Science curriculum, students learn to search efficiently by organizing items into various data structures, the ultimate embodiment of which is a database system.
However, searching is not limited to arrays or other explicitly defined collections of objects. We can frame many, if not all, CS problems as search problems. Sorting, for example, can be framed as searching for a permutation of the elements with zero inversions. In this case, the space of objects we are searching is the set of all permutations of a given array. We call this an “implicit” search space because it is not explicitly given in the problem definition.
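To make the framing concrete, here is a deliberately naive Python sketch that sorts by brute-force search over the implicit space of permutations. It is only an illustration of the idea (its running time is factorial), not something you would ever use in practice.

```python
from itertools import permutations

def count_inversions(seq):
    """Count pairs (i, j) with i < j but seq[i] > seq[j]."""
    return sum(1 for i in range(len(seq))
                 for j in range(i + 1, len(seq))
                 if seq[i] > seq[j])

def sort_by_search(items):
    """Sort by searching the space of all permutations for one
    with zero inversions -- O(n!), for illustration only."""
    for candidate in permutations(items):
        if count_inversions(candidate) == 0:
            return list(candidate)

print(sort_by_search([3, 1, 2]))  # [1, 2, 3]
```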
This example is, of course, contrived: why would we ever attempt to sort this way when we have extremely fast and simple sorting algorithms like Quicksort and Mergesort? But here’s the kicker: sorting algorithms like these are fast precisely because they split the space of all possible permutations into subspaces, letting you easily discard a large chunk at once. This is what we call a “structured” search. It’s like binary search, but in the space of permutations!
The more profound lesson here is that if you understand the structure of the search space, you can search much more efficiently. All our fast algorithms for specific problems exploit some search space structure. The question then becomes, can we also do this for the hardest problems, where it seems we can’t find any such structure that guarantees a fast solution? Fortunately, the answer is yes… well, to some degree.
There are two major approaches to searching in the realm of hard problems: heuristics and metaheuristics.
Heuristic search
A heuristic is a rule-of-thumb strategy that exploits some known properties of a search problem to improve search performance. One of the most well-known and widely used heuristic search algorithms is A*, which underlies the route-finding in GPS navigation apps.
A* is a graph search algorithm that finds the shortest path between an origin and one or many equivalent destinations. At its core, it is very similar to Dijkstra’s shortest path algorithm, but one key idea makes it dramatically faster: it exploits some information about where the destination nodes are located.
For example, in GPS routing, you don’t know, without a full search, the shortest path from your home to some destination. But if that destination is northeast of your position, assuming some reasonable city layout, it makes sense that roads going north or east are more likely to get you there faster than roads going west or south.
Of course, this isn’t always the case since there could be detours, highways, tunnels, etc. If you blindly go toward the destination, you could end up stuck in a dead end. However, this useful knowledge can be exploited intelligently to make search much more efficient if you’re clever about it, and this is what A* does.
Instead of blindly following the most direct route, A* keeps track of all routes, even those that extend in the opposite direction. However, contrary to traditional graph search algorithms, it doesn’t explore all routes with the same effort. Instead, when choosing the next node to explore, A* prioritizes the nodes that the heuristic favors—e.g., those that lie in a direction more closely aligned with the destination—but it takes care not to get stuck, and it can backtrack when necessary.
In the end, A* performs no worse than traditional graph search algorithms like Dijkstra’s, and often does much better, provided your heuristic is well-designed (in particular, provided it never overestimates the true remaining cost).
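To make the mechanics concrete, here is a minimal A* sketch in Python. The grid world, unit edge costs, and straight-line-distance heuristic are illustrative assumptions, not part of any real routing system; the point is only to show the frontier being ordered by cost-so-far plus the heuristic estimate.

```python
import heapq
import math

def a_star(neighbors, start, goal, heuristic):
    """Minimal A*: `neighbors(node)` yields (next_node, edge_cost) pairs,
    `heuristic(node)` estimates the remaining cost to `goal`."""
    # Frontier ordered by f = g (cost so far) + h (heuristic estimate).
    frontier = [(heuristic(start), 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        for nxt, cost in neighbors(node):
            new_g = g + cost
            if new_g < best_g.get(nxt, math.inf):
                best_g[nxt] = new_g
                heapq.heappush(frontier,
                               (new_g + heuristic(nxt), new_g, nxt, path + [nxt]))
    return math.inf, []

# Toy example: nodes are (x, y) points on a 5x5 grid, every move costs 1,
# and the heuristic is the straight-line distance to the goal (admissible here).
def grid_neighbors(node):
    x, y = node
    for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
        nx, ny = x + dx, y + dy
        if 0 <= nx < 5 and 0 <= ny < 5:
            yield (nx, ny), 1

goal = (4, 3)
dist, path = a_star(grid_neighbors, (0, 0), goal,
                    lambda n: math.hypot(n[0] - goal[0], n[1] - goal[1]))
print(dist, path)
```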
Metaheuristics
Where heuristics are problem-specific strategies to leverage domain insights and accelerate the search in a concrete problem, metaheuristics are general-purpose search strategies that leverage knowledge about the search paradigm itself.
Metaheuristics are domain-independent search algorithms that can be applied when no problem-specific method is available. Solving the Traveling Salesman Problem (TSP) is one of the most common examples, but their uses extend to anything from routing to logistics and planning.
A prime example of a metaheuristic approach to problem-solving is the field of evolutionary algorithms. These are computational strategies inspired by certain aspects of biological evolution: they attempt to extract the key mechanisms that make evolution work and apply them in the context of, say, finding the optimal way to design a GPU.
The basic idea of all evolutionary algorithms is to maintain a “population” of solutions (e.g., different circuit designs for a GPU) that undergo cycles of “breeding” and “selection”, much like biological evolution. An example of a breeding scheme is taking two of these potential GPU designs and building a new design that incorporates some elements from each of the “parents”. This creates a new solution that is, in some sense, an evolved version of previous solutions, which may make it better or worse.
Then, we apply some selection criteria, most often just evaluating the solutions on whatever problem we’re solving—e.g., which GPU design runs faster on certain benchmarks—and keep the best 50%. This cycle of breeding and selection is repeated, producing a sequence of increasingly specialized GPU designs that, much like evolution, seems to magically discover quasi-optimal design elements just by sheer luck and relentless repetition.
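As a sketch of this breed-and-select loop, the Python below evolves bit strings toward a toy fitness target. The bit-string encoding, one-point crossover, 1% mutation rate, and the count-the-ones fitness function are all illustrative assumptions standing in for a real problem such as the GPU-design example.

```python
import random

random.seed(0)

GENES = 20         # length of each candidate solution (a bit string)
POP_SIZE = 30
GENERATIONS = 40

def fitness(individual):
    """Toy fitness: number of 1-bits. In a real problem this would be,
    say, a benchmark score for a candidate GPU design."""
    return sum(individual)

def breed(parent_a, parent_b):
    """One-point crossover plus a small chance of mutation per gene."""
    cut = random.randrange(1, GENES)
    child = parent_a[:cut] + parent_b[cut:]
    return [g ^ 1 if random.random() < 0.01 else g for g in child]

# Start from a random population.
population = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP_SIZE)]

for _ in range(GENERATIONS):
    # Breeding: create offspring from random pairs of current solutions.
    offspring = [breed(*random.sample(population, 2)) for _ in range(POP_SIZE)]
    # Selection: keep the best 50% of parents and children combined.
    population = sorted(population + offspring, key=fitness, reverse=True)[:POP_SIZE]

print(max(fitness(ind) for ind in population))  # approaches GENES after a few dozen generations
```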
Much more can be said about evolutionary algorithms and metaheuristics in general. Still, the overall idea is as simple as that: find effective search strategies that seem to work in almost all domains—finding inspiration in nature, engineering, social networks, etc.—and build computational strategies that leverage these insights.