User:Dpleibovitz/sandbox/Exploration–exploitation trade-off

The exploration–exploitation trade-off (with trade-off often substituted with balance, dilemma, framework, interplay, paradox, problem, strategy, or tension) is a conceptual classification of two competitive or collaborative activities or processes involved with learning. Exploration is the gathering of raw information leading to the low-level learning of factual knowledge. Exploitation is often thought of as merely using the raw information. However, one use is to synthesize even more useful knowledge leading to the high-level learning of 'theoretical' knowledge. Indeed, this already highlights a subsequent trade-off between learning data and learning theory. Moreover, while 'theoretical' can be a simple associative hypothesis such as this kind of food is often found near lakes, such theories can simplify subsequent exploration. However, this may imply that exploration cannot be made independent from exploitation - they are complementary aspects. Traditionally, exploratory and exploitative activities compete for resources (e.g., time and effort), so a trade-off between the two must be made in order to optimize learning.

This article reviews the associated concepts in terms of their heuristic and informal use, to their distinct and formal mathematical or computational use. There is no single accepted definition or theory behind these terms. Nevertheless, it is possible to view all aspects of the world through the instrumental lens of learning via exploring and exploiting, along with their competitive or collaborative aspects.

Old
For example, to find the best food in the world might require a lifetime of exploration, but if one does not exploit intermediate results by eating, then one will starve. Similarly, if one only eats a single source of food, it can disappear (or not have a balance of nutrients) and result in starvation. A balance must be made between eating (exploitation) and finding (exploring) food. Indeed, this balance can be dynamic - the hungrier one gets, the less the exploration. A professional chef might spend an inordinate amount of time exploring the fresh and exotic.

Imperfect categories
These terms are not perfect oppositions. For example, when a web search engine indexes the web, this upstream activity can be thought of as exploring. Individuals exploit the results for immediate downstream search results. However, during the exploratory indexing phase, information can also be synthesized. A PageRank can be calculated (learned). This synthesis of information can be considered as an upstream exploitation that occurs during exploration, and is further exploited during downstream search.

Traditionally, learning is considered to occur during exploration, e.g., where the best food is. However, the exploitative synthesis of information is the highest form of learning. Perhaps a desired food is associated with lakes, and one can now look for lakes first - an easier task than looking for a particular food. Thus exploitation can direct exploration. As a further example, experimental physics can be thought of as empirically exploring the world. However, theoretical physics produces the most analytically profound understanding to the nature universe, and this exploitative learning can be done entirely within the armchair. Nevertheless, when looked across the many theoretical physicists who attempted synthesis, each can be thought of as exploring a different set of hypothesis or intuitions. Thus exploitation consists of exploration.

Exploration
Exploration is the initial information seeking activity or process involved in learning, i.e., in finding where things are, or what things are. However, it is often directed by a previous exploitation when these two coupled processes continuously interplay, so it is a chicken or the egg example of which actually came first.

Exploratory activities include searching through a space of possibilities. However, many search algorithms simultaneously find (or learn), and these may not be considered as separating exploration from exploitation. Foraging is an example of information seeking behavior that can learn where good resources exists, to be accessed or exploited later when required. However, changes in season could require the same space to be foraged all over again. This is one example of the trade-off - when does exploitation become sub-optimal requiring re-exploration.

In many cases, learning is continuous and incidental to main activities. For example, while migrating, resources found along the way could be remembered (without actively being searched for), to be exploited on a subsequent migration.

Play fighting can be thought of as exploring, rehearsing and learning the skills required and exploited in actual fight.

When a web search engine indexes and possibly ranks the the web, that can be considered as exploratory with the finding of specific pages as the exploitative payoff. ### MAKE An Explore/Exploit TABLE OF EXAMPLES ###

Exploitation

 * Analysis
 * Synthesis

Informal learning

 * Scientific progress
 * Evolution

Formal learning
In machine learning
 * organizational theory

Trade-off

 * Funding between basic/theoretical and applied/experimental research

ToDo
Add redirects
 * exploration–exploitation tradeoff
 * exploration-exploitation trade-off
 * exploration-exploitation tradeoff
 * exploration–exploitation
 * exploration-exploitation
 * exploratory learning
 * exploitative learning