Metalearning (neuroscience)

Metalearning is a neuroscientific term proposed by Kenji Doya as a theory of how neurotransmitters facilitate distributed learning mechanisms in the basal ganglia. The theory centres on the role of neuromodulators in dynamically adjusting how computational learning algorithms interact, producing the kind of robust learning behaviour currently unique to biological life forms. The term 'metalearning' has previously been used in social psychology and computer science, but in this context it denotes a distinct concept.

The theory of metalearning builds on Doya's earlier work assigning supervised learning, reinforcement learning and unsupervised learning to the cerebellum, the basal ganglia and the cerebral cortex respectively. The theory emerged from efforts to unify the dynamic selection among these three learning algorithms into a regulatory mechanism reducible to the action of individual neurotransmitters.

Dopamine
Dopamine is proposed to act as a "global learning" signal, critical to the prediction of rewards and the reinforcement of actions. In this scheme, dopamine participates in a learning algorithm in which an Actor, a Critic and the Environment are bound in a dynamic interplay that seeks to maximise the sum of future rewards by producing an optimal action-selection policy. The Critic and Actor are characterised as distinct networks that nonetheless form a single complex agent. This agent's actions alter the state of the Environment, which is fed back to the agent for future computations. Through a separate pathway, the Environment also feeds back to the Critic the reward obtained by the chosen action, so that an equilibrium can be reached between the predicted reward of a given policy in a given state and the evolving prospect of future rewards.

Serotonin
Serotonin is proposed to control the balance between short- and long-term reward prediction, essentially by variably "discounting" expected future reward sums that may require too much expenditure to achieve. In this way, serotonin may set the expectation of reward at a quasi-emotional level, and thus either encourage or discourage persistence in reward-seeking behaviour depending on the demands of the task and the duration of persistence required. Because global reward prediction would theoretically result from serotonin-modulated computations reaching a steady state with the computations similarly modulated by dopamine, high serotonergic signalling may override the dopamine-driven computations and produce a pattern of reward valuation not achievable through the dopamine-modulated computations alone.
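Doya's proposal identifies this serotonergic "discounting" with the discount factor γ of reinforcement learning. The sketch below illustrates that identification with hypothetical reward sequences: a low γ (impatient, low-serotonin regime) favours a small immediate reward, while a high γ (patient regime) favours a larger delayed one.

```python
# Sketch of serotonin as the discount factor gamma (following Doya's
# proposed mapping). Reward sequences and values are hypothetical.
def discounted_value(rewards, gamma):
    """Present value of a future reward sequence, one reward per step."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))

small_now = [2.0, 0.0, 0.0, 0.0]    # small reward immediately
large_later = [0.0, 0.0, 0.0, 5.0]  # larger reward after a delay

for gamma in (0.3, 0.95):
    print(gamma,
          discounted_value(small_now, gamma),
          discounted_value(large_later, gamma))
```

With γ = 0.3 the immediate option dominates; with γ = 0.95 the delayed option does, which is the behavioural switch between impulsive and persistent reward-seeking described above.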

Norepinephrine
Norepinephrine is proposed to govern the breadth of exploration through stochastic action selection. The choice between exploiting known, effective strategies and sampling new, experimental ones is known in probability theory and reinforcement learning as the exploration–exploitation trade-off. An interplay between situational urgency and the effectiveness of known strategies thus shapes the dilemma between reliably selecting the action with the largest predicted reward and exploring outside known parameters. Since neuronal firing cascades (such as those required to swing a golf club consistently) are inherently noisy and prone to variation, high levels of norepinephrine are proposed to select the most reliable known execution pattern, while low levels allow more random and variable selection, with the potential of discovering more efficient strategies in the process.
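In Doya's mapping this norepinephrine level corresponds to the inverse temperature β of a softmax (Boltzmann) action-selection rule. The sketch below, with hypothetical predicted-reward values, shows the two regimes: low β spreads probability almost uniformly (wide exploration), while high β concentrates it on the best-known action (reliable exploitation).

```python
import math

# Sketch of norepinephrine as the inverse temperature beta in softmax
# action selection (following Doya's proposed mapping). The predicted
# reward values q_values are hypothetical.
def softmax_policy(q_values, beta):
    """Return action probabilities; high beta -> greedy, low beta -> random."""
    exps = [math.exp(beta * q) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]

q = [1.0, 0.5, 0.1]  # predicted rewards for three hypothetical actions

print(softmax_policy(q, 0.1))   # low beta: near-uniform, exploratory
print(softmax_policy(q, 10.0))  # high beta: near-deterministic, exploitative
```

Only the single parameter β changes between the two calls; the underlying reward predictions are identical, which is what makes it a plausible global "knob" for a neuromodulator to turn.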

Acetylcholine
Acetylcholine is proposed to control the balance between memory storage and memory renewal, finding an optimal trade-off between the stability and the responsiveness of learning for the specific environmental task. Acetylcholine thus modulates plasticity in the hippocampus, cerebral cortex and striatum to establish suitable learning conditions in the brain. High levels of acetylcholine would allow very rapid learning and remodelling of synaptic connections, with the consequence that existing learning may become undone. Likewise, learning that unfolds over an extended timescale may be overridden before it reaches a functional level, so that learning occurs too quickly to be consolidated efficiently. At lower levels of acetylcholine, plastic changes are proposed to occur much more slowly, potentially protecting against unhelpful learning conditions or allowing changes to integrate information over a much broader timescale.
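In Doya's mapping, this cholinergic balance corresponds to the learning rate α. The sketch below tracks a hypothetical noisy environmental signal that changes midway: a high α remodels its estimate almost instantly but erases the old value just as fast, while a low α preserves prior learning and adapts only gradually, mirroring the stability–plasticity trade-off described above.

```python
# Sketch of acetylcholine as the learning rate alpha (following Doya's
# proposed mapping). The "signal" is a hypothetical environmental value
# that changes halfway through; track() keeps a running estimate of it.
def track(signal, alpha):
    estimate, history = 0.0, []
    for x in signal:
        estimate += alpha * (x - estimate)  # plasticity-like update
        history.append(estimate)
    return history

signal = [1.0] * 20 + [0.0] * 20  # the environment changes at step 20

fast = track(signal, 0.9)   # rapid remodelling; old learning quickly undone
slow = track(signal, 0.05)  # gradual change; protective of prior learning
```

After the change at step 20, the fast learner's estimate collapses to the new value within a few steps, whereas the slow learner still retains much of the earlier value many steps later.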

Metalearning
Central to the idea of metalearning is that global learning can be modelled as a function of the coordinated regulation of these four neuromodulators. While no mechanistic model has been put forward for where metalearning ultimately sits in the hierarchy of agency, the model has thus far demonstrated the dynamics necessary to infer the existence of such a regulatory process in biological learning as a whole. While computational models and information systems remain far from the complexity of human learning, metalearning provides a promising path forwards for the future evolution of such systems as they increasingly approach the complexity of the biological world.
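The four roles described in the preceding sections can be summarised as the tunable metaparameters of a single reinforcement-learning update, following Doya's proposed mapping; the standard TD update shown here is illustrative, and the states, actions and numbers are hypothetical.

```python
import math

# One TD-learning step with all four metaparameters exposed, following
# Doya's proposed mapping: delta ~ dopamine (teaching signal),
# gamma ~ serotonin (discounting), alpha ~ acetylcholine (learning rate),
# beta ~ norepinephrine (exploration breadth). Values are hypothetical.
def td_step(V, Q, s, a, r, s_next, alpha, gamma):
    delta = r + gamma * V[s_next] - V[s]  # dopamine-like TD error
    V[s] += alpha * delta                 # acetylcholine-like plasticity
    Q[(s, a)] += alpha * delta
    return delta

def policy(Q, s, actions, beta):
    """Softmax policy; beta (norepinephrine-like) sharpens or flattens it."""
    exps = {a: math.exp(beta * Q[(s, a)]) for a in actions}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}
```

On this view, "metalearning" is whatever process sets α, β and γ (and thereby shapes δ) appropriately for the task at hand, rather than any one learning rule itself.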

Potential Applications
The investigation of metalearning as a neuroscientific concept has potential benefits for both the understanding and the treatment of psychiatric disease, as well as for bridging the gaps between neural networks, computer science and machine learning.