Weightless neural network

Weightless neural networks (WNNs) are a form of content-addressable memory loosely inspired by the excitatory/inhibitory decoding performed by the dendritic trees of biological neurons. In contrast to traditional artificial neural network models, weightless neurons do not store sets of adaptive weights. Instead, they record sample input/output pairs: when a test input is presented to a neuron, it searches its list of stored inputs for a matching entry and returns the output associated with the matched (stored) input. WNN architectures differ in their possible output types, in how inputs are matched, in whether a test input can go unmatched and in what the response is in such a case, but they generally agree in restricting inputs to bit strings of a fixed length.

Although there is a large degree of overlap between weightless neural networks and Boolean neural networks (BNNs), the two approaches are distinct as a whole. BNNs are neural networks that handle only Boolean values at their inputs and outputs, but this definition does not exclude architectures where neurons store sets of adaptive weights: for example, Hopfield networks are BNNs but not WNNs. Likewise, some WNNs can output non-Boolean values.

Weightless neural networks have been successfully applied to a variety of supervised and unsupervised learning problems. Examples include face recognition, data clustering, stock return prediction and multi-label text categorization. Since training is performed simply by writing sample cases to neurons, a weightless neural network can usually be completely trained in a single unordered pass through the data. Input matching can be implemented efficiently by defining mismatch as the Hamming weight of the XOR of the compared bit strings. WNNs essentially implement nearest-neighbor search over the space of inputs, so neurons can model complex functions without the need for multi-layer arrangements.
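
The XOR-based matching just described can be sketched in a few lines. This is an illustrative example rather than code from any particular WNN implementation; inputs are Python integers interpreted as fixed-length bit strings, and all names are assumptions of the sketch.

```python
# Mismatch between two bit strings is the Hamming weight (number of set
# bits) of their bitwise XOR; nearest-neighbor search over the stored
# inputs then reduces to minimizing that count.

def hamming_distance(a, b):
    """Number of bit positions where a and b differ."""
    return bin(a ^ b).count("1")

def nearest_stored(test, stored):
    """Return the output associated with the closest stored input.

    'stored' maps recorded input patterns to their associated outputs.
    """
    best = min(stored, key=lambda s: hamming_distance(test, s))
    return stored[best]
```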

Weightless neural networks were first proposed by Igor Aleksander in the 1960s. Aleksander was interested in the application of N-tuple sampling machines to learning problems, and developed the RAM node as a universal logic circuit for hardware-based machine learning. Continuous improvements in integrated circuit technology led to these custom circuits being dropped in favor of standard RAM memories by the 1980s, and by the early 1990s WNN architectures were typically realized in software running on desktop computers and other general-purpose hardware. Currently there is increased interest in parallel implementations, particularly on top of GPU architectures.

Architectures
Weightless neural network architectures differ across a number of features, including:

 * The training process;
 * How test inputs are matched to recorded entries;
 * The response when no match can be found for a test input;
 * The extent to which the "undefined" value u (which is neither 0 nor 1) can be used as a placeholder for bit values;
 * Which values can be returned as output;
 * The overall network layout.

Some of the WNN architectures proposed over the years are described below. A more complete review is available in the literature.

WISARD
The Wilkie, Aleksander and Stonham's Recognition Device (WISARD) was a general-purpose pattern recognition machine developed in the 1980s. It was based on the principle of RAM nodes invented by Aleksander, but actually implemented using conventional RAM banks. A RAM node is a look-up table composed of 2^N entries and a binary address bus of length N. Each entry holds a single binary value (that is, either 0 or 1). RAM nodes are trained by first setting all entries to 0, and then setting selected entries to 1. When a test input is presented to a RAM node, it returns 1 if this value was written to the corresponding entry, or 0 otherwise.
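
A RAM node as described above can be sketched as follows. The class and method names are illustrative, not taken from any WNN library.

```python
# A RAM node is a look-up table of 2**n one-bit entries, addressed by
# an n-bit input pattern presented on its address bus.

class RAMNode:
    def __init__(self, n):
        self.table = [0] * (2 ** n)   # training starts with all entries at 0

    def train(self, address):
        self.table[address] = 1       # mark this input pattern as recognized

    def respond(self, address):
        return self.table[address]    # 1 if the pattern was trained, 0 otherwise
```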

In the WISARD architecture, RAM nodes are grouped into units called discriminators, which sample a common input area through distinctive connection patterns – that is, no two discriminators read the exact same input region. During training, each discriminator assigns 1 to the patterns (that is, memory addresses) representing the class it is supposed to recognize, and 0 to the others. During testing, each discriminator samples the input area and returns the sum of the responses of its RAM nodes: the highest output indicates the discriminator (and therefore the class) that best matched the input. Connection patterns are chosen randomly at the time of network setup, and preserved from that point on as a network parameter.
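
The discriminator structure can be sketched as follows, assuming a one-dimensional binary input whose length is a multiple of the tuple size. All names are illustrative, and a minimal RAM node class is included so the example is self-contained.

```python
import random

class RAMNode:
    """Look-up table of 2**n one-bit entries."""
    def __init__(self, n):
        self.table = [0] * (2 ** n)
    def train(self, address):
        self.table[address] = 1
    def respond(self, address):
        return self.table[address]

class Discriminator:
    """Group of RAM nodes reading a common binary input through a fixed
    random connection pattern chosen once at network setup."""
    def __init__(self, input_size, n, seed=0):
        indices = list(range(input_size))
        random.Random(seed).shuffle(indices)   # random, then kept fixed
        self.tuples = [indices[i:i + n] for i in range(0, input_size, n)]
        self.nodes = [RAMNode(n) for _ in self.tuples]

    def _address(self, bits, tup):
        addr = 0
        for i in tup:                 # pack the sampled bits into an address
            addr = (addr << 1) | bits[i]
        return addr

    def train(self, bits):
        for node, tup in zip(self.nodes, self.tuples):
            node.train(self._address(bits, tup))

    def respond(self, bits):
        # Sum of RAM-node responses; across discriminators, the highest
        # sum indicates the best-matching class.
        return sum(node.respond(self._address(bits, tup))
                   for node, tup in zip(self.nodes, self.tuples))
```

A trained discriminator returns its maximum score (one per RAM node) when shown exactly the pattern it was trained on, and progressively lower sums as patterns diverge from the training set.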

Probabilistic Logic Node (PLN)
When a RAM node returns 1, this is an unambiguous indication that the input was recognized as a member of the trained class. However, when the response is 0, there is no way to tell apart a negative example (that is, a pattern that was marked as not belonging to the class) from a previously unseen pattern. The Probabilistic Logic Node (PLN) solves this problem by allowing each memory location to store one of three possible values: 0, 1 or u (undefined). PLNs are trained by first setting all entries to u, and then setting selected entries to either 1 or 0, depending on whether or not they belong to the trained class. When a test input addresses an entry with an undefined value, either 0 or 1 can be returned: the output of a random bit generator is used.
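
A single PLN can be sketched like this, with u represented by Python's None; the names are assumptions of the sketch, not from any PLN implementation.

```python
import random

class PLNode:
    def __init__(self, n, rng=None):
        self.table = [None] * (2 ** n)   # all entries start undefined (u)
        self.rng = rng or random.Random()

    def write(self, address, value):
        self.table[address] = value      # 0 for negative, 1 for positive examples

    def respond(self, address):
        stored = self.table[address]
        if stored is None:               # undefined entry: emit a random bit
            return self.rng.randint(0, 1)
        return stored
```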

In a PLN network, nodes are grouped in three-layered tree structures called pyramids, where each node has a limited number of inputs (fan-in is low) and is connected to at most a single output node (fan-out is 1). A network is composed of several pyramids whose input terminals sample a common input area: their number and internal connection patterns constitute the network's parameters.

Several algorithms have been devised for training PLN networks. One of the simplest is as follows:


 1. Set all memory entries in all nodes to u;
 2. Load a training pattern into the common input area;
 3. As nodes in the input layer fire in response to the presented input, activity propagates forward through the network until it reaches the output layer, producing the network's output;
 4. Compare the network output to the desired response for the given training input;
 5. For each node in the output layer:
    * If the desired value matches the node's output, any addressed u entries are overwritten with the value last returned;
    * Otherwise, try again; if a match cannot be achieved after β tries, any addressed defined entries are reverted to u;
 6. Repeat steps 2–5 for all training cases, until all patterns produce correct responses without any (re)writing of memory entries.
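
The training loop above can be sketched with a deliberately minimal setup: a single PLN stands in for a whole pyramid, so the forward propagation collapses into one table look-up. The function name, the default value of β and the use of None for u are all assumptions of this sketch, not part of the original algorithm description.

```python
import random

def train_pln(node_bits, cases, beta=3, seed=0):
    """Train a one-node 'network' on (address, desired_bit) cases."""
    rng = random.Random(seed)
    table = [None] * (2 ** node_bits)        # all entries start as u

    def respond(addr):                       # u entries return a random bit
        v = table[addr]
        return rng.randint(0, 1) if v is None else v

    stable = False
    while not stable:                        # repeat until a pass makes no (re)writes
        stable = True
        for addr, desired in cases:          # present each training pattern
            for _ in range(beta):
                if respond(addr) == desired:
                    if table[addr] is None:  # correct response: fix the u entry
                        table[addr] = desired
                        stable = False
                    break
            else:                            # no match after beta tries:
                if table[addr] is not None:  # revert the defined entry to u
                    table[addr] = None
                stable = False
    return table
```

Because only the desired value is ever written for a given address, the loop terminates (with probability 1) once every trained entry is defined and correct, while unaddressed entries remain u.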