Extension neural network

Extension neural network (ENN) is a pattern recognition method proposed by M. H. Wang and C. P. Hung in 2003 to classify instances of data sets. It combines artificial neural networks with concepts from extension theory: it uses the fast, adaptive learning capability of neural networks together with the correlation estimation property of extension theory, which it applies by calculating an extension distance.

ENN has been used for:
 * Failure detection in machinery.
 * Tissue classification from MRI.
 * Fault recognition in automotive engines.
 * State-of-charge estimation in lead-acid batteries.
 * Classification with incomplete survey data.

Extension Theory
Extension theory was first proposed by Cai in 1983 to solve contradictory problems. While classical mathematics deals with the quantity and form of objects, extension theory transforms these objects into matter-element models.

$$ R = (N, C, V) \qquad (1) $$

where, in the matter-element $$R$$, $$N$$ is the name or type, $$C$$ is a characteristic and $$V$$ is the value of that characteristic. A corresponding example is given in equation 2.

where the $$Height$$ and $$Weight$$ characteristics form extension sets. These extension sets are defined by the $$V$$ values, which are ranges of values for the corresponding characteristics. Extension theory is concerned with the extension correlation function between matter-element models, such as the one in equation 2, and extension sets. The extension correlation function is used to define the extension space, which is composed of pairs of elements and their extension correlation function values. The extension space is defined in equation 3.
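A matter-element whose characteristics carry value ranges maps naturally onto a small data structure. The following is a minimal sketch only: the class name `MatterElement`, the method `contains` and all numeric ranges are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class MatterElement:
    """A matter-element with a name N and, per characteristic C, a value range V."""
    name: str                                  # N: name or type
    ranges: dict[str, tuple[float, float]]     # C -> V, with V given as (low, high)

    def contains(self, characteristic: str, value: float) -> bool:
        """Return True if value lies inside the range of the given characteristic."""
        low, high = self.ranges[characteristic]
        return low <= value <= high


# Hypothetical ranges, not taken from the source.
person = MatterElement("person", {"Height": (150.0, 200.0), "Weight": (45.0, 120.0)})
print(person.contains("Height", 178.0))
print(person.contains("Weight", 130.0))
```

A plain range test like this only answers "inside or outside"; the extension correlation function described next replaces it with a graded membership degree.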

$$ A = \{(x, y) \mid x \in U,\ y = K(x)\} \qquad (3) $$

where $$A$$ is the extension space, $$U$$ is the object space, $$K$$ is the extension correlation function, $$x$$ is an element of the object space and $$y$$ is the corresponding output of the extension correlation function for $$x$$. $$K(x)$$ maps $$x$$ to a membership degree in the interval $$\left( -\infty,\infty \right)$$. The negative region represents the degree to which an element does not belong to a class, and the positive region the degree to which it does. If $$x$$ is mapped to $$\left[ 0,1 \right]$$, extension theory behaves like fuzzy set theory. The correlation function can be expressed with equation 4.

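$$ \rho(x, X_{in}) = \left| x - \frac{a+b}{2} \right| - \frac{b-a}{2}, \qquad \rho(x, X_{out}) = \left| x - \frac{c+d}{2} \right| - \frac{d-c}{2} \qquad (4) $$

where $$X_{in}$$ and $$X_{out}$$ are called the concerned domain and the neighborhood domain, with intervals $$(a,b)$$ and $$(c,d)$$ respectively. The extended correlation function used to estimate the membership degree between $$x$$ and $$X_{in}$$, $$X_{out}$$ is shown in equation 5.

$$ K(x) = \begin{cases} -\rho(x, X_{in}) & x \in X_{in} \\ \dfrac{\rho(x, X_{in})}{\rho(x, X_{out}) - \rho(x, X_{in})} & x \notin X_{in} \end{cases} \qquad (5) $$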
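The membership computation can be sketched in Python. This is a minimal sketch under stated assumptions: it takes the distance from a point to an interval $$(a,b)$$ to be $$|x-(a+b)/2|-(b-a)/2$$, and takes $$K(x)$$ to be the negated inner distance inside $$X_{in}$$ and the ratio of the inner distance to the difference of outer and inner distances elsewhere. The function names `rho` and `correlation` and the example intervals are illustrative.

```python
def rho(x: float, low: float, high: float) -> float:
    """Assumed distance between point x and the interval (low, high)."""
    return abs(x - (low + high) / 2) - (high - low) / 2


def correlation(x: float, inner: tuple[float, float], outer: tuple[float, float]) -> float:
    """Assumed extended correlation function K(x).

    Requires inner to lie strictly inside outer; otherwise the
    denominator below can become zero.
    """
    a, b = inner
    c, d = outer
    r_in = rho(x, a, b)
    if a <= x <= b:
        return -r_in                 # positive inside the concerned domain
    r_out = rho(x, c, d)
    return r_in / (r_out - r_in)     # negative outside the concerned domain


inner, outer = (2.0, 4.0), (0.0, 6.0)   # hypothetical domains
print(correlation(3.0, inner, outer))    # 1.0, the center of the concerned domain
print(correlation(5.0, inner, outer))    # -0.5, inside the neighborhood domain only
print(correlation(7.0, inner, outer))    # -1.5, outside both domains
```

Under these assumptions the value is positive inside $$X_{in}$$, falls to $$-1$$ at the edge of $$X_{out}$$, and keeps decreasing beyond it.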



Extension Neural Network
An extension neural network looks like a conventional neural network. A weight vector sits between the input nodes and the output nodes, and the output nodes represent the inputs after they have been passed through the weight vector.

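The total numbers of input and output nodes are denoted by $$n$$ and $$n_c$$, respectively. These numbers depend on the number of characteristics and the number of classes. Rather than using one weight value between two layer nodes as in a conventional neural network, the extension neural network architecture uses two weight values, a lower bound $$w_{kj}^{L}$$ and an upper bound $$w_{kj}^{U}$$. For instance $$i$$, $$x^p_{ij}$$ is the input, which belongs to class $$p$$, and $$o_{ik}$$ is the corresponding output for class $$k$$. The output $$o_{ik}$$ is calculated by using the extension distance as shown in equation 6, where $$z_{kj}$$ is the center of class $$k$$ for input $$j$$.

$$ ED_{ik} = \sum_{j=1}^{n} \left( \frac{\left| x_{ij}^{p} - z_{kj} \right| - \frac{w_{kj}^{U} - w_{kj}^{L}}{2}}{\left| \frac{w_{kj}^{U} - w_{kj}^{L}}{2} \right|} + 1 \right), \qquad k = 1, 2, \ldots, n_c \qquad (6) $$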

The estimated class is found by searching for the minimum extension distance $$ED_{ik}$$ among the extension distances calculated for all classes, as summarized in equation 7, where $$k^*$$ is the estimated class.

$$ k^* = \arg\min_{k} ED_{ik} \qquad (7) $$
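The classification step can be sketched as follows. This is a minimal sketch that assumes each class is stored as a triple of lower bounds, upper bounds and centers, and that the extension distance is the per-input sum of (|x − z| − half-width) / |half-width| + 1. The names `extension_distance` and `classify` and the two example classes are illustrative.

```python
def extension_distance(x, w_low, w_up, z):
    """Assumed extension distance between input vector x and one class.

    Every range must have non-zero width, otherwise the division fails.
    """
    total = 0.0
    for x_j, low, up, z_j in zip(x, w_low, w_up, z):
        half_width = (up - low) / 2
        total += (abs(x_j - z_j) - half_width) / abs(half_width) + 1
    return total


def classify(x, model):
    """Return the label whose extension distance to x is smallest."""
    return min(model, key=lambda label: extension_distance(x, *model[label]))


# Hypothetical two-class model: label -> (lower bounds, upper bounds, centers)
model = {
    "A": ([0.0, 0.0], [2.0, 2.0], [1.0, 1.0]),
    "B": ([4.0, 4.0], [6.0, 6.0], [5.0, 5.0]),
}
print(extension_distance([1.0, 1.0], *model["A"]))  # 0.0 at the class center
print(extension_distance([2.0, 2.0], *model["A"]))  # 2.0 on the range boundary
print(classify([4.5, 5.0], model))                  # B
```

With this form the distance is 0 at the class center and equals the number of inputs on the range boundary, so narrow classes penalize the same absolute offset more than wide ones.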

Learning Algorithm
Each class is composed of ranges of characteristics. These characteristics are the input types or names that come from the matter-element model. The weight values in the extension neural network represent these ranges. In the learning algorithm, the weights are first initialized by searching for the maximum and minimum values of the inputs for each class, as shown in equation 8

$$ w_{kj}^{U} = \max_{i} \{ x_{ij}^{k} \}, \qquad w_{kj}^{L} = \min_{i} \{ x_{ij}^{k} \} \qquad (8) $$

where $$i$$ is the instance number and $$j$$ is the input index. This initialization sets the range of each class according to the given training data.

After the weights are initialized, the cluster centers are found through equation 9.

$$ z_{kj} = \frac{w_{kj}^{U} + w_{kj}^{L}}{2} \qquad (9) $$
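The initialization can be sketched as follows. This is a minimal sketch that assumes the cluster center of each input is the midpoint of its lower and upper weight; the function name `init_weights` and the sample data are hypothetical.

```python
def init_weights(samples, labels):
    """Build label -> (lower bounds, upper bounds, centers) from training data."""
    grouped = {}
    for x, label in zip(samples, labels):
        grouped.setdefault(label, []).append(x)

    model = {}
    for label, rows in grouped.items():
        columns = list(zip(*rows))                      # one tuple per input j
        w_low = [min(col) for col in columns]           # per-class minimum
        w_up = [max(col) for col in columns]            # per-class maximum
        z = [(low + up) / 2 for low, up in zip(w_low, w_up)]  # assumed midpoint center
        model[label] = (w_low, w_up, z)
    return model


# Hypothetical training data with two inputs per instance.
samples = [[1.0, 10.0], [3.0, 14.0], [7.0, 2.0], [9.0, 6.0]]
labels = ["A", "A", "B", "B"]
print(init_weights(samples, labels)["A"])
```

Because the bounds come straight from the extremes of the training data, a single outlier widens the range of its class; the training step that follows is what corrects poorly placed ranges.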

Before the learning process begins, a target learning performance rate is specified, defined as shown in equation 10

$$ E_\tau = \frac{N_m}{N_p} \qquad (10) $$

where $$N_m$$ is the number of misclassified instances and $$N_p$$ is the total number of instances. The initialized parameters are used to classify the instances using equation 6. If the initialization does not meet the required learning performance rate, training is required. In the training step the weights are adjusted to classify the training data more accurately, with the aim of reducing the learning performance rate. In each iteration, $$E_\tau$$ is checked to determine whether the required learning performance has been reached, and every training instance is used for training.
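The performance check amounts to a misclassification ratio. A minimal sketch, assuming the rate is the number of misclassified instances divided by the total number of instances; the name `error_rate` and the label lists are illustrative.

```python
def error_rate(predicted, actual):
    """Fraction of instances whose predicted label differs from the true label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    misclassified = sum(1 for p, a in zip(predicted, actual) if p != a)
    return misclassified / len(actual)


# Hypothetical labels: one of four instances is misclassified.
print(error_rate(["A", "B", "B", "A"], ["A", "B", "A", "A"]))  # 0.25
```

Training would stop once this value drops to or below the chosen target.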

Instance $$i$$, which belongs to class $$p$$, is denoted by:

$$ X_{i}^p=\{x_{i1}^p,x_{i2}^p,...,x_{in}^p\} $$

$$ 1\leq p\leq n_c $$

Every input data point of $$X_i^p$$ is used in the extension distance calculation to estimate the class of $$X_i^p$$. If the estimated class $$k^*=p$$, no update is needed; if $$k^* \neq p$$, an update is performed. In the update, the separators, which express the relationship between inputs and classes, are shifted in proportion to the distance between the cluster centers and the data points.

The update formulas, with learning rate $$\eta$$, are:

$$ z_{pj}^{new} = z_{pj}^{old} + \eta (x_{ij}^p-z_{pj}^{old}) $$

$$ z_{k^*j}^{new} = z_{k^*j}^{old} - \eta (x_{ij}^p-z_{k^*j}^{old}) $$

$$ w_{pj}^{L(new)} = w_{pj}^{L(old)} + \eta (x_{ij}^p-z_{pj}^{old}) $$

$$ w_{pj}^{U(new)} = w_{pj}^{U(old)} + \eta (x_{ij}^p-z_{pj}^{old}) $$

$$ w_{k^*j}^{L(new)} = w_{k^*j}^{L(old)} - \eta (x_{ij}^p-z_{k^*j}^{old}) $$

$$ w_{k^*j}^{U(new)} = w_{k^*j}^{U(old)} - \eta (x_{ij}^p-z_{k^*j}^{old}) $$

To classify instance $$i$$ correctly, the separator of class $$p$$ for input $$j$$ moves closer to the data point of instance $$i$$, whereas the separator of class $$k^*$$ for input $$j$$ moves farther away. An update example is given in the image above. Assume that instance $$i$$ belongs to class A but is classified as class B because the extension distance calculation gives $$ED_A>ED_B$$. After the update, the separator of class A moves closer to the data point of instance $$i$$, whereas the separator of class B moves farther away. Consequently, the extension distance calculation gives $$ED_B>ED_A$$, so after the update instance $$i$$ is classified as class A.
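The update formulas and the surrounding training loop can be sketched as follows. This is a minimal sketch, not a reference implementation: it assumes the same form of extension distance as above, stores each class as mutable lists of lower bounds, upper bounds and centers, and counts misclassifications while it iterates rather than in a separate pass. The names `update` and `train`, the default parameters and the one-input example are hypothetical.

```python
def extension_distance(x, w_low, w_up, z):
    """Assumed extension distance; ranges must have non-zero width."""
    total = 0.0
    for x_j, low, up, z_j in zip(x, w_low, w_up, z):
        half_width = (up - low) / 2
        total += (abs(x_j - z_j) - half_width) / abs(half_width) + 1
    return total


def classify(x, model):
    """Return the label with the smallest extension distance to x."""
    return min(model, key=lambda label: extension_distance(x, *model[label]))


def update(model, x, p, k_star, eta):
    """Shift class p toward x and class k_star away from x, in place."""
    for label, sign in ((p, 1.0), (k_star, -1.0)):
        w_low, w_up, z = model[label]
        for j, x_j in enumerate(x):
            shift = sign * eta * (x_j - z[j])   # computed from the old center
            z[j] += shift
            w_low[j] += shift                   # both bounds move with the center,
            w_up[j] += shift                    # so the range width is unchanged


def train(model, samples, labels, eta=0.1, target_error=0.0, max_epochs=100):
    """Repeat passes over the data until the error target or epoch limit is hit."""
    for _ in range(max_epochs):
        misclassified = 0
        for x, p in zip(samples, labels):
            k_star = classify(x, model)
            if k_star != p:
                misclassified += 1
                update(model, x, p, k_star, eta)
        if misclassified / len(samples) <= target_error:
            break
    return model


# Hypothetical one-input example: x belongs to A but starts out closer to B.
model = {"A": ([0.0], [4.0], [2.0]), "B": ([3.0], [9.0], [6.0])}
x = [4.5]
print(classify(x, model))      # B
update(model, x, "A", "B", eta=0.5)
print(model["A"], model["B"])  # ([1.25], [5.25], [3.25]) ([3.75], [9.75], [6.75])
print(classify(x, model))      # A
```

In this example the extension distances change from 1.25 (A) and 0.5 (B) before the update to 0.625 (A) and 0.75 (B) after it, which mirrors the class A and class B scenario described above.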