Kernel adaptive filter

In signal processing, a  kernel adaptive filter is a type of nonlinear adaptive filter. An adaptive filter is a filter that adapts its transfer function to changes in signal properties over time by minimizing an error or loss function that characterizes how far the filter deviates from ideal behavior. The adaptation process is based on learning from a sequence of signal samples and is thus an online algorithm. A nonlinear adaptive filter is one in which the transfer function is nonlinear.

Kernel adaptive filters implement a nonlinear transfer function using kernel methods. In these methods, the signal is mapped to a high-dimensional linear feature space and a nonlinear function is approximated as a sum over kernels, whose domain is the feature space. If this is done in a reproducing kernel Hilbert space, a kernel method can be a universal approximator for a nonlinear function. Kernel methods have the advantage of having convex loss functions, with no local minima, and of being only moderately complex to implement.

Because high-dimensional feature space is linear, kernel adaptive filters can be thought of as a generalization of linear adaptive filters. As with linear adaptive filters, there are two general approaches to adapting a filter: the least mean squares filter (LMS) and the recursive least squares filter (RLS).

Self organising kernel adaptive filters that use iteration to achieve convex LMS error minimisation address some of the statistical and practical issues of non-linear models that do not arise in the linear case. Regularisation is particularly important feature for non-linear models and also often used in linear adaptive filters to reduce statistical uncertainties. However because nonlinear filters typically have a much higher potential structural complexity (or higher dimensional feature space) compared to the subspace actually required, regularisation of some kind must deal with the under-determined model. Though some specific forms of parameter regularisation such as prescribed by Vapink's SRM & SVM address the dimensionality problem statistically to some extent, there remain further statistical and practical issues for truly adaptive non-linear filters. Adaptive filters are often used for tracking the behaviour of a time-varying system or systems which cannot be fully modelled from the data and structure available, hence the models may not only need to adapt parameters, but structure too.

Where structural parameters of kernels are derived directly from data being processed (as in the above "Support Vector" approach) there are convenient opportunities for analytically robust methods of self organisation of the kernels available to the filter. The linearised feature space induced by kernels allows linear projection of new samples on to the current structure of the model where novelty in new data can be easily differentiated from noise-born errors which should not result in a change to model structure. Analytical metrics for structure analysis can be used to parsimoniously grow model complexity when required or optimally prune the existing structure when processor resource limits are reached. Structure updates are also relevant when system variation is detected and the long-term memory of the model should be updated as for the Kalman Filter case in linear filters.

Iterative gradient descent that is typically used in adaptive filters has also gained popularity in offline batch-mode support vector based machine learning because of its computational efficiency for large data set processing. Both time series and batch data processing performance is reported to be able to easily handle over 100,000 training examples using as little as 10kB RAM. Data sizes this large are challenging to the original formulations of support vector machines and other kernel methods, which for example relied on constrained optimisation using linear or quadratic programming techniques.