User:Jimmy Novik/Nonlinear Mapping Function

To extend linear models to represent nonlinear functions of $$x$$, we can apply the linear model not to $$x$$ itself but to a transformed input $$\phi(x)$$, where $$\phi$$ is a nonlinear mapping of the inputs.
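As a quick illustration (a minimal numpy sketch, not from the text above): a linear model cannot fit $$y=x^2$$ in the raw input $$x$$, but under the hand-chosen mapping $$\phi(x)=[x,x^2]^{\top}$$ ordinary least squares recovers it exactly.

```python
import numpy as np

# Toy data: y = x^2 is not a linear function of x.
x = np.linspace(-2, 2, 9)
y = x ** 2

# Hand-chosen nonlinear feature map phi(x) = [x, x^2], plus a bias column.
Phi = np.column_stack([x, x ** 2, np.ones_like(x)])

# A linear model in phi-space fits the nonlinear target exactly.
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(np.round(w, 6))  # [0. 1. 0.]: y = 0*x + 1*x^2 + 0
```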

Options
1. Choose a very generic $$\phi$$, such as the infinite-dimensional feature map implicitly used by kernel machines based on the RBF kernel. (Generalization to advanced problems often remains poor.)

2. Manually engineer φ.

3. Use deep learning to learn $$\phi$$: $$y=f(x;\theta,w)=\phi(x;\theta)^{\top}w$$, where the parameters $$\theta$$ of $$\phi$$ are learned jointly with the weights $$w$$.
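A minimal sketch of this model form, assuming $$\phi$$ is a single affine layer followed by an elementwise nonlinearity (here ReLU, anticipating the activation function defined later in this section; the specific architecture is an illustrative assumption):

```python
import numpy as np

def phi(x, W, c):
    """Learned feature map phi(x; theta), theta = (W, c):
    an affine transformation followed by an elementwise ReLU."""
    return np.maximum(0.0, W.T @ x + c)

def f(x, W, c, w):
    """Model y = f(x; theta, w) = phi(x; theta)^T w."""
    return phi(x, W, c) @ w
```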

Example: Learning (approximating) the XOR function

Goal Function

 * $$\mathbb{X}=\{[0,0]^\top,[0,1]^\top,[1,0]^\top,[1,1]^\top\}$$
 * $$f(\mathbb{X})=[0,1,1,0]^\top$$

If we choose a linear model with parameters $$w$$ and $$b$$, the model is defined as $$f(x;w,b)=x^{\top}w+b$$. Solving the normal equations yields $$w=0$$ and $$b=\frac{1}{2}$$: the linear model outputs the constant $$\frac{1}{2}$$ everywhere, which tells us nothing about the inputs.
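We can check this numerically (a quick sketch, not part of the original text): stacking the four inputs with a bias column and solving least squares returns exactly $$w=0$$, $$b=\frac{1}{2}$$.

```python
import numpy as np

# The four XOR inputs, with a column of ones for the bias b.
X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# Solve min ||X beta - y||^2 with beta = [w1, w2, b].
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # [0. 0. 0.5]: w = 0 and b = 1/2, a constant output of 1/2
```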

Instead, we add a hidden layer, such that $$h=f^{(1)}(x;W,c)$$ and $$y=f^{(2)}(h;w,b)$$, with the complete model being $$f(x;W,c,w,b)=f^{(2)}(f^{(1)}(x))$$.

$$f^{(1)}$$ must be nonlinear; if it were linear, the composition of two linear functions would itself be linear, and the network as a whole would remain a linear model. Most networks therefore compute the hidden layer with an affine transformation followed by a fixed nonlinear activation function.

$$h=g(W^{\top}x+c)$$, where $$W$$ provides the weights of a linear transformation and $$c$$ the biases.

The default recommended activation function is the rectified linear unit, or ReLU, defined as $$g(z)=\max\{0,z\}$$ and applied elementwise.

The complete network can now be specified as $$f(x; W, c, w, b)=w^{\top}\max\{0,W^{\top}x+c\}+b$$
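To make this concrete, here is a check of the complete network against the XOR targets using one known exact solution (these particular weight values are a standard hand-constructed solution to the XOR example; they are not given in the text above):

```python
import numpy as np

# One exact solution of the XOR network (hand-constructed, assumed here).
W = np.array([[1.0, 1.0],
              [1.0, 1.0]])
c = np.array([0.0, -1.0])
w = np.array([1.0, -2.0])
b = 0.0

# The four XOR inputs, one per row, so X @ W computes (W^T x)^T for each x.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# f(x; W, c, w, b) = w^T max{0, W^T x + c} + b, evaluated on all inputs.
H = np.maximum(0.0, X @ W + c)   # hidden layer, one row per input
y = H @ w + b
print(y)  # [0. 1. 1. 0.], the XOR function
```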