
Simple example
A convolutional neural net is a feed-forward neural net that contains one or more convolutional layers.

A convolutional layer takes as input an $$M \times N$$ matrix (for example, the redness of each pixel of an input image) and produces as output another $$M' \times N'$$ matrix. Oftentimes convolutional layers are placed in $$R$$ parallel channels, and such stacks of convolutional layers are sometimes also called $$M \times N \times R$$ convolutional layers. For clarity, let us continue with the case $$R = 1$$.
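
For concreteness, here is a minimal sketch in Python (NumPy/SciPy; the kernel size $$k = 3$$ and the random data are illustrative stand-ins, not fixed by the text). Under the common "valid" convention (no padding), a $$k \times k$$ kernel maps an $$M \times N$$ input to an output with $$M' = M - k + 1$$ and $$N' = N - k + 1$$:

```python
import numpy as np
from scipy.signal import convolve2d

M, N, k = 13, 13, 3             # illustrative sizes
image = np.random.rand(M, N)    # stand-in for e.g. per-pixel redness values
kernel = np.random.rand(k, k)   # stand-in for the layer's learned weights

# With no padding ("valid" mode), the output shrinks by k - 1 in each axis.
output = convolve2d(image, kernel, mode="valid")
print(output.shape)             # (11, 11): M' = M - k + 1, N' = N - k + 1
```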

Suppose we want to train a network to recognize features in a $$13 \times 13$$ pixel grayscale image (so the image is a real $$13 \times 13$$ matrix). It is reasonable to create a first layer whose neurons connect to small connected patches of the image, since we expect these neurons to learn to recognize "local features" (like lines or blobs).
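
As an illustration of such a first layer, the sketch below (plain NumPy; the $$3 \times 3$$ vertical-edge weights are a hypothetical example of what a neuron might learn) connects each output neuron to a single $$3 \times 3$$ patch of the $$13 \times 13$$ image, with all neurons sharing the same weights:

```python
import numpy as np

image = np.random.rand(13, 13)   # a 13 x 13 grayscale image

# Hypothetical shared weights; after training, a kernel like this one
# responds to a local feature (here, vertical edges).
weights = np.array([[-1.0, 0.0, 1.0],
                    [-1.0, 0.0, 1.0],
                    [-1.0, 0.0, 1.0]])

# Neuron (i, j) is connected only to the 3 x 3 patch at position (i, j).
response = np.empty((11, 11))
for i in range(11):
    for j in range(11):
        patch = image[i:i+3, j:j+3]
        response[i, j] = np.sum(patch * weights)
```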

Viterbi
Given a sequence $$(y_1, \dots, y_r)$$ of observations, emission probabilities $$p(x, y)$$ of observing $$y$$ when the hidden state is $$x$$, and transition probabilities $$q(x, x')$$ between hidden states, find the most likely path $$(x_1, \dots, x_r)$$ of hidden states.
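
A minimal sketch of the standard dynamic-programming solution in Python, assuming finite sets of states and observations; the initial distribution `pi` and the dictionary encodings `p[(x, y)]` and `q[(x, x2)]` are illustrative assumptions, not fixed by the text:

```python
def viterbi(obs, states, pi, q, p):
    """Most likely hidden path (x_1, ..., x_r) for observations (y_1, ..., y_r).

    pi[x]      -- initial probability of state x (an assumed extra input)
    q[(x, x2)] -- transition probability from x to x2
    p[(x, y)]  -- emission probability of y in state x
    """
    # best[t][x] = probability of the best path ending in state x at time t
    best = [{x: pi[x] * p[(x, obs[0])] for x in states}]
    back = []  # back[t][x] = predecessor of x on that best path
    for y in obs[1:]:
        prev = best[-1]
        best.append({})
        back.append({})
        for x in states:
            x_prev = max(states, key=lambda x2: prev[x2] * q[(x2, x)])
            back[-1][x] = x_prev
            best[-1][x] = prev[x_prev] * q[(x_prev, x)] * p[(x, y)]
    # Recover the path by following the back-pointers from the best final state.
    x_last = max(states, key=lambda x: best[-1][x])
    path = [x_last]
    for b in reversed(back):
        path.append(b[path[-1]])
    return list(reversed(path))
```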

The algorithm
Let $$N$$ be a neural network with $$e$$ edges.

Below, $$x,x_1,x_2,\dots$$ will denote vectors in $$\mathbb{R}^m$$, $$y,y',y_1,y_2,\dots$$ vectors in $$\mathbb{R}^n$$, and $$w, w_0, w_1, \ldots$$ vectors in $$\mathbb{R}^e$$. These are called inputs, outputs and weights respectively. The neural network gives a function $$y = f_N(w, x)$$ which, given a weight $$w$$, maps an input $$x$$ to an output $$y$$.

We select an error function $$E(y,y')$$ measuring the difference between two outputs. The standard choice is $$E(y,y') = |y-y'|^2$$, the square of the Euclidean distance between the vectors $$y$$ and $$y'$$.

The backpropagation algorithm takes as input a sequence of training examples $$(x_1,y_1), \dots, (x_p, y_p)$$ and produces a sequence of weights $$w_0, w_1, \dots, w_p$$ starting from some initial weight $$w_0$$, usually chosen to be random. These weights are computed in turn: we compute $$w_i$$ using only $$(x_i, y_i, w_{i-1})$$ for $$i = 1, \dots, p$$. The output of the backpropagation algorithm is then $$w_p$$, giving us a new function $$f_N(w_p, \cdot)$$. The computation is the same in each step, so we describe only the case $$i = 1$$.
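
In outline, this loop might look as follows (a sketch only; `update` stands for the per-example computation described in the next paragraph):

```python
def backpropagation(examples, w0, update):
    """Run once through the examples (x_1, y_1), ..., (x_p, y_p),
    computing w_i from (x_i, y_i, w_{i-1}) alone, and return w_p."""
    w = w0
    for x, y in examples:
        w = update(x, y, w)
    return w   # this is w_p, giving the trained function f_N(w_p, .)
```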

Now we describe how to find $$w_1$$ from $$(x_1, y_1, w_0)$$. This is done by considering a variable weight $$w$$ and applying gradient descent to the function $$w \mapsto E(f_N(w, x_1), y_1)$$ to find a local minimum, starting from $$w_0$$. We then let $$w_1$$ be the minimizing weight found by gradient descent.
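
A minimal sketch of this step in Python. The single linear neuron standing in for $$f_N$$, the hyperparameters, and the finite-difference gradient are all illustrative assumptions; backpropagation proper evaluates this gradient exactly via the chain rule:

```python
import numpy as np

def f_N(w, x):
    # Toy stand-in network: a single linear neuron (e = m, n = 1).
    return np.array([np.dot(w, x)])

def E(y, y2):
    return np.sum((y - y2) ** 2)   # the squared Euclidean distance from above

def update(x1, y1, w0, lr=0.01, steps=100, h=1e-6):
    """Gradient descent on w -> E(f_N(w, x1), y1), starting at w0."""
    loss = lambda w: E(f_N(w, x1), y1)
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        # Finite-difference gradient as a stand-in for the exact gradient.
        grad = np.array([(loss(w + h * e) - loss(w)) / h
                         for e in np.eye(len(w))])
        w = w - lr * grad
    return w   # taken as w_1
```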

The algorithm in coordinates
 * Aspirin contains acetylsalicylic acid (ASA).
 * ASA enters the blood stream.
 * ASA acts by entangling with its target molecules (picture).
 * White blood cells produce prostaglandins at wounded cells (i.e. damaged tissue).