User:Jeblad/Standard notation (neural net)

Standard notation, as used within deep learning, has changed considerably since the first published works. It is undergoing some standardization, but mostly at an informal level.

Indexes

 * training : Superscript $$\left ( i \right )$$ like $$\mathbf{x}^{\left ( i \right )}$$ denotes the iᵗʰ training example in a training set
 * layer : Superscript $$\left [ l \right ]$$ like $$\mathbf{x}^{\left [ l \right ]}$$ denotes the lᵗʰ layer in a set of layers
 * sequence : Superscript $$\left \langle t \right \rangle$$ like $$\mathbf{x}^{\left \langle t \right \rangle}$$ denotes the tᵗʰ item in a sequence of items
 * 1D node : Subscript $$i$$ like $$x_{i}$$ denotes the iᵗʰ node in a one-dimensional layer
 * 2D node : Subscript $$ij$$ or $$i,j$$ like $$x_{ij}$$ or $$x_{i,j}$$ denotes the node at iᵗʰ row and jᵗʰ column in a two-dimensional layer
 * 1D weight : Subscript $$ij$$ or $$i,j$$ like $$w_{ij}$$ or $$w_{i,j}$$ denotes the weight between the iᵗʰ node in the previous layer and the jᵗʰ node in the following layer (see the sketch after this list)
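
A minimal sketch (NumPy; all variable names and the column-wise stacking of samples are assumptions for illustration) of how the index conventions above typically map onto array code. Note that arrays are 0-based where the notation is 1-based.

```python
import numpy as np

n_x, m = 4, 10               # input size, number of samples
X = np.random.randn(n_x, m)  # one column per training example

x_3 = X[:, 2]   # x^(3): the 3rd training example (index 2 with 0-based arrays)
x_2 = X[1, :]   # x_2 across all samples: the 2nd node of the input layer

# The layer index [l] is typically realized as a list of per-layer arrays.
layer_sizes = [n_x, 5, 3]    # n_h^[1] = 5, n_h^[2] = 3
W = [np.random.randn(layer_sizes[l + 1], layer_sizes[l])
     for l in range(len(layer_sizes) - 1)]
# With this shape convention, W[l][j, i] holds w_ij: the weight between
# node i in the previous layer and node j in the following layer.
```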

Sizes

 * number of samples : $$m$$ is the number of samples in the dataset
 * input size : $$n_x$$ is the (possibly multidimensional) size of input $$x$$ (or number of features)
 * output size : $$n_y$$ is the (possibly multidimensional) size of output $$y$$ (or number of classes)
 * hidden units : $$n_h^{\left [ l \right ]}$$ is the number of units in hidden layer $$\left [ l \right ]$$
 * number of layers : $$L$$ is the number of layers in the network
 * input sequence size : $$T_x$$ is the length of the input sequence
 * output sequence size : $$T_y$$ is the length of the output sequence
 * input training sequence size : $$T_x^{\left ( i \right )}$$ is the length of the input sequence for training example $$\left ( i \right )$$
 * output training sequence size : $$T_y^{\left ( i \right )}$$ is the length of the output sequence for training example $$\left ( i \right )$$
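
A minimal sketch of how the sizes above show up as array shapes, under the same assumed column-wise stacking of samples; the per-sample sequence layout is illustrative.

```python
import numpy as np

m, n_x, n_y = 10, 4, 3
X = np.zeros((n_x, m))   # inputs:  n_x features, m samples
Y = np.zeros((n_y, m))   # outputs: n_y classes,  m samples

L = 2                    # number of layers
n_h = {1: 5}             # n_h^[1] = 5 hidden units in layer 1

# Sequences may differ in length per sample, so each training example (i)
# carries its own T_x^(i); a ragged list of arrays is one common layout.
T_x = [7, 5, 9]                            # T_x^(1), T_x^(2), T_x^(3)
x_seq = [np.zeros((n_x, t)) for t in T_x]  # one (n_x, T_x^(i)) array per sample
```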

Other

 * cross entropy : $$H(p,q) = -\sum_{x \in \mathcal{X}} p(x)\, \log q(x)$$
 * elementwise sequence loss : $$\mathcal{L}^{\left \langle t \right \rangle}\left ( \hat{y}^{\left \langle t \right \rangle}, y^{\left \langle t \right \rangle} \right ) = -y^{\left \langle t \right \rangle} \log \hat{y}^{\left \langle t \right \rangle} - \left ( 1 - y^{\left \langle t \right \rangle} \right ) \log \left ( 1 - \hat{y}^{\left \langle t \right \rangle} \right )$$, which follows by using cross entropy where the sum is taken over $$\mathcal{X} = \left \{ 0, 1 \right \}$$, that is, for classification in and out of a single class
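
A minimal sketch of both losses, assuming $$\hat{y}$$ is a predicted probability in $$\left ( 0, 1 \right )$$; the function names are illustrative, and for classification in and out of a single class the two expressions agree.

```python
import numpy as np

def cross_entropy(p, q):
    """H(p, q) = -sum over x of p(x) * log q(x), p and q on the same support."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return -np.sum(p * np.log(q))

def elementwise_loss(y_hat, y):
    """Per-timestep binary loss: cross entropy with the sum over X = {0, 1}."""
    return -y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat)

# For a single-class target (y = 1) predicted with probability 0.9,
# both formulations give the same value, about 0.105.
print(cross_entropy([0.0, 1.0], [0.1, 0.9]))
print(elementwise_loss(0.9, 1))
```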