In the following notation, x is the input, y is the encoded data, z is the decoded data, σ is a nonlinear activation function (sigmoid or hyperbolic tangent, usually), and f(x;θ) means a function of x parameterized by θ.
The model can be summarized in the following way:
The input data is mapped to the hidden layer (encoding). The mapping is usually an affine (allowing for or preserving parallel relationships.) transformation followed by a non-linearity:
y = f(x;θ) = σ(Wx+b)y = f(x;θ) =σ(Wx+b)
The hidden layer is mapped to the output layer, which is also called decoding. The mapping is an affine transformation (affine transformation is a linear mapping method that preserves points, straight lines, and planes) optionally ...