An autoencoder is an unsupervised machine learning algorithm trained with the backpropagation principle, where the target values are set equal to the inputs. Internally, it has a hidden layer that learns a code used to represent the input.

**Some key facts about the autoencoder are as follows:**

- It is an unsupervised ML algorithm similar to Principal Component Analysis
- It minimizes the same objective function as Principal Component Analysis
- It is a neural network
- The neural network’s target output is its input
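
The idea above can be sketched with a tiny NumPy autoencoder (a minimal illustration, not a production implementation; the data, layer sizes, learning rate, and iteration count are all arbitrary choices, and linear activations are used to keep the sketch short):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples of 8-dimensional inputs.
X = rng.normal(size=(200, 8))

# Hidden layer ("the code") of size 3, smaller than the input dimension.
W_enc = rng.normal(scale=0.1, size=(8, 3))
W_dec = rng.normal(scale=0.1, size=(3, 8))

def reconstruction_loss():
    return float(np.mean((X - (X @ W_enc) @ W_dec) ** 2))

initial_loss = reconstruction_loss()

lr = 0.1
for _ in range(1000):
    code = X @ W_enc       # encode: project the input onto the hidden code
    X_hat = code @ W_dec   # decode: reconstruct the input from the code
    err = X_hat - X        # the target output is the input itself
    # Gradients of the mean squared reconstruction error.
    grad_dec = 2 * code.T @ err / len(X)
    grad_enc = 2 * X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

final_loss = reconstruction_loss()
```

With a linear hidden layer like this, the learned 3-dimensional code spans roughly the same subspace as the top three principal components of the data, which is the sense in which an autoencoder resembles PCA.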

**Benefits of mini-batch gradient descent:**

- Computationally efficient compared to stochastic gradient descent.
- Improves generalization by finding flat minima.
- Improves convergence: using mini-batches to approximate the gradient of the entire training set can help avoid local minima.

Hyperparameters, as opposed to model parameters, cannot be learned from the data; they are set before the training phase.

**Learning rate:**

It determines how fast the weights are updated during optimization. If the learning rate is too small, gradient descent can be slow to find the minimum; if it is too large, gradient descent may not converge (it can overshoot the minimum). It is considered the most important hyperparameter.
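
Both failure modes can be seen on the simplest possible objective, f(x) = x² (a toy illustration; the step counts and rates are arbitrary):

```python
def gradient_descent(lr, steps=50, x0=5.0):
    """Minimize f(x) = x**2 (gradient 2*x) from x0 with a fixed learning rate."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x  # gradient descent update
    return x

x_good = gradient_descent(lr=0.1)  # converges toward the minimum at 0
x_bad = gradient_descent(lr=1.1)   # overshoots: |x| grows on every step
```

With lr = 0.1 each step multiplies x by 0.8, so it shrinks toward zero; with lr = 1.1 each step multiplies x by -1.2, so it diverges.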

**Number of epochs:**

An epoch is defined as one forward pass and one backward pass over all the training data.

**Batch size:**

The number of training examples in one forward/backward pass.
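
The two quantities are related: with N training examples and batch size B, one epoch consists of ceil(N / B) forward/backward passes (a small arithmetic sketch):

```python
import math

n_samples = 1000  # size of the training set
batch_size = 32   # examples per forward/backward pass

# One epoch means every example is seen once:
# 31 full batches of 32 (992 examples) plus one partial batch of 8.
updates_per_epoch = math.ceil(n_samples / batch_size)
```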

**Stochastic Gradient Descent:**

Uses only a single training example to calculate the gradient and update the parameters.

**Batch Gradient Descent:**

Calculates the gradient over the whole dataset and performs just one update at each iteration.

**Mini-batch Gradient Descent:**

Mini-batch gradient descent is a variation of stochastic gradient descent in which a mini-batch of samples is used instead of a single training example. It is one of the most popular optimization algorithms.
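
The three variants can be sketched as one training loop that differs only in batch size (a toy one-parameter linear regression; the data, learning rate, and epoch count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression problem: y = 3*x plus a little noise.
X = rng.normal(size=(100, 1))
y = 3 * X[:, 0] + 0.1 * rng.normal(size=100)

def fit(batch_size, lr=0.1, epochs=20):
    """The three gradient descent variants differ only in batch size."""
    w = 0.0
    n = len(X)
    for _ in range(epochs):
        idx = rng.permutation(n)  # shuffle the data each epoch
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]
            pred = w * X[b, 0]
            # Gradient of mean squared error on this batch.
            grad = 2 * np.mean((pred - y[b]) * X[b, 0])
            w -= lr * grad
    return w

w_sgd = fit(batch_size=1)      # stochastic: one example per update
w_batch = fit(batch_size=100)  # batch: the whole dataset per update
w_mini = fit(batch_size=16)    # mini-batch: a compromise between the two
```

All three recover a weight near the true value of 3; they differ in how many updates they make per epoch and how noisy each update is.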

A Boltzmann Machine is a stochastic neural network used to solve optimization problems. Its job is essentially to optimize the weights and related quantities for the given problem.

**Some important points about the Boltzmann Machine:**

- It uses a recurrent structure.
- It consists of stochastic neurons, each of which is in one of two possible states, either 1 or 0.
- The neurons are either adaptive (free state) or clamped (frozen state).
- If we apply simulated annealing on discrete Hopfield network, then it would become Boltzmann Machine.
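
The stochastic neuron in the second bullet can be sketched as follows (a minimal illustration using the standard sigmoid firing rule; the temperature parameter is what simulated annealing gradually lowers):

```python
import math
import random

random.seed(0)

def stochastic_unit(net_input, temperature=1.0):
    """A Boltzmann-machine style neuron: state 1 with sigmoid probability.

    Higher temperature -> more random behaviour; lower temperature -> more
    deterministic, which is what annealing exploits as it cools the network.
    """
    p_on = 1.0 / (1.0 + math.exp(-net_input / temperature))
    return 1 if random.random() < p_on else 0

# With a strongly positive input and low temperature, the unit is almost always 1.
states = [stochastic_unit(2.0, temperature=0.5) for _ in range(1000)]
frac_on = sum(states) / 1000
```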

Backpropagation is a training algorithm for multilayer neural networks. It moves error information from the end of the network back to all the weights inside the network, which allows efficient computation of the gradient.

**The backpropagation algorithm can be divided into several steps:**

- Forward propagation of training data through the network in order to generate output.
- Use the target value and the output value to compute the error derivative with respect to the output activations.
- Backpropagate to compute the derivative of the error with respect to the activations in the previous layer; continue for all hidden layers.
- Use the previously calculated derivatives for output and all hidden layers to calculate the error derivative with respect to weights.
- Update the weights.
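
The steps above can be sketched with a tiny NumPy network, checking one backpropagated gradient against a finite-difference estimate (the layer sizes, data, and learning rate are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny network: 4 inputs -> 5 sigmoid hidden units -> 1 linear output.
X = rng.normal(size=(10, 4))
y = rng.normal(size=(10, 1))
W1 = rng.normal(scale=0.5, size=(4, 5))
W2 = rng.normal(scale=0.5, size=(5, 1))

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def loss_and_grads(W1, W2):
    h = sigmoid(X @ W1)                # step 1: forward pass
    out = h @ W2
    loss = np.mean((out - y) ** 2)
    d_out = 2 * (out - y) / len(X)     # step 2: error derivative at the output
    g_W2 = h.T @ d_out                 # step 4: derivative w.r.t. output weights
    d_h = (d_out @ W2.T) * h * (1 - h) # step 3: backpropagate through the sigmoid
    g_W1 = X.T @ d_h                   # step 4: derivative w.r.t. hidden weights
    return loss, g_W1, g_W2

loss, g_W1, g_W2 = loss_and_grads(W1, W2)

# Sanity check: one analytic gradient entry vs. a finite-difference estimate.
eps = 1e-6
W1_bumped = W1.copy()
W1_bumped[0, 0] += eps
numeric = (loss_and_grads(W1_bumped, W2)[0] - loss) / eps

# Step 5: update the weights; the loss decreases.
lr = 0.05
W1 -= lr * g_W1
W2 -= lr * g_W2
new_loss = loss_and_grads(W1, W2)[0]
```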

If the weights in a network are all initialized to zero, every neuron in each layer produces the same output and receives the same gradients during backpropagation.

The network cannot learn at all because there is no source of asymmetry between neurons. That is why we need to add randomness to the weight initialization process.
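
A quick NumPy check of this symmetry argument (a toy two-layer tanh network; the sizes and data are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
y = rng.normal(size=(10, 1))

def hidden_grads(W1, W2):
    """Gradient of the squared error w.r.t. the first-layer weights."""
    h = np.tanh(X @ W1)                 # hidden activations
    d_out = 2 * (h @ W2 - y) / len(X)   # error derivative at the output
    d_h = (d_out @ W2.T) * (1 - h ** 2) # backpropagate through tanh
    return X.T @ d_h

# Zero initialization: every hidden unit gets an identical gradient column
# (here the gradient is in fact exactly zero, so no hidden weight ever moves).
g_zero = hidden_grads(np.zeros((3, 4)), np.zeros((4, 1)))
symmetric = bool(np.allclose(g_zero, g_zero[:, :1]))

# Random initialization breaks the symmetry: the columns differ,
# so each neuron can learn a different feature.
g_rand = hidden_grads(rng.normal(size=(3, 4)), rng.normal(size=(4, 1)))
broken = bool(not np.allclose(g_rand, g_rand[:, :1]))
```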