Top 16 Deep Learning Interview Questions You Must Prepare

An autoencoder is an unsupervised machine learning algorithm that uses the backpropagation principle, where the target values are set equal to the inputs provided. Internally, it has a hidden layer that describes a code used to represent the input.

Some Key Facts about the autoencoder are as follows:-

  • It is an unsupervised ML algorithm similar to Principal Component Analysis
  • With linear activations, it minimizes the same objective function as Principal Component Analysis
  • It is a neural network
  • The neural network’s target output is its input

Weight initialization is a very important step. Bad weight initialization can prevent a network from learning, while good initialization can lead to quicker convergence and better overall error. Biases can generally be initialized to zero. The general rule for setting the weights is to make them close to zero without being too small.
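As a concrete illustration, here is a minimal NumPy sketch of one common scheme: small random weights scaled by the layer's fan-in (in the spirit of Xavier/He initialization) and zero biases. The layer sizes are arbitrary values chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

def init_layer(fan_in, fan_out):
    # Weights: small random values scaled by the number of inputs (fan-in),
    # so early activations neither explode nor vanish.
    W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
    # Biases: initialized to zero, as described above.
    b = np.zeros(fan_out)
    return W, b

# Hypothetical layer sizes, for illustration only.
W1, b1 = init_layer(784, 128)
W2, b2 = init_layer(128, 10)
```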

Model capacity is the ability to approximate any given function. The higher the model capacity, the larger the amount of information that can be stored in the network.

  1. It is more computationally efficient than stochastic gradient descent.
  2. It improves generalization by finding flat minima.
  3. It improves convergence: by using mini-batches we approximate the gradient of the entire training set, which can help avoid poor local minima.

Hyperparameters, as opposed to model parameters, cannot be learned from the data; they are set before the training phase.

Learning rate:

It determines how fast we update the weights during optimization. If the learning rate is too small, gradient descent can be slow to find the minimum, and if it is too large, gradient descent may not converge (it can overshoot the minimum). It is considered the most important hyperparameter.
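A toy illustration of this trade-off: gradient descent on f(x) = x², whose gradient is 2x and whose minimum is at x = 0. The specific learning rates and step counts below are made up for the example.

```python
# Gradient descent on f(x) = x^2 with gradient 2x; the minimum is at x = 0.
def run_gd(lr, steps=20, x=5.0):
    for _ in range(steps):
        x = x - lr * 2 * x   # gradient descent update
    return x

print(run_gd(lr=0.01))   # too small: still far from 0 after 20 steps
print(run_gd(lr=0.1))    # reasonable: close to 0
print(run_gd(lr=1.1))    # too large: overshoots the minimum and diverges
```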

Number of epochs:

An epoch is defined as one forward pass and one backward pass over all of the training data.

Batch size:

The number of training examples in one forward/backward pass.

Stochastic Gradient Descent:

Uses only a single training example to calculate the gradient and update the parameters.

Batch Gradient Descent:

Calculates the gradient over the whole dataset and performs just one update per iteration.

Mini-batch Gradient Descent:

Mini-batch gradient descent is a variation of stochastic gradient descent in which, instead of a single training example, a mini-batch of samples is used. It is one of the most popular optimization algorithms.
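The three variants differ only in how many examples go into each gradient estimate. Below is a minimal NumPy sketch for linear regression with squared error; the dataset, learning rate, and batch size are placeholders chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                  # toy dataset: 1000 examples, 3 features
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=1000)

def gradient(w, Xb, yb):
    # Gradient of mean squared error for a linear model y_hat = Xb @ w.
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(yb)

w = np.zeros(3)
lr, batch_size = 0.1, 32

for epoch in range(10):                         # one epoch = one pass over all data
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        # batch_size = 1      -> stochastic gradient descent
        # batch_size = len(X) -> batch gradient descent
        # anything in between -> mini-batch gradient descent
        w -= lr * gradient(w, X[batch], y[batch])

print(w)   # should end up close to true_w
```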

Data normalization is a very important preprocessing step, used to rescale values into a specific range to ensure better convergence during backpropagation. In general, it boils down to subtracting the mean of each feature and dividing by its standard deviation.
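A minimal sketch of this standardization step in NumPy, computing per-feature statistics on the training set and reusing them for new data; the arrays here are placeholders.

```python
import numpy as np

X_train = np.random.default_rng(1).normal(loc=10.0, scale=3.0, size=(100, 4))

# Per-feature statistics computed on the training data only.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0) + 1e-8          # small epsilon avoids division by zero

X_train_norm = (X_train - mean) / std     # zero mean, unit variance per feature

# The same training-set statistics are reused for data seen at test time.
X_test = np.array([[9.0, 11.0, 10.5, 8.0]])
X_test_norm = (X_test - mean) / std
```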

An autoencoder is an artificial neural network able to learn a representation for a set of data (an encoding) without any supervision. The network learns by copying its input to the output; typically the internal representation has smaller dimensionality than the input vector, so the network learns efficient ways of representing the data. An autoencoder consists of two parts: an encoder that maps the inputs to an internal representation, and a decoder that converts the internal state back to the outputs.
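A minimal sketch of this encoder/decoder structure, written here with Keras as an assumed framework; the 784-dimensional input, 32-dimensional bottleneck, and random training data are arbitrary choices for the example.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Encoder: compresses the input into a smaller internal representation.
inputs = keras.Input(shape=(784,))
code = layers.Dense(32, activation="relu")(inputs)        # bottleneck
# Decoder: reconstructs the input from the internal representation.
outputs = layers.Dense(784, activation="sigmoid")(code)

autoencoder = keras.Model(inputs, outputs)
# The target is the input itself, so reconstruction error is the loss.
autoencoder.compile(optimizer="adam", loss="mse")

X = np.random.rand(256, 784).astype("float32")            # placeholder data
autoencoder.fit(X, X, epochs=5, batch_size=32, verbose=0)
```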

A Boltzmann Machine is used to optimize the solution of a problem. The job of a Boltzmann Machine is basically to optimize the weights and the quantity associated with the given problem.

Some important points about the Boltzmann Machine:

  • It uses recurrent structure.
  • It consists of stochastic neurons, each of which can be in one of two possible states, either 1 or 0.
  • The neurons in this are either in adaptive (free state) or clamped (frozen state).
  • If we apply simulated annealing on discrete Hopfield network, then it would become Boltzmann Machine.
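A rough sketch of the stochastic unit update with an annealed temperature, assuming a small fully connected network with symmetric weights; the network size, weights, and temperature schedule are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 5                                         # number of stochastic binary units
W = rng.normal(scale=0.1, size=(n, n))
W = (W + W.T) / 2                             # symmetric (recurrent) connections
np.fill_diagonal(W, 0.0)                      # no self-connections
b = np.zeros(n)
s = rng.integers(0, 2, size=n).astype(float)  # each unit is in state 0 or 1

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Simulated-annealing style schedule: start hot (nearly random flips), end cold.
for T in [5.0, 2.0, 1.0, 0.5, 0.1]:
    for _ in range(100):
        i = rng.integers(n)                   # pick a free (unclamped) unit
        gap = W[i] @ s + b[i]                 # energy gap for turning unit i on
        s[i] = 1.0 if rng.random() < sigmoid(gap / T) else 0.0

print(s)
```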


Backpropagation is a training algorithm used for multilayer neural networks. It moves the error information from the end of the network back to all the weights inside the network, and thus allows for efficient computation of the gradient.

The backpropagation algorithm can be divided into several steps:

  1. Forward propagation of training data through the network in order to generate output.
  2. Use target value and output value to compute error derivative with respect to output activations.
  3. Backpropagate to compute the derivative of the error with respect to the activations of the previous layer, and continue for all hidden layers.
  4. Use the previously calculated derivatives for output and all hidden layers to calculate the error derivative with respect to weights.
  5. Update the weights.
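These steps can be traced in a minimal NumPy sketch of a one-hidden-layer network with sigmoid activation and squared error; the layer sizes, learning rate, and data below are placeholders, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))                  # placeholder inputs
Y = rng.normal(size=(64, 1))                  # placeholder targets

W1, b1 = rng.normal(scale=0.1, size=(3, 5)), np.zeros(5)
W2, b2 = rng.normal(scale=0.1, size=(5, 1)), np.zeros(1)
lr = 0.1

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for step in range(100):
    # 1. Forward propagation to generate the output.
    h = sigmoid(X @ W1 + b1)
    out = h @ W2 + b2
    # 2. Error derivative with respect to the output activations.
    d_out = 2.0 * (out - Y) / len(X)
    # 3. Backpropagate to the previous layer's activations.
    d_h = (d_out @ W2.T) * h * (1 - h)        # chain rule through the sigmoid
    # 4. Error derivatives with respect to the weights.
    dW2, db2 = h.T @ d_out, d_out.sum(axis=0)
    dW1, db1 = X.T @ d_h, d_h.sum(axis=0)
    # 5. Update the weights.
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1
```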

Yes, this can be done, provided that the layer 4 output is taken from the previous time step, as in an RNN. We also need to assume that the previous input batch is sometimes correlated with the current batch.

Dropout is a regularization technique for reducing overfitting in neural networks. At each training step we randomly drop out (set to zero) a set of nodes, so we effectively create a different model for each training case; all of these models share weights. It is a form of model averaging.
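A minimal sketch of dropout applied to one layer's activations during training, written in the commonly used "inverted" form that rescales the surviving activations; the drop probability and activations here are illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p_drop=0.5, training=True):
    if not training:
        return activations                    # no dropout at test time
    # Randomly zero out nodes, then rescale so the expected activation
    # is unchanged (the "inverted" dropout formulation).
    mask = rng.random(activations.shape) >= p_drop
    return activations * mask / (1.0 - p_drop)

h = rng.normal(size=(4, 8))                   # placeholder layer activations
print(dropout(h, p_drop=0.5))
```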

The goal of an activation function is to introduce nonlinearity into the neural network so that it can learn more complex functions. Without it, the neural network would only be able to learn functions that are linear combinations of its input data.
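A small NumPy check of this point: stacking two linear layers without an activation collapses into a single linear map, while inserting a nonlinearity (ReLU here) does not. The weights and input are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))
W1, W2 = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))

# Two linear layers are equivalent to one linear layer with weights W1 @ W2.
two_linear = (x @ W1) @ W2
one_linear = x @ (W1 @ W2)
print(np.allclose(two_linear, one_linear))    # True: no extra expressiveness

# With a ReLU in between, the composition is no longer a single linear map.
relu = lambda z: np.maximum(z, 0.0)
nonlinear = relu(x @ W1) @ W2
print(np.allclose(nonlinear, one_linear))     # False in general
```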

Both shallow and deep networks are capable of approximating any function. For the same level of accuracy, however, deeper networks can be much more efficient in terms of computation and number of parameters. Deeper networks are able to create deep representations: at every layer, the network learns a new, more abstract representation of the input.

As a result of setting the weights in the network to zero, all the neurons at each layer produce the same output and receive the same gradients during backpropagation.

The network cannot learn at all because there is no source of asymmetry between neurons. That is why we need to add randomness to the weight initialization process.
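This symmetry is easy to see numerically. In the sketch below, with zero-initialized weights, both hidden units compute the same output and receive the same gradient at every step, so their weight columns stay identical; the tiny network and data are placeholders for illustration.

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0]])      # placeholder inputs
y = np.array([[1.0], [0.0]])                # placeholder targets

# All weights initialized to zero (biases omitted, i.e. zero as well).
W1 = np.zeros((2, 2))
W2 = np.zeros((2, 1))
lr = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(50):
    h = sigmoid(X @ W1)                     # every hidden unit gives the same output
    out = h @ W2
    d_out = 2.0 * (out - y) / len(X)
    d_h = (d_out @ W2.T) * h * (1 - h)      # identical gradient for every hidden unit
    W2 -= lr * h.T @ d_out
    W1 -= lr * X.T @ d_h

print(W1)   # both columns are identical: the hidden units never differentiate
print(W2)   # both rows are identical
```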