Machine Learning 3
-
What is inductive reasoning in machine learning?
Inductive reasoning is the process of making sound judgments by generalizing from previously assembled evidence and data: the learner infers general rules from specific observed examples and applies them to unseen cases. Most analytical (machine) learning operates inductively in this way, which is why it is valuable for making accurate decisions and forming working hypotheses in complicated projects. -
What is the use of Fourier Transform in Deep Learning?
The Fourier transform decomposes a signal into its constituent frequencies, producing a spectral representation of the data. In deep learning this is highly useful for analyzing and managing large collections of signal data: the spectral representation can be computed efficiently (via the FFT) on real-time array data, which helps in processing all categories of signals such as audio, images, and time series. -
What Is Backpropagation?
Backpropagation is a training algorithm used for multilayer neural networks. It moves the error information from the end of the network back to all the weights inside the network, allowing efficient computation of the gradient.
The backpropagation algorithm can be divided into several steps (a minimal sketch follows the list):
Forward propagation of training data through the network in order to generate output.
Use the target value and the output value to compute the error derivative with respect to the output activations.
Backpropagate to compute the derivative of the error with respect to the activations in the previous layer, and continue for all hidden layers.
Use the previously calculated derivatives for the output and all hidden layers to compute the error derivative with respect to the weights.
Update the weights.
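As a rough illustration (not taken from any particular source), here is a minimal NumPy sketch of one backpropagation step for a single-hidden-layer network with sigmoid activations and a squared-error loss; the shapes, variable names, and learning rate are assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, y, W1, W2, lr=0.1):
    # Forward pass: propagate the input through the network.
    h = sigmoid(x @ W1)          # hidden activations
    y_hat = sigmoid(h @ W2)      # network output

    # Error derivative with respect to the output pre-activations (squared-error loss).
    d_out = (y_hat - y) * y_hat * (1 - y_hat)

    # Backpropagate the error to the hidden layer.
    d_hidden = (d_out @ W2.T) * h * (1 - h)

    # Error derivatives with respect to the weights.
    grad_W2 = np.outer(h, d_out)
    grad_W1 = np.outer(x, d_hidden)

    # Update the weights.
    W2 -= lr * grad_W2
    W1 -= lr * grad_W1
    return W1, W2
```
-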
Explain The Following Three Variants Of Gradient Descent: Batch, Stochastic And Mini-batch?
Stochastic Gradient Descent: Uses only a single training example to calculate the gradient and update the parameters.
Batch Gradient Descent: Calculates the gradient for the whole dataset and performs just one update at each iteration.
Mini-batch Gradient Descent: A variation of stochastic gradient descent in which, instead of a single training example, a mini-batch of samples is used. It is one of the most popular optimization algorithms. A sketch contrasting the three variants is given below.
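A minimal sketch of the three variants, assuming a hypothetical grad(w, X, y) helper that returns the gradient of the loss on the given data; the only thing that changes between them is how much data is used per parameter update.

```python
import numpy as np

# `grad(w, X, y)` is assumed to return the gradient of the loss on (X, y).

def batch_gd(w, X, y, grad, lr, epochs):
    for _ in range(epochs):
        w -= lr * grad(w, X, y)                      # one update per pass over the full dataset
    return w

def stochastic_gd(w, X, y, grad, lr, epochs):
    for _ in range(epochs):
        for i in np.random.permutation(len(X)):
            w -= lr * grad(w, X[i:i+1], y[i:i+1])    # one update per training example
    return w

def minibatch_gd(w, X, y, grad, lr, epochs, batch_size=32):
    for _ in range(epochs):
        idx = np.random.permutation(len(X))
        for start in range(0, len(X), batch_size):
            b = idx[start:start + batch_size]
            w -= lr * grad(w, X[b], y[b])            # one update per mini-batch
    return w
```
-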
Why Is Zero Initialization Not A Recommended Weight Initialization Technique? ⭐
If we set all the weights in the network to zero, every neuron in a layer produces the same output and receives the same gradients during backpropagation. The network cannot learn at all because there is no source of asymmetry between neurons. That is why we need to add randomness to the weight-initialization process.
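A small illustration of the contrast; the scaled-normal initialization shown is just one common choice, and the layer sizes are invented for the example.

```python
import numpy as np

n_in, n_out = 784, 128

# Zero initialization: every neuron computes the same output and receives the
# same gradient, so the symmetry is never broken and the layer cannot learn.
W_zero = np.zeros((n_in, n_out))

# Small random initialization (a scaled normal, one common choice) breaks the
# symmetry so that neurons can learn different features.
rng = np.random.default_rng(0)
W_random = rng.normal(0.0, 1.0, size=(n_in, n_out)) * np.sqrt(2.0 / n_in)
```
-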
What Is The Role Of The Activation Function?
The goal of an activation function is to introduce nonlinearity into the neural network so that it can learn more complex functions. Without it, the neural network would only be able to learn functions that are linear combinations of its input data.
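For illustration, a few common activation functions written as plain NumPy functions; each is an elementwise nonlinearity applied to a layer's pre-activations.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

z = np.array([-2.0, 0.0, 2.0])
print(relu(z), sigmoid(z), tanh(z))
```
-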
What Are Hyperparameters, Provide Some Examples? ⭐
Hyperparameters, as opposed to model parameters, cannot be learned from the data; they are set before the training phase. Some examples:
Learning rate: Determines how fast we update the weights during optimization. If the learning rate is too small, gradient descent can be slow to find the minimum; if it is too large, gradient descent may not converge (it can overshoot the minima). It is considered the most important hyperparameter.
Number of epochs: An epoch is defined as one forward pass and one backward pass over all the training data.
Batch size: The number of training examples in one forward/backward pass. -
What Is An Autoencoder?
An autoencoder is an artificial neural network able to learn a representation (encoding) for a set of data without any supervision. The network learns by copying its input to its output; typically the internal representation has fewer dimensions than the input vector, so the network learns an efficient way of representing the data. An autoencoder consists of two parts: an encoder, which maps the inputs to an internal representation, and a decoder, which converts the internal state back to the outputs.
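A minimal sketch using the Keras functional API, assuming 784-dimensional inputs compressed to a 32-dimensional code; the layer sizes, activations, and loss are illustrative choices rather than requirements.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Encoder: compress a 784-dimensional input into a 32-dimensional code.
inputs = layers.Input(shape=(784,))
code = layers.Dense(32, activation="relu")(inputs)

# Decoder: reconstruct the input from the code.
outputs = layers.Dense(784, activation="sigmoid")(code)

autoencoder = Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")

# The target is the input itself, so no labels are needed (unsupervised):
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=256)
```
-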
What Is A Dropout?
Dropout is a regularization technique for reducing overfitting in neural networks. At each training step we randomly drop out (set to zero) a subset of nodes, so we effectively create a different model for each training case, and all of these models share weights. It is a form of model averaging.
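A minimal sketch of inverted dropout applied to a layer's activations during training; the drop probability is an assumption for the example.

```python
import numpy as np

def dropout(activations, p_drop=0.5, training=True):
    """Inverted dropout: randomly zero a fraction p_drop of the activations
    during training and rescale the rest so the expected value is unchanged."""
    if not training or p_drop == 0.0:
        return activations
    mask = (np.random.rand(*activations.shape) >= p_drop)
    return activations * mask / (1.0 - p_drop)
```
-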
What Is A Boltzmann Machine?
A Boltzmann Machine is used to optimize the solution of a problem: its job is essentially to optimize the weights and the relevant quantities for the given problem. Some important points about the Boltzmann Machine:
It uses a recurrent structure.
It consists of stochastic neurons, each of which is in one of two possible states, either 1 or 0.
The neurons are either in an adaptive (free) state or a clamped (frozen) state.
If we apply simulated annealing to a discrete Hopfield network, it becomes a Boltzmann Machine. -
What Is An Auto-encoder?
An autoencoder is an autonomous machine learning algorithm that uses the backpropagation principle, with the target values set equal to the inputs provided. Internally, it has a hidden layer that describes a code used to represent the input.
Some Key Facts about the autoencoder are as follows:-
It is an unsupervised ML algorithm similar to Principal Component Analysis
It minimizes the same objective function as Principal Component Analysis
It is a neural network
The neural network’s target output is its input -
What are the differences between feedforward neural network and a recurrent neural network?
A feedforward network allows signals to travel one way only, from input to output. A recurrent neural network, unlike a feedforward network, has recurrent connections. The RNN can be described by the recurrence s_t = f(s_{t-1}, x_t).
The state s_t at time t is a function of the previous state s_{t-1} and the input x_t at the current time step. A recurrent neural network maintains an internal state s_t by using its own output as part of the input for the next time step; this state vector summarizes the history of the sequence it has seen so far. Recurrent neural networks are Turing complete and can simulate arbitrary programs. Whereas a feedforward network can only map one fixed-size input to one fixed-size output, an RNN can handle sequential data of arbitrary length.
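A minimal sketch of this recurrence in NumPy; the weight-matrix names (W_s, W_x) and the tanh nonlinearity are assumptions for the example.

```python
import numpy as np

def rnn_step(s_prev, x_t, W_s, W_x, b):
    # s_t = f(s_{t-1}, x_t): the new state mixes the previous state and the
    # current input through a nonlinearity (tanh here).
    return np.tanh(s_prev @ W_s + x_t @ W_x + b)

def run_rnn(xs, s0, W_s, W_x, b):
    s = s0
    for x_t in xs:          # a sequence of arbitrary length
        s = rnn_step(s, x_t, W_s, W_x, b)
    return s                # the final state summarizes the whole sequence
```
-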
What is a convolutional neural network?
Convolutional neural networks, also known as CNNs, are a type of feedforward neural network that uses convolution in at least one of its layers. A convolutional layer consists of a set of filters (kernels). Each filter slides across the entire input image, computing the dot product between the weights of the filter and the underlying patch of the input. As a result of training, the network learns filters that can detect specific features.
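A naive NumPy sketch of a single convolution (strictly, cross-correlation, as in most deep learning libraries) of one filter over one image; the example edge-detecting filter is made up.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image and take the dot product at every position."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A learned filter might end up looking like this vertical-edge detector.
edge_filter = np.array([[1., 0., -1.],
                        [1., 0., -1.],
                        [1., 0., -1.]])
```
-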
Why are deep networks better than shallow ones?
Both shallow and deep networks are capable of approximating any function, but for the same level of accuracy deeper networks can be much more efficient in terms of computation and number of parameters. Deeper networks are also able to build deep representations: at every layer, the network learns a new, more abstract representation of the input. -
When should one use Mean absolute error over Root mean square error as a performance measure for regression problems ?
When we have many outliers in the data, mean absolute error is a better choice, because it does not square the errors and is therefore far less dominated by a few extreme values than root mean square error.
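A small worked example with invented numbers showing how a single outlier inflates RMSE much more than MAE:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 4.0, 100.0])   # the last value is an outlier
y_pred = np.array([3.5, 4.5, 4.0, 10.0])

mae = np.mean(np.abs(y_true - y_pred))            # grows linearly with the error
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))   # squares the error, so the outlier dominates

print(mae, rmse)   # RMSE (~45) is much larger than MAE (~22.75) because of the single outlier
```
-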
What are the three stages to build any model in Machine learning ?
There are 3 stages to building a model in machine learning:
Model Building:- Choose a suitable algorithm for the model and train it according to the requirements of your problem.
Model Testing:- Check the accuracy of the model on the test data.
Applying the model:- Make the required changes after testing and apply the final model you end up with. A minimal sketch of this workflow follows.
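A minimal sketch of these three stages using scikit-learn; the dataset and the choice of logistic regression are purely illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Model building: pick an algorithm and train it.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Model testing: check the accuracy on held-out data.
print("test accuracy:", model.score(X_test, y_test))

# Applying the model: after any adjustments, use the final model on new data.
predictions = model.predict(X_test[:5])
```
-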
How will you choose the most appropriate machine learning algorithm for your classification problem ?
If accuracy has to be given priority when deciding on a machine learning algorithm, the best way to go about it is to test a couple of different algorithms (trying different parameters within each) and choose the one that best meets the requirement. As a rule of thumb, choose a machine learning algorithm for your classification based on the size of your training set: if the training set is small, using low-variance/high-bias classifiers like Naïve Bayes is beneficial, while in the case of large training sets, high-variance/low-bias classifiers like k-nearest neighbors serve the purpose best. -
What is backpropagation in machine learning ?
Backpropagation is the primary algorithm for performing gradient descent on neural networks. First, the output values of each node are calculated (and cached) in a forward pass. Then, the partial derivative of the error with respect to each parameter is calculated in a backward pass through the graph. -
What is AUC (Area Under the ROC Curve) in machine learning ?
The area under the ROC curve is the probability that a classifier will be more confident that a randomly chosen positive example is actually positive than that a randomly chosen negative example is positive. -
What is Bias-Variance trade-off in machine learning ?
Bias-Variance is the dilemma of minimizing two sources of error at the same time. Bias stems from preconceived (simplifying) assumptions built into the learning algorithm, while variance measures how widely a set of values is spread around its average, i.e. how sensitive the model is to fluctuations in the training data. Trading off between these two aspects is central to tuning a machine learning algorithm. -
What is deep learning ?
This might or might not apply to the job you're going after, but your answer will help to show you know more than just the technical aspects of machine learning. Deep learning is a subset of machine learning. It refers to using multi-layered neural networks to process data in increasingly complex ways, enabling the software to train itself to perform tasks like speech and image recognition through exposure to vast amounts of data; the machine thus undergoes continual improvement in its ability to recognize and process information. Layers of neural networks stacked on top of each other for use in deep learning are called deep neural networks. -
You are given a dataset where the number of variables (p) is greater than the number of observations (n) (p>n). Which is the best technique to use and why ?
When the number of variables is greater than the number of observations, we have a high-dimensional dataset. In such cases it is not possible to calculate a unique least-squares coefficient estimate. Penalized regression methods like LARS, Lasso, or Ridge work well under these circumstances because they tend to shrink the coefficients to reduce variance. Whenever the least-squares estimates have high variance, ridge regression tends to work best.
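A small sketch with synthetic data (p > n) showing ridge regression still producing a usable, shrunken coefficient estimate; all sizes and the penalty strength are invented for the example.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n, p = 50, 200                      # more variables than observations (p > n)
X = rng.normal(size=(n, p))
y = X[:, 0] * 3.0 + rng.normal(scale=0.1, size=n)

# Ordinary least squares has no unique solution here, but a penalized method
# such as ridge shrinks the coefficients and still gives a usable estimate.
model = Ridge(alpha=1.0).fit(X, y)
print(model.coef_[:5])
```
-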
If a highly positively skewed variable has missing values and we replace them with mean, do we underestimate or overestimate the values ?
Since in positively skewed data the mean is greater than the median, we overestimate the value of the missing observations.
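A tiny numeric illustration with made-up values:

```python
import numpy as np

# A small positively skewed sample: most values are low, a few are very high.
values = np.array([1, 2, 2, 3, 3, 4, 50, 80])
print(np.mean(values))     # ~18.1 – pulled up by the long right tail
print(np.median(values))   # 3.0  – the typical value

# Imputing missing entries with the mean therefore fills them with a value
# well above the typical observation, i.e. we overestimate.
```
-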
What is bucketing in machine learning ?
Converting a (usually continuous) feature into multiple binary features called buckets or bins, typically based on value range. For example, instead of representing temperature as a single continuous floating-point feature, you could chop ranges of temperatures into discrete bins. Given temperature data sensitive to a tenth of a degree, all temperatures between 0.0 and 15.0 degrees could be put into one bin, 15.1 to 30.0 degrees could be a second bin, and 30.1 to 50.0 degrees could be a third bin.
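For illustration, the temperature example above could be bucketed with pandas roughly like this (the bin edges and labels are the ones described):

```python
import pandas as pd

temps = pd.Series([3.2, 14.9, 15.1, 27.5, 30.1, 42.0])

# Chop the continuous temperature range into three discrete bins.
bins = [0.0, 15.0, 30.0, 50.0]
labels = ["cold", "mild", "hot"]
temps_bucketed = pd.cut(temps, bins=bins, labels=labels)
print(temps_bucketed)
```
-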
What is checkpoint in machine learning ?
Data that captures the state of the variables of a model at a particular time. Checkpoints enable exporting model weights, as well as performing training across multiple sessions. Checkpoints also enable training to continue past errors (for example, job preemption). Note that the graph itself is not included in a checkpoint.
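A minimal sketch of saving and restoring a checkpoint, using PyTorch as one common framework choice; the model, optimizer, and file name are invented for the example.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Save a checkpoint: the variable state (weights, optimizer state, progress),
# but not the graph / model definition itself.
torch.save({
    "epoch": 5,
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
}, "checkpoint.pt")

# Later (or in another session), rebuild the model and restore the state.
checkpoint = torch.load("checkpoint.pt")
model.load_state_dict(checkpoint["model_state_dict"])
optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
```
-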
What is Standardization and Normalisation? Give one advantage of each over the other ?
Both are feature scaling techniques.
Standardization is less affected by outliers than normalization. Standardization does not bound values to a specific range, which may be a problem for some algorithms that expect inputs bounded within a range. In statistics, standardization means subtracting the mean and then dividing by the standard deviation. In algebra, normalization is the process of dividing a vector by its length; as a feature-scaling technique (min-max normalization), it transforms your data into a range between 0 and 1.
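A small sketch of both scalings on the same data, with an outlier included to show how min-max normalization squashes the remaining values:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 100.0])   # note the outlier

# Standardization (z-score): subtract the mean, divide by the standard deviation.
x_standardized = (x - x.mean()) / x.std()

# Min-max normalization: rescale into the [0, 1] range.
x_normalized = (x - x.min()) / (x.max() - x.min())

print(x_standardized)
print(x_normalized)
```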