Neural Networks are interesting for quite a lot of very different people:
In principle, NNs can compute any computable function, i.e., they can do everything a normal digital computer can do (Valiant, 1988; Siegelmann and Sontag, 1999; Orponen, 2000; Sima and Orponen, 2001), or perhaps even more, under some assumptions of doubtful practicality (see Siegelmann, 1998, but also Hadley, 1999).
Practical applications of NNs most often employ supervised learning. For supervised learning, you must provide training data that includes both the input and the desired result (the target value). After successful training, you can present input data alone to the NN (that is, input data without the desired result), and the NN will compute an output value that approximates the desired result. However, for training to be successful, you may need lots of training data and lots of computer time to do the training. In many applications, such as image and text processing, you will have to do a lot of work to select appropriate input data and to code the data as numeric values.
In practice, NNs are especially useful for classification and function approximation/mapping problems which are tolerant of some imprecision, which have lots of training data available, but to which hard and fast rules (such as those that might be used in an expert system) cannot easily be applied. Almost any finite-dimensional vector function on a compact set can be approximated to arbitrary precision by feedforward NNs (which are the type most often used in practical applications) if you have enough data and enough computing resources.
To be somewhat more precise, feedforward networks with a single hidden layer and trained by least-squares are statistically consistent estimators of arbitrary square-integrable regression functions under certain practically-satisfiable assumptions regarding sampling, target noise, number of hidden units, size of weights, and form of hidden-unit activation function (White, 1990). Such networks can also be trained as statistically consistent estimators of derivatives of regression functions (White and Gallant, 1992) and quantiles of the conditional noise distribution (White, 1992a). Feedforward networks with a single hidden layer using threshold or sigmoid activation functions are universally consistent estimators of binary classifications (Faragó and Lugosi, 1993; Lugosi and Zeger 1995; Devroye, Györfi, and Lugosi, 1996) under similar assumptions. Note that these results are stronger than the universal approximation theorems that merely show the existence of weights for arbitrarily accurate approximations, without demonstrating that such weights can be obtained by learning.
In standard backprop, too low a learning rate makes the network learn very slowly. Too high a learning rate makes the weights and objective function diverge, so there is no learning at all. If the objective function is quadratic, as in linear models, good learning rates can be computed from the Hessian matrix (Bertsekas and Tsitsiklis, 1996). If the objective function has many local and global optima, as in typical feedforward NNs with hidden units, the optimal learning rate often changes dramatically during the training process, since the Hessian also changes dramatically. Trying to train a NN using a constant learning rate is usually a tedious process requiring much trial and error.
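As a minimal illustration of the quadratic case, consider the one-dimensional objective f(w) = 0.5*h*w^2, whose Hessian is the constant h. Each gradient step multiplies w by (1 - lr*h), so a learning rate of 1/h reaches the minimum in one step, rates below 2/h converge, and rates above 2/h diverge. The function name and parameters below are assumptions made for this sketch, not part of any library.

```c
#include <math.h>

/* Run `steps` iterations of plain gradient descent on the 1-D quadratic
   f(w) = 0.5 * h * w * w, whose gradient is h * w and whose Hessian is h.
   Returns |w| after the last step (the distance from the minimum at 0).
   Hypothetical helper, for illustration only. */
double descend_quadratic(double h, double lr, double w0, int steps)
{
    double w = w0;
    for (int i = 0; i < steps; i++) {
        w -= lr * h * w;   /* w <- w - lr * gradient */
    }
    return fabs(w);
}
```

With h = 4, a learning rate of 0.25 lands on the minimum in a single step, 0.01 creeps toward it slowly, and 0.6 makes the weight grow without bound.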
It is simply a processor with many inputs and one output… It works in either training mode or using mode. In training mode, the neuron can be trained to fire (or not) for particular input patterns. In using mode, when a taught input pattern is detected at the input, its associated output becomes the current output. If the input pattern does not belong to the list of taught input patterns, the firing rule is used to determine whether to fire or not.
The major disadvantage is that they need a large and diverse set of training data before they can work in a real environment. Moreover, they are often not robust enough to cope with real-world conditions.
How to count layers is a matter of considerable dispute.
To avoid ambiguity, you should speak of a 2-hidden-layer network, not a 4-layer network (as some would call it) or 3-layer network (as others would call it). And if the connections follow any pattern other than fully connecting each layer to the next and to no others, you should carefully specify the connections.
A vector of values presented at one time to all the input units of a neural network is called a "case", "example", "pattern", "sample", etc. The term "case" will be used in this FAQ because it is widely recognized, unambiguous, and requires less typing than the other terms. A case may include not only input values, but also target values and possibly other information.
A vector of values presented at different times to a single input unit is often called an "input variable" or "feature". To a statistician, it is a "predictor", "regressor", "covariate", "independent variable", "explanatory variable", etc. A vector of target values associated with a given output unit of the network during training will be called a "target variable" in this FAQ. To a statistician, it is usually a "response" or "dependent variable".
The simple difference is that artificial neural networks learn from examples, whereas conventional computers carry out tasks by following explicitly programmed algorithms. The examples given to an artificial neural network must be chosen carefully, however. Once properly “taught”, an artificial neural network can work on its own, or at least try to imitate what it was shown. This also makes its behavior less predictable than that of the algorithm-based computers we use in daily life.
Teuvo Kohonen is one of the most famous and prolific researchers in neurocomputing, and he has invented a variety of networks. But many people refer to "Kohonen networks" without specifying which kind of Kohonen network, and this lack of precision can lead to confusion. The phrase "Kohonen network" most often refers to one of the following three types of networks:
It is rarely useful to have a NN simply memorize a set of data, since memorization can be done much more efficiently by numerous algorithms for table look-up. Typically, you want the NN to be able to perform accurately on new data, that is, to generalize.
There seems to be no term in the NN literature for the set of all cases that you want to be able to generalize to. Statisticians call this set the "population". Tsypkin (1971) called it the "grand truth distribution," but this term has never caught on.
Neither is there a consistent term in the NN literature for the set of cases that are available for training and evaluating an NN. Statisticians call this set the "sample". The sample is usually a subset of the population.
(Neurobiologists mean something entirely different by "population," apparently some collection of neurons, but I have never found out the exact meaning. I am going to continue to use "population" in the statistical sense until NN researchers reach a consensus on some other terms for "population" and "sample"; I suspect this will never happen.)
Since neural networks are best at identifying patterns or trends in data, they are well suited for prediction or forecasting needs including:
This is a two-paradigm process:
There are many, many kinds of NNs by now. Nobody knows exactly how many. New ones (or at least variations of old ones) are invented every week. Below is a collection of some of the best-known methods; it does not claim to be complete.
The two main kinds of learning algorithms are supervised and unsupervised.
There are many ways to categorize learning methods. The distinctions are overlapping and can be confusing, and the terminology is used very inconsistently. This answer attempts to impose some order on the chaos, probably in vain.
Batch vs. Incremental Learning (also Instantaneous, Pattern, and Epoch)
Batch learning proceeds as follows:
   Initialize the weights.
   Repeat the following steps:
      Process all the training data.
      Update the weights.
Incremental learning proceeds as follows:
   Initialize the weights.
   Repeat the following steps:
      Process one training case.
      Update the weights.
In the above sketches, the exact meaning of "Process" and "Update" depends on the particular training algorithm and can be quite complicated for methods such as Levenberg-Marquardt. Standard backprop (see What is backprop?) is quite simple, though. Batch standard backprop (without momentum) proceeds as follows:
   Initialize the weights W.
   Repeat the following steps:
      Process all the training data DL to compute the gradient of the average error function AQ(DL,W).
      Update the weights by subtracting the gradient times the learning rate.
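The batch and incremental sketches above can be made concrete for the simplest possible case: a single linear unit y = w*x + b trained with the case-wise error 0.5*(y - t)^2. This is only an illustration; the function names, the one-input model, and the constant learning rate are assumptions, and a network with hidden units would also have to propagate the error back through those units.

```c
#include <stddef.h>

/* One pass of batch learning: process ALL n cases to get the gradient of
   the average error, then update the weights once. */
void batch_epoch(const double *x, const double *t, size_t n,
                 double lr, double *w, double *b)
{
    double grad_w = 0.0, grad_b = 0.0;
    for (size_t i = 0; i < n; i++) {
        double err = (*w * x[i] + *b) - t[i];  /* output minus target */
        grad_w += err * x[i];
        grad_b += err;
    }
    *w -= lr * grad_w / (double)n;  /* subtract gradient times learning rate */
    *b -= lr * grad_b / (double)n;
}

/* One pass of incremental learning: update the weights after EACH case. */
void incremental_epoch(const double *x, const double *t, size_t n,
                       double lr, double *w, double *b)
{
    for (size_t i = 0; i < n; i++) {
        double err = (*w * x[i] + *b) - t[i];
        *w -= lr * err * x[i];
        *b -= lr * err;
    }
}
```

Calling either function repeatedly on noise-free data generated by t = 2*x + 1 (for example x = 0, 1, 2, 3 with a learning rate of 0.1) drives w toward 2 and b toward 1; the only difference is how often the weights change.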
In simple words, a neural network is an interconnection of many very small processing elements called neurons. There are two types of neural network:
Biological Neural Networks: These are made of real neurons, the tiny processing units inside your brain. Neurons make up not only the brain but the whole nervous system.
Artificial Neural Networks: An artificial neural network is an imitation of a biological neural network, built from small artificial processing elements in lieu of conventional digital computation on binary digits. Artificial neural networks are basically designed to let machines such as robots approach human-quality performance in their work.
There is considerable overlap between the fields of neural networks and statistics. Statistics is concerned with data analysis. In neural network terminology, statistical inference means learning to generalize from noisy data. Some neural networks are not concerned with data analysis (e.g., those intended to model biological systems) and therefore have little to do with statistics. Some neural networks do not learn (e.g., Hopfield nets) and therefore have little to do with statistics. Some neural networks can learn successfully only from noise-free data (e.g., ART or the perceptron rule) and therefore would not be considered statistical methods. But most neural networks that can learn to generalize effectively from noisy data are similar or identical to statistical methods. For example:
The formula for the logistic activation function is often written as:
netoutput = 1 / (1+exp(-netinput));
But this formula can produce floating-point overflow in the exponential function if you program it in this simple form. To avoid overflow, you can do this:
   if (netinput < -45) netoutput = 0;
   else if (netinput > 45) netoutput = 1;
   else netoutput = 1 / (1+exp(-netinput));
The constant 45 will work for double precision on all machines that I know of, but there may be some bizarre machines where it will require some adjustment. Other activation functions can be handled similarly.
Combination functions: Each non-input unit in a neural network combines values that are fed into it via synaptic connections from other units, producing a single value called the "net input". There is no standard term in the NN literature for the function that combines values. In this FAQ, it will be called the "combination function". The combination function is a vector-to-scalar function. Most NNs use either a linear combination function (as in MLPs) or a Euclidean distance combination function (as in RBF networks). There is a detailed discussion of networks using these two kinds of combination function under "How do MLPs compare with RBFs?"
Activation functions: Most units in neural networks transform their net input by using a scalar-to-scalar function called an "activation function", yielding a value called the unit's "activation". Except possibly for output units, the activation value is fed via synaptic connections to one or more other units. The activation function is sometimes called a "transfer" function, and activation functions with a bounded range are often called "squashing" functions, such as the commonly used tanh (hyperbolic tangent) and logistic (1/(1+exp(-x))) functions. If a unit does not transform its net input, it is said to have an "identity" or "linear" activation function. The reason for using non-identity activation functions is explained under "Why use activation functions?"
Error functions: Most methods for training supervised networks require a measure of the discrepancy between the network's output value and the target (desired output) value (even unsupervised networks may require such a measure of discrepancy).
It is strange, and at the same time amazing, that we do not really know how we think. Biologically, a neuron in the human brain receives signals through a host of fine structures called dendrites. The neuron sends out spikes of electrical activity through a long, thin strand known as an axon, which splits into thousands of branches. At the end of each branch, a structure called a synapse converts the activity from the axon into electrical effects that inhibit or excite activity in the connected neurons. When a neuron receives excitatory input that is sufficiently large compared with its inhibitory input, it sends a spike of electrical activity down its axon. Learning occurs by changing the effectiveness of the synapses so that the influence of one neuron on another changes.
"Backprop" is short for "backpropagation of error". The term backpropagation causes much confusion. Strictly speaking, backpropagation refers to the method for computing the gradient of the case-wise error function with respect to the weights for a feedforward network, a straightforward but elegant application of the chain rule of elementary calculus (Werbos 1974/1994). By extension, backpropagation or backprop refers to a training method that uses backpropagation to compute the gradient. By further extension, a backprop network is a feedforward network trained by backpropagation.
Yes of course…
Training a neural network is, in most cases, an exercise in numerical optimization of a usually nonlinear objective function ("objective function" means whatever function you are trying to optimize and is a slightly more general term than "error function" in that it may include other quantities such as penalties for weight decay).
Methods of nonlinear optimization have been studied for hundreds of years, and there is a huge literature on the subject in fields such as numerical analysis, operations research, and statistical computing, e.g., Bertsekas (1995), Bertsekas and Tsitsiklis (1996), Fletcher (1987), and Gill, Murray, and Wright (1981). Masters (1995) has a good elementary discussion of conjugate gradient and Levenberg-Marquardt algorithms in the context of NNs.
Mainly, artificial neural networks, and artificial intelligence more broadly, are designed to give robots human-quality thinking, so that machines can decide “what if” and “what if not” with precision. Some of the other advantages are:
Numerical condition is one of the most fundamental and important concepts in numerical analysis. Numerical condition affects the speed and accuracy of most numerical algorithms. Numerical condition is especially important in the study of neural networks because ill-conditioning is a common cause of slow and inaccurate results from backprop-type algorithms.
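The effect of ill-conditioning on gradient-based training can be seen on a toy problem. The sketch below runs gradient descent on the quadratic f(w1, w2) = 0.5*(a*w1^2 + b*w2^2), whose Hessian is diagonal with entries a and b, so its condition number is the ratio of the larger to the smaller. The function name, the starting point (1, 1), and the choice of learning rate 1/max(a, b) are assumptions made for this illustration.

```c
#include <math.h>

/* Count gradient-descent steps until both weights are within `tol` of the
   minimum at (0, 0), or until `max_steps` is reached. */
int steps_to_converge(double a, double b, double tol, int max_steps)
{
    double w1 = 1.0, w2 = 1.0;
    double lr = 1.0 / (a > b ? a : b);  /* safe rate for the steepest direction */
    int steps = 0;
    while ((fabs(w1) > tol || fabs(w2) > tol) && steps < max_steps) {
        w1 -= lr * a * w1;   /* gradient of f wrt w1 is a * w1 */
        w2 -= lr * b * w2;   /* gradient of f wrt w2 is b * w2 */
        steps++;
    }
    return steps;
}
```

With a = b = 1 (condition number 1) the minimum is reached in one step, whereas with a = 100 and b = 1 (condition number 100) well over a thousand steps are needed to reach a tolerance of 1e-6, because the learning rate that is safe for the steep direction is far too small for the shallow one.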