What Are Neural Networks, and Do They Work?
Neural networks are a mystery. No one understands how the brain works, so no
one can possibly understand artificial neural networks, or so we are told.
This is a circular argument. If no one understands how the brain works, then
how can we know whether artificial neural networks actually model the
structure and function of an actual brain? Perhaps the structure and function
are understood, but no one understands how the brain learns, thinks, or
recalls information and past experience.
Here is a
chart of all the different current neural networks that claim to model the
structure and function of the brain. They cannot all be correct.
What if an artificial neural
network actually modeled a biological neural network? What if it had left and
right hemispheres, the same as mammals, birds, and reptiles?
What if an artificial neural
network really had "Deep Learning"? What if it had thousands of
hidden layers? What if each layer had thousands and thousands of neurons? What
if each neuron was connected to thousands of other neurons?
What if neurons and layers were
not consecutive or recurrent, but concurrent, with the ability to compute
outputs in parallel in a single step?
What if an artificial neural
network could automatically build the entire structure of the neural network in
a single step, and figure out the number of hidden layers, the number of nodes
in each layer, and all the connection weights and offsets instantaneously?
What if an artificial neural
network could learn instantaneously, in a single step? What if it could learn
perfectly, with no overfitting or underfitting? What if it could learn the
entire training set with zero error, or find the best-fit line, curve, or
surface through scattered datapoints?
What if an artificial neural
network could think instantaneously, in a single step? What is thinking? It's
the ability to determine inputs that result in desired outputs, based on
weighted constraints on any combination of inputs and outputs. What about this?
What about that? That is thinking.
What if an artificial neural
network could learn and analyze training sets with hundreds of millions of
examples, each with hundreds of inputs and outputs? What if it could all work
on a single laptop with a single GPU?
What if an artificial neural
network could automatically detect and fill in unknown values? What if it could
automatically detect and correct outliers? What if it could automatically
detect and correct data entry errors? What if it could detect when a person
deceptively answered a question and fix it?
What if an artificial neural
network could automatically correct jitter caused by rounding or scatter in the
data? For example, if the age was recorded as 21, but actually it was 21.8?
What if an artificial neural
network could accurately interpolate and extrapolate? What if it could
automatically determine the minimum set of training examples required to
interpolate and extrapolate all the other training examples?
What if all this could be done
on a single laptop with a single GPU, in milliseconds? Unbelievable?
What if an artificial neural
network could predict the future? Predicting the future is tricky, because
there is no way to know if it is correct. Or is there? What if it could predict
both the future and the past? If it can predict the past accurately, perhaps
the future predictions can be trusted.
What if an artificial neural
network not only can predict the future and the past, but it can travel through
time into the future, then come back to the present and alert you how to change
your current and future behavior to avert some future disaster or prevent some
future disease?
What if an artificial neural
network could optimize process control, to make perfect welds or produce the
highest quality 3D printouts the first time, with no trial and error? What if
it could produce better engineering designs than all engineers combined could
produce in centuries?
Well, let me peer into my
crystal ball and see if this is possible, and when such amazing technology
might be available in the distant future. The fog is clearing; I see something
coming into focus! Oh no! It's too late! I developed this artificial neural
network 35 years ago. Well, not exactly. I developed it on the CPU 35 years
ago, but recently I have ported it to run on the GPU. To do that, I needed to
write massive amounts of code on the GPU with debugging capabilities. So, I had
to write a new computer language to support that.
What is this new GPU computer language? I'm calling it GpuScript. It
eliminates CPU/GPU bottlenecks and can achieve unbelievable speeds: 23
petaflops. A 4096-sample FFT in 3 nanoseconds. A 4096x4096 matrix multiply in
1.44 nanoseconds. Ray-marching 125 million voxels in 1 ms. And, of course,
millisecond AI training, learning, thinking, data correction, and
optimization.
GEM AI / Neural Network
Development Story
I was so fascinated when I first
started developing neural networks. I developed a neural network that could
self-organize. It could grow and shrink. It could add new layers or remove
existing layers. It could add or remove nodes in each layer. It could add or
remove recurrent links. It had accelerated training and went through a sleep
phase to escape local minima.
This neural network won national
awards and competitions, and was in numerous publications. It was used on a
robotic vision system for welding that could outperform experts and could
determine optimum welding parameters for welding plates up to 2 feet thick. It
correctly identified and classified millions of radar signals with no errors
and is still in use on fighter jets. Just 3 months into my first job, it
replaced a research group of 80 PhDs and received a patent. It could
automatically control vehicles driving in coal mines in pitch darkness. So, why
did I abandon this neural network?
Suppose I heard a rumor 35 years
ago that someone had figured out how to directly solve the link weights and
offsets for the links in a neural network. What would my reaction have been?
Ridiculous! Only linear systems can be solved in closed form. Neural networks
are highly non-linear and require some sort of iteration, such as
back-propagation.
So what if the neural network
link values can be solved? The number of layers, the number of nodes in each
layer, and which nodes are connected by those links are still unknown. Are the
links consecutive or recurrent? What type of basis
functions should be used for each node? There are so many unknowns.
What about the data
representation and preprocessing? Is there a direct closed-form solution for
that? Does the solution automatically figure out how to reduce dimensionality
and eliminate input correlations?
What about generalization? Even
if the links could be solved to give zero error at the training examples, how
would the neural network perform for interpolation and extrapolation of these
training points?
What if the training points
contained errors and scatter? Would the solution still go through every
training example with zero error and have wild oscillations? Or, would the
solution somehow fit a best-fit line, curve, or surface through these scattered
points? There is no way that I would have believed someone could solve a neural
network without training.
What if solving a neural network
was not just a rumor? What if I actually figured out how to solve all of these
issues: the number of layers, the number of nodes in each layer, the connection
weights and offsets between nodes, whether the links were recurrent or not, and
the node basis functions?
What if the solution could
exactly pass through each training point with zero error, or it could find the
best-fit surface through scattered points? What if the solution resulted
in perfect interpolation and extrapolation? What if no data preprocessing
was necessary? What if it could find solutions regardless of dimensionality and
input correlations?
If I did find this solution,
could I convince anyone else that not only was solving a neural network
possible, but I was able to accomplish it? The answer was no. No matter how
many different types of training set examples I generated showing perfect results,
no one would believe it.
Their response? A biological
neural network takes time to learn. Occasionally it makes mistakes. It is
imperfect and imprecise. It rarely if ever can find exact solutions. I was told
that I was not doing AI, and I was not modeling the brain.
No matter. I continued to use
this new neural network on other projects in engineering. The results were
amazing. It could make engineering designs that met all safety requirements and
were far more cost-effective than all human experts combined could develop in
hundreds of years. It could make engineers 100 times more productive and allow
them to build two to three times more for the same price.
What was the response to this
neural network? At first engineers were excited, but then they became terrified
and begged me not to release it. It would eliminate high-level jobs. Since
designs only required half the steel and concrete, I was told that material
suppliers would track me down.
Is civilization ready for “real”
AI? This neural network is a double-edged sword. Perhaps somewhere there is a
chivalrous knight worthy to lift the sword from the stone, one filled with
wisdom, courage, and vision. Until then, we wait.
What is True AI?
The question shouldn’t be, “Is
it AI?” The question should be, “What can it do?” If something can instantaneously
produce better solutions than all men combined, who cares if it is called AI,
neural networks, machine learning, mathematics, a basis transform,
multi-dimensional interpolation, numerical modeling, or whatever.
But if a package or system can’t
do much, it’s better to call it AI. That’s a sure way to add magic and mystery
to the equation. Since a neural network is beyond our understanding,
everyone with deep pockets, including investors, corporations, and governments,
will jump on the bandwagon. Strange, how claiming to be ignorant and making
extravagant promises attracts investments like flies to honey.
What Can GEM AI Do?
Neural Network Terminology: Instantaneous
Training, Perfect Generalization, Instantaneous Thinking, Data Correction,
Optimization, and Iterative Learning. This terminology is rarely applied to
neural networks. Why? Perhaps this is the first actual neural network ever
developed, or perhaps this is so far above neural networks that it deserves a
new name: Geometric Empirical Modeling Artificial Neural-Network Intelligence
(GEM-ANNI), or GEM for short.
Instantaneous means a single GPU
function call of order O(1). Instantaneous does not mean iterating through
each training example once. Instantaneous means presenting all the training
examples to the neural network all at once. Instantaneous means about one
millisecond on a single laptop with a single GPU. Relative to other neural
network approaches, Instantaneous means Instantaneous.
Instantaneous Training means instantaneously
determining and building all the layers, all the nodes in each layer, and all
the node connection links with weights and offsets for the entire neural
network. No trial and error to determine the number of layers or nodes. No
iterative back-propagation to estimate weights and offsets. Instantaneous
Training means Instantaneous Training.
Perfect Generalization means
obtaining the perfect solution for interpolating and extrapolating the training
set and ignoring inputs with useless or random information content. How can a
solution be perfect? Is the shortest distance between 2 points a straight line?
Is that perfect? Is the solution to a non-singular matrix equation perfect? Is
the best-fit line through a set of scattered points perfect? Is a Fourier
transform of a signal to find the amplitude and phase of each frequency
perfect? Yes, yes, and yes. Perfect Generalization means Perfect
Generalization.
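GEM's own solver has never been published, but the claim that a best-fit line
through scattered points has a closed-form, non-iterative solution can be
illustrated with ordinary least squares. This is a minimal Python sketch of
that idea (synthetic data, standard NumPy), not GEM's actual method:

```python
import numpy as np

# Scattered points around the line y = 2x + 1 (synthetic data for illustration).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, size=x.shape)

# Closed-form least squares: the best-fit line is solved in a single step
# from the design matrix, with no iteration and no back-propagation.
A = np.column_stack([x, np.ones_like(x)])
slope, intercept = np.linalg.lstsq(A, y, rcond=None)[0]

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

The same single-step character holds for the other "perfect" solutions listed
above, such as the Fourier transform and solving a non-singular matrix
equation.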
Instantaneous Thinking means the
ability to instantaneously solve for any inputs given constraints on both
inputs and outputs. Thinking is the inverse of learning. Learning means given
x, solve for y. Thinking means solve for the smallest or biggest x that results
in a desired y. Thinking means finding the least expensive engineering design
that passes all safety requirements. Thinking means determining the optimum
welding or 3D printing parameters that give the strongest and best results. Instantaneous
Thinking means finding not just a good answer, but the very best answer, in a
single GPU call. Instantaneous Thinking means Instantaneous Thinking.
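The thinking-as-inverse-of-learning idea can be sketched with a toy stand-in,
since GEM's solved network is not public. Suppose the learned forward model is
y = x^2 + 1 (a hypothetical model chosen only for illustration); finding the
smallest x that meets a constraint on y is then a one-dimensional search,
shown here with bisection:

```python
# "Learning" maps x to y. This toy model stands in for a solved network.
def forward(x):
    return x ** 2 + 1.0

# "Thinking": invert the model. Find the smallest x in [lo, hi] whose output
# satisfies y >= target. With a monotone model, bisection solves this to
# machine precision without trial and error.
def think(target, lo=0.0, hi=10.0, tol=1e-9):
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if forward(mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi

x_best = think(5.0)
print(round(x_best, 6))  # smallest x with x^2 + 1 >= 5 is x = 2
```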
Data Correction means filling in
unknowns, detecting and correcting outliers, and fixing jitter caused by
rounding, noise, scatter, or other random errors in the data. Errors in the
data can and will skew results and degrade performance or prediction accuracy.
Outlier detection and correction means lie detection. Garbage in, garbage
out. Take out the garbage, clean up the mess, and get back to perfection.
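A hedged sketch of the data-correction idea using standard tools: fill an
unknown by interpolation, then flag a gross outlier with a robust z-score
based on the median absolute deviation and repair it the same way. GEM's own
correction method is not public; this only illustrates the concept:

```python
import numpy as np

# Toy series with one missing value (NaN) and one gross outlier (70).
y = np.array([1.0, 2.0, 3.0, np.nan, 5.0, 6.0, 70.0, 8.0])
idx = np.arange(len(y))

# Fill unknowns by linear interpolation over the known neighbors.
known = ~np.isnan(y)
y_filled = y.copy()
y_filled[~known] = np.interp(idx[~known], idx[known], y[known])

# Flag outliers with a robust z-score (median absolute deviation),
# then replace them by interpolating over the trusted points.
med = np.median(y_filled)
mad = np.median(np.abs(y_filled - med))
robust_z = 0.6745 * (y_filled - med) / mad
outlier = np.abs(robust_z) > 3.5
y_clean = y_filled.copy()
y_clean[outlier] = np.interp(idx[outlier], idx[~outlier], y_filled[~outlier])

print(y_clean)  # corrected series: 1 through 8
```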
Optimization means determining
the minimum set of training examples required to interpolate or extrapolate all
the other training examples. For example, suppose a training set consists of a
million points all distributed on a straight line. If the line is horizontal,
optimization only needs to select a single point. Otherwise, optimization only
needs to select two points. Optimization results in small and memory efficient
neural networks that can learn training sets of practically unlimited size.
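The straight-line example above can be sketched directly: greedily keep only
the points needed to linearly interpolate all the others within a tolerance.
This greedy selection is an illustrative stand-in, not GEM's released
optimization:

```python
import numpy as np

# Keep the minimum set of points from which the rest can be interpolated.
def minimal_set(x, y, tol=1e-9):
    if np.max(np.abs(y - y[0])) <= tol:
        return [0]                          # horizontal line: one point
    keep = [0, len(x) - 1]                  # otherwise keep the endpoints
    y_hat = np.interp(x, x[keep], y[keep])
    while np.max(np.abs(y_hat - y)) > tol:  # add the worst-fit point
        keep.append(int(np.argmax(np.abs(y_hat - y))))
        keep.sort()
        y_hat = np.interp(x, x[keep], y[keep])
    return keep

x = np.linspace(0.0, 1.0, 1001)
print(len(minimal_set(x, np.full_like(x, 2.0))))  # horizontal: 1 point
print(len(minimal_set(x, 3.0 * x + 2.0)))         # sloped line: 2 points
print(len(minimal_set(x, np.abs(x - 0.5))))       # V shape: 3 points
```

Whether the line holds 1001 points or a million, the selected set is the
same, which is the source of the memory savings described above.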
Iterative Learning means
improving performance and building experience step-by-step. This is how the
scientific method works. First, collect some observations, such as training
examples. Next, form a hypothesis, such as using thinking to determine a guess
for the best solution and predicting the result. Then test the hypothesis by
running an experiment based on the hypothesis prediction. If the experimental
result does not match the hypothesis prediction, then add a new training sample
based on the experiment. Repeat until the prediction matches the experimental
result. Iterative Learning takes an imperfect and incomplete training set and
makes it perfect and complete. For example, Iterative Learning can observe a
person operating a vehicle or other equipment, then use inputs such as video,
lidar, and other high-level commands to predict what the person will do next.
After correcting outliers and using additional reinforcement corrections, the
neural network can soon operate a vehicle or other equipment better than the
best human operator.
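The predict-test-add loop described above can be sketched with toy stand-ins:
plain linear interpolation in place of the solved network, and a known
function in place of the physical experiment. Both substitutions are
assumptions made only so the loop is runnable:

```python
import numpy as np

# Hidden process the learner is modeling: the "experiment" to run.
def experiment(x):
    return np.sin(x)

# Start from two observations; np.interp stands in for the solved model.
xs, ys = [0.0, float(np.pi)], [0.0, float(np.sin(np.pi))]

# Scientific-method loop: predict, test against the experiment, and when
# prediction and result disagree, add that observation and re-solve.
probes = np.linspace(0.0, np.pi, 101)
for _ in range(100):
    errors = np.abs(np.interp(probes, xs, ys) - experiment(probes))
    if errors.max() < 0.01:        # hypothesis matches experiment
        break
    w = int(errors.argmax())       # worst-predicted probe
    xs.append(float(probes[w]))
    ys.append(float(experiment(probes[w])))
    pairs = sorted(zip(xs, ys))    # np.interp needs increasing x
    xs, ys = [p[0] for p in pairs], [p[1] for p in pairs]

print(f"{len(xs)} observations model sin(x) to within 0.01")
```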
This list is not exhaustive.
This GEM neural network accounts for correlated inputs, works well even with
increased dimensionality, and works perfectly for both simple and complex
problems. This technology is a paradigm shift and will replace AI, neural networks,
machine learning, statistics, linear algebra, predictive analytics, process
control, and more. It’s hard to beat instantaneous and perfect.
Concluding Thoughts
Statistics is commonly used to
support all types of claims. Are eggs healthy or not? Some studies conclude
definitively that eggs are healthy, while other studies with a small measure of
uncertainty positively determine that eggs are very unhealthy. Actually,
statistics is limited in scope and very fragile. Breaking one of a long list of
assumptions and constraints, such as linear relationships, uncorrelated inputs,
homogeneous variance, and normal distribution, makes statistics and linear
regression unfit for almost any real-world application.
Is there any solution to this
dilemma? Perhaps. AI and neural networks have been working for decades to
overcome these limitations in statistics, with only an occasional crude
success. Many people would be surprised to know that training a neural
network to fit a line through 2 points requires hundreds of AI experts working
for months on nuclear-powered supercomputers to achieve only an approximate
solution. Think this is an exaggeration? AI training sets are usually so simple
that linear regression often results in better solutions and predictions
than neural networks, causing many people to conclude that linear regression is
a form of AI.
Neural networks require so much
time for trial and error and training that no one would ever consider using a
neural network to perform linear regression. If neural networks don’t work on
simple problems, why should they be trusted to solve complex problems?
For example, suppose there was
an AI competition that had a simple training set: input a zero, get a zero out.
Input a one, get a one out (0=>0) (1=>1). Suppose someone submitted a
neural network that results in the following test output: (0=>0)
(0.25=>0.25) (0.5=>0.5) (0.75=>0.75) (1=>1). Sorry, you lose. The
test set secretly established by the competition was: (0=>0) (0.25=>1.82)
(0.5=>0.5) (0.75=>-0.2) (1=>1). Since neural networks have such
unpredictable behavior, odds are that out of thousands of submissions, someone
will get very close to the competition test set and be declared the winner,
even though by all other standards they achieved a horrible result.
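The arithmetic of this example is easy to check: scoring the smooth identity
submission against the adversarial test set gives a large mean squared error
despite the submission being a perfectly sensible interpolation:

```python
# The identity submission vs. the "secret" test set from the example above.
xs        = [0.0, 0.25, 0.5, 0.75, 1.0]
submitted = [0.0, 0.25, 0.5, 0.75, 1.0]  # smooth interpolation of 0=>0, 1=>1
secret    = [0.0, 1.82, 0.5, -0.2, 1.0]  # the competition's hidden targets

mse = sum((s - t) ** 2 for s, t in zip(submitted, secret)) / len(xs)
print(round(mse, 5))  # -> 0.67348
```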
What is needed is a new type of
mathematics, or a new type of AI neural network. Geometric Empirical Modeling
(GEM) is a neural network that gives the same solution for linear datasets as
linear regression, but instantaneously gives a better solution for non-linear
datasets than the best AI technique or neural network ever developed.
So, why is this neural network
only “perhaps” a solution for the dilemma in statistics? This technology has
never been released, and may never be released. Imagine if there were a way to
definitively determine if eggs are healthy or not, not just for the general
public, but for you personally? What is the optimum number of eggs per day or
week that you should eat, and how much would this affect your lifespan or
mitigate future disease? Imagine if statistics could no longer be used to come
to any arbitrary conclusion, but anyone, even without training, could enter
data about any topic and discover the truth? Is finding out the truth in the
modern era really that terrifying? According to those with power and money,
yes.