Let’s start with the example of students in a medical school. There are 1,000 students, and the top 100 (the top 10%) are earning straight A’s because they are bright and have studied diligently. Would you say that all 100 of these top students will do equally well out in the real world, since they all earned similar grades in medical school? Would you say that all 100 will become world-renowned brain and heart surgeons who create modern, ground-breaking methods of surgery?
No – you intuitively know that only a handful of the 100 will “set the world on fire,” while the rest will do good work as doctors and surgeons without doing anything Earth-shattering.
Why is that? After all, they all scored the same on their tests, so we’d assume they would all set the world on fire, correct? Of course not, because despite their similar test scores (straight A’s), their brains are wired differently, and some of those brains are particularly suited to being creative and innovative in the real world. But we can’t measure that capacity in med school with tests – we don’t find out until “the rubber hits the road” and each graduate goes out into the real world and begins tackling the challenges of their field.
Well, the same applies to Neural Networks. First you train many Neural Networks and select only the top performers (each takes the same series of tests, just as with the students in med school). So out of 1,000 trained Neural Networks, perhaps only 10% (100) score above a specified threshold. Should you assume that all of these Neural Networks will perform equally well in “the real world” – the full application domain space of your application? No – you must test them in that application domain space, and just a handful of them will be the “renowned brain and heart surgeons” (figuratively speaking, of course), while the remaining networks of the original 100 perform in an average way.
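The train-many, keep-the-best step can be sketched in a few lines. This is a minimal illustration, not the article’s actual training code: the network names, the threshold value, and the simulated scores are all hypothetical stand-ins.

```python
import random

N_CANDIDATES = 1000   # networks to train, as in the example above
THRESHOLD = 0.90      # hypothetical test-set score cutoff

def train_and_score(seed):
    """Stand-in for training one network and scoring it on the test set.
    Real code would build, train, and evaluate an actual network here;
    we just draw a simulated score from a bell curve."""
    random.seed(seed)
    return random.gauss(0.80, 0.06)

scores = {i: train_and_score(i) for i in range(N_CANDIDATES)}

# Keep only the networks that beat the threshold -- the "top 10%" group.
high_performers = {i: s for i, s in scores.items() if s >= THRESHOLD}

print(f"{len(high_performers)} of {N_CANDIDATES} passed the threshold")
```

Note that passing this first screen only earns a network a spot in the high-performer group; it says nothing yet about which of them are Super Nets.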
These super-high-performing Neural Networks (the “brilliant and innovative brain surgeons”) are called Super Nets. They are the ones that demonstrate blistering performance outside of the original training regime, but you won’t discover them until you fully test the top-10% group of Neural Networks in the full application domain space.
In the video below, Neural Networks are being trained (using MATLAB’s extremely fast Levenberg-Marquardt optimization algorithm) for stock market prediction purposes – specifically, to predict a company’s future stock performance based on its previous history; one could call it Neural Network Technical Analysis. The Neural Networks that achieve a prediction ROI (Return On Investment) of greater than 50% are saved as part of the high-performer group (like the top 10% of the med school class). Thus when you see the “SUCCESSFUL!!” text, this is a high-achieving Neural Network that has done very well on a test set of companies.
However, the real test comes when these high-achieving Neural Networks are run against a 10-year rolling forecast data set – that is, they must make predictions for each year of a 10-year time span. Those that score the highest ROI with the lowest standard deviation (and there are just a few) are the Super Nets – the Super Star performers.
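A simple way to rank by “highest ROI with the lowest standard deviation” is to sort on the pair (mean ROI, negative deviation), so mean ROI decides first and steadiness breaks ties. A sketch under invented data – the net names and per-year ROI figures below are hypothetical, not backtest results from the video:

```python
import statistics

# Hypothetical per-year ROI (as fractions) for three high-performing nets
# over a 10-year rolling forecast.
yearly_roi = {
    "net_07": [0.62, 0.55, 0.58, 0.60, 0.57, 0.61, 0.59, 0.63, 0.56, 0.60],
    "net_23": [1.10, -0.30, 0.90, -0.20, 1.20, -0.10, 0.80, -0.25, 1.05, 0.15],
    "net_41": [0.20, 0.25, 0.22, 0.18, 0.24, 0.21, 0.19, 0.23, 0.20, 0.22],
}

def score(rois):
    # Higher mean ROI is better; lower year-to-year deviation is better,
    # hence the negated standard deviation as the tie-breaker.
    return statistics.mean(rois), -statistics.stdev(rois)

ranked = sorted(yearly_roi, key=lambda name: score(yearly_roi[name]),
                reverse=True)
super_net = ranked[0]
print(f"Super Net: {super_net}")
```

With these numbers, the steady performer beats the volatile one that occasionally posts spectacular years – exactly the behavior the rolling-forecast test is meant to reward. One could instead rank by a single combined figure such as mean divided by deviation (a Sharpe-style ratio); that is a design choice the article leaves open.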
Autonomous Driving Application
With this application, many Neural Networks would be trained on the sensor inputs (many different images), with the outputs being the appropriate driving commands. The Neural Networks that surpassed a specified threshold of issuing the correct commands would be saved. For simplicity, we’ll say that 1,000 Neural Networks were trained but only 100 scored above the specified threshold.
The next step is to test those high-scoring 100 Neural Networks on the open road in the autonomous vehicle, where each one is scored on its performance. From these road tests, the top 10 performing Neural Networks are selected. These top 10 are the “final product” – they are the Super Nets that will perform the autonomous control of the vehicle.
These Super Nets will form a team that produces a “consensus solution” – all 10 Super Nets constantly process the road images and issue correction commands, but the command actually executed is taken from the consensus. For example, if 7 out of 10 Super Nets agree that a “slow down, turn right” command should be issued, then that is the one selected.
For most cases, we can assume that all 10 Super Nets will issue the same command or set of commands. However, in cases where there are ambiguities (e.g. a situation is encountered for which they were not trained – maybe a tilted road with fog at night), the teaming approach will still produce a good solution, since it is decided by consensus.
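The consensus step above is essentially a majority vote with a quorum. A minimal sketch – the function name, the 7-of-10 quorum taken from the example, and the fallback behavior when no quorum is reached are all assumptions for illustration:

```python
from collections import Counter

def consensus_command(votes, quorum=7):
    """Majority vote across the Super Net team.

    `votes` holds one command string per net; a command wins only if
    at least `quorum` of the nets agree (e.g. 7 of 10).  What to do
    when no quorum exists is a safety-policy decision -- here we fall
    back to a hypothetical conservative default."""
    command, count = Counter(votes).most_common(1)[0]
    if count >= quorum:
        return command
    return "fallback: reduce speed"

# The 7-of-10 example from the text:
votes = (["slow down, turn right"] * 7
         + ["maintain speed"] * 2
         + ["turn right"])
print(consensus_command(votes))
```

In the ambiguous fog-at-night scenario, the votes may scatter so that no command reaches the quorum; the fallback branch is what keeps the team’s output well defined even then.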