This article walks through a simple case of using a Neural Network to interpret an image containing a traffic light, with the objective of correctly identifying whether the light is red, yellow, or green. Here is a video link that goes over the basic running of the software. If you send a donation to my PayPalMe along with your email address, I’ll send you a copy of the codebase, which runs in Matlab or Octave.
The basic technical approach is as follows:
1. The Levenberg-Marquardt (LM) optimization algorithm is used to train the Neural Networks. In this case, the excellent Pyrenn library (click here to access the site) provides the LM engine; its source code is available for both Matlab and Python.
2. As with all Neural Network applications, it’s best to start with the very basics of a learning solution (the training and test sets) in order to see if the methodology is going to work. Thus basic traffic light images are drawn using PowerPoint.
3. Extract the R, G, and B matrices from each image, and then perform Singular Value Decomposition (SVD) on each of those matrices. The resulting singular values are the inputs to the Neural Network.
4. Assign values to the Neural Network outputs that map to “Yellow”, “Green”, and “Red”. In this case, an output value of -0.5 is assigned to “Yellow”, an output value of 0.0 is assigned to “Red”, and an output value of +0.5 is assigned to “Green”.
5. Train and test multiple Neural Networks on the processed images (using the largest singular values as inputs), and then select the best performer.
Following the start-simple principle from step 2, the Neural Network will be trained and tested on images that depict PowerPoint-drawn traffic light systems. Each training and test image displays a green, yellow, or red light in its normal location on the traffic light box.
Eighteen traffic light drawings were created, giving eighteen samples for the training and test sets. Half of the images (9) were used to train a Neural Network, and the remaining 9 were used for testing. This may seem like a small data set, but it’s best to start with a minimal amount of data and add more images only if doing so is likely to improve performance.
When performing image object recognition, it’s important to preserve as much of the available image information as possible while also using some form of feature extraction, such as Principal Component Analysis (PCA), to pull out the key features and thus minimize the number of inputs to the Neural Network.
In this case, two approaches were used to extract the important image information:
1) the R (Red), G (Green), and B (Blue) matrices were extracted from each image, and
2) Singular Value Decomposition, or SVD, the decomposition that underlies PCA, was used to draw out the key image features from each of the three extracted R, G, and B matrices, and the largest singular values from each matrix were used as inputs to the Neural Network.
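To make the idea of "top SVD values" concrete, here is a minimal Python/NumPy sketch (not the article's Matlab/Octave code). It computes the singular values of a single color channel and the share of the matrix's squared Frobenius norm held by the largest k of them. The synthetic `red` channel and the helper names `top_singular_values` and `energy_fraction` are assumptions made purely for illustration.

```python
import numpy as np

def top_singular_values(channel, k):
    """Return the k largest singular values of a 2-D channel matrix."""
    # compute_uv=False returns only the singular values, largest first.
    s = np.linalg.svd(np.asarray(channel, dtype=float), compute_uv=False)
    return s[:k]

def energy_fraction(channel, k):
    """Share of the squared Frobenius norm held by the k largest singular values."""
    s = np.linalg.svd(np.asarray(channel, dtype=float), compute_uv=False)
    return float(np.sum(s[:k] ** 2) / np.sum(s ** 2))

# Hypothetical 40 x 20 red channel: a dim background with one bright disc.
rows, cols = np.mgrid[0:40, 0:20]
red = np.full((40, 20), 0.5)
red[(rows - 10) ** 2 + (cols - 10) ** 2 <= 36] = 1.0

top3 = top_singular_values(red, 3)
print("largest singular values:", top3)
print("energy held by top 3:", energy_fraction(red, 3))
```

Because the singular values come back sorted from largest to smallest, keeping only the first few retains the dominant structure of the channel while shrinking the number of Neural Network inputs.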
One of the top-performing Neural Networks scored very well with low error values. It was able to correctly identify whether an image contained a red, yellow, or green light in all of the test images as shown below in Figure 1. Note that +0.5 on the Y-axis (the Neural Network output) represents a green light, 0.0 represents a red light, and -0.5 represents a yellow light.
The rest of the article discusses the technical path, start to finish.
The first step is to build the training and test images. A rectangle was constructed with a gray background, and circles were inserted in an arrangement similar to that of a traffic light. Each circle was either black or one of the three possible colors (red, yellow, or green), as shown below. The training images placed the traffic light at different locations within the image and at different angles. Thus the objective was for the Neural Network to correctly interpret the light color regardless of the location or angle of the traffic light in the image. The training images are shown below in Figure 2.
The test images are shown below in Figure 3.
The process from feature extraction to input data set consists of the following steps:
1. Reduce the image size for all of the training and test images. In this case the reduced image was 0.3 times the size of the original image. Doing this shortens the image processing time without sacrificing important resolution, which is critical in real-time applications.
2. Extract the three R, G, and B matrices from each of the reduced images.
3. Perform the SVD on each of the R, G, and B matrices for each image.
4. Harvest the 30 largest singular values for each of the R, G, and B matrices (3 × 30 = 90 values per image) and load these into an array to be used as input to a Neural Network.
The basic process is shown below in Figure 4.
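The four steps above can be sketched in Python with NumPy. This is an illustrative sketch under stated assumptions, not the article's `loadData.m`: the function names `downscale_nearest` and `extract_features` are hypothetical, nearest-neighbour sampling stands in for whatever resizing method the original code uses, the factor 0.3 is applied to each image dimension, and no pixel normalization is applied because the article does not mention one.

```python
import numpy as np

def downscale_nearest(image, scale):
    """Shrink an H x W x 3 image by nearest-neighbour sampling (assumed method)."""
    h, w = image.shape[:2]
    new_h = max(1, int(round(h * scale)))
    new_w = max(1, int(round(w * scale)))
    # Map each output row/column back to a source row/column index.
    row_idx = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    col_idx = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    return image[row_idx][:, col_idx]

def extract_features(image, scale=0.3, top_k=30):
    """Return the top_k singular values of the R, G, and B matrices, concatenated."""
    small = downscale_nearest(np.asarray(image, dtype=float), scale)
    features = []
    for c in range(3):  # channel order assumed to be R, G, B
        s = np.linalg.svd(small[:, :, c], compute_uv=False)  # sorted, largest first
        top = np.zeros(top_k)
        n = min(top_k, s.size)
        top[:n] = s[:n]  # zero-pad if the channel has fewer than top_k values
        features.append(top)
    return np.concatenate(features)

# Hypothetical stand-in for one loaded image: random pixels, 200 x 300 x 3.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(200, 300, 3))
features = extract_features(image)
print(features.shape)
```

Each image is reduced to a single fixed-length vector, so every training and test sample presents the same number of inputs to the Neural Network regardless of the original image dimensions.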
The outputs are mapped such that “Yellow” corresponds to a -0.5 output value, “Red” corresponds to a 0.0 output value, and “Green” corresponds to a +0.5 output value. Thus the Neural Network should output around 0.0 when observing a red light, -0.5 when observing a yellow light, and +0.5 when observing a green light, as shown below in Figure 5.
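A small Python sketch of this output coding follows. The target values come from the mapping above; the decoding rule (pick the color whose target is nearest to the raw network output) is an assumption, since the article does not state how an output such as 0.43 is turned back into a label. The names `TARGETS`, `encode`, and `decode` are hypothetical.

```python
import numpy as np

# Target output value for each light color, as defined in the article.
TARGETS = {"yellow": -0.5, "red": 0.0, "green": 0.5}

def encode(label):
    """Return the training target for a color label."""
    return TARGETS[label]

def decode(output):
    """Map a raw network output to the color with the nearest target (assumed rule)."""
    labels = list(TARGETS)
    values = np.array([TARGETS[name] for name in labels])
    return labels[int(np.argmin(np.abs(values - output)))]

print(encode("green"), decode(0.43), decode(-0.1), decode(-0.3))
```

Under this rule, an output is classified correctly as long as it lands within 0.25 of its target value.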
In Figure 6, the Neural Network data structure layout is shown on the right side. The details are reviewed on the left side.
The code base layout is shown below in Figure 7. There are four primary functions:
1. loadData.m – this function is run first. It loads the image files, reduces the image sizes, extracts the R, G, and B matrices from each image, and performs SVD on each of those matrices.
2. buildTrnTstDataSets.m – this function is run next. It takes the arrays built in the first function and sorts them into input training and test arrays, and output training and test arrays.
3. runNet.m – this is run last. This is the function that will use the training data to build a Neural Network (using the LM optimization algorithm to generate the weights or gains), then it will test the Neural Network’s performance on the test input and output data.
4. monteCarlo.m – this function is used to train and test multiple Neural Networks. It is currently set to keep going until it has 20 Neural Networks that have passed the acceptable test criteria (those Neural Networks that don’t pass the test are discarded).
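The train, test, and discard loop described for monteCarlo.m can be sketched as follows. This is not the article's code: to keep the example self-contained, the Levenberg-Marquardt training step is replaced by a plainly different stand-in (a random tanh hidden layer whose output weights are solved by ridge-regularized least squares), and the data are synthetic prototype vectors rather than SVD features. The acceptance tolerance of 0.25, the trial cap, and all function names are assumptions; only the target of 20 accepted networks comes from the article.

```python
import numpy as np

def make_data(rng, per_class=3, noise=0.005):
    """Synthetic stand-in data: one noisy prototype vector per light color."""
    prototypes = np.eye(3)                      # rows: yellow, red, green
    targets = np.array([-0.5, 0.0, 0.5])
    X = np.repeat(prototypes, per_class, axis=0)
    X = X + noise * rng.standard_normal(X.shape)
    return X, np.repeat(targets, per_class)

def hidden_layer(X, W, b):
    """tanh hidden activations with a constant bias column appended."""
    return np.column_stack([np.tanh(X @ W + b), np.ones(len(X))])

def train_stand_in(X, y, rng, hidden=8, ridge=1e-3):
    """Stand-in trainer (NOT Levenberg-Marquardt): random hidden weights,
    output weights from ridge-regularized least squares."""
    W = rng.standard_normal((X.shape[1], hidden))
    b = rng.standard_normal(hidden)
    H = hidden_layer(X, W, b)
    v = np.linalg.solve(H.T @ H + ridge * np.eye(H.shape[1]), H.T @ y)
    return W, b, v

def predict(net, X):
    W, b, v = net
    return hidden_layer(X, W, b) @ v

def monte_carlo(X_trn, y_trn, X_tst, y_tst, n_keep=20, tol=0.25,
                max_trials=500, seed=0):
    """Train networks until n_keep pass the test criterion; discard the rest."""
    rng = np.random.default_rng(seed)
    kept = []
    for _ in range(max_trials):
        net = train_stand_in(X_trn, y_trn, rng)
        errors = np.abs(predict(net, X_tst) - y_tst)
        if errors.max() < tol:                  # every test image close to target
            kept.append((float(np.mean(errors ** 2)), net))
        if len(kept) == n_keep:
            break
    kept.sort(key=lambda item: item[0])         # best (lowest test MSE) first
    return kept

data_rng = np.random.default_rng(1)
X_trn, y_trn = make_data(data_rng)              # 9 training samples
X_tst, y_tst = make_data(data_rng)              # 9 test samples
kept = monte_carlo(X_trn, y_trn, X_tst, y_tst)
print(len(kept), "networks kept; best test MSE:", kept[0][0])
```

Because each network starts from different random weights, repeating the train and test cycle and ranking the survivors is a simple way to pick a strong performer without tuning any single run.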
In summary, a Neural Network can be trained to recognize a red, yellow, or green light from PowerPoint images, with just a few training images. An example of a top performing Neural Network is shown below in Figure 8.
The next step would be to use real images of traffic lights, with the traffic lights shown close up, as was the case with the PowerPoint-drawn images. Spoiler Alert … it works with real traffic light images as well. That will be the next article.