Opening our Eyes with Convolutional Neural Networks
Vision is one of the most important gifts in the world. It’s what you use to get around, watch beautiful sunsets and read these amazing articles. There is no way we as a human race could have made as many advancements without sight. But the fact is, we are partially blind. What!?
To many people, the image below may just look like an x-ray of a lung. Most doctors will recognize a lung affected by pneumonia, but not all of them. In fact, 12 million people are misdiagnosed in America alone. And in hospitals, diagnostic errors contribute to 1 in 10 deaths. That’s crazy! One area where misdiagnosis is common is in reading scans like x-rays, MRIs and CT scans.
What about the 1.25 million people who die in car crashes worldwide? The point is, we really don’t have perfect vision. So the solution is to find something that makes very few mistakes, something that won’t get tired, do stupid things like drive drunk, or read something the wrong way. But it’s not like there is another species that can do that. Wrong! Enter computer vision.
Convolutional Neural Networks
Using AI and machine learning, we are now able to teach machines to carry out specific tasks and make predictions. The most well-known approach is neural networks, which are loosely modelled on the neurons in our brain. Convolutional neural networks (CNNs) are a type of neural network designed for images, and image classification is their best-known use. Let’s break down how they work.
The image above is a breakdown of how the layers in a convolutional neural network function. First, we have an input image, in this case a car, which is passed through a convolution layer that breaks down the image. By focusing on small regions of the image, the network can find key features (this stage is called feature learning). These features may start off as edges; after another layer the edges become shapes, and finally those shapes are used to make a prediction.
In this case, the network starts by recognizing circular edges, then it combines those edges to make a circle or wheel, and based on that information along with other inputs, the network can predict that the image is a car. The other inputs are shapes identified from other parts of the image. The network also uses weights, learned during training, which control how much each feature influences the prediction (this article tells you more about neural networks and weights). For example, when separating animals from vehicles, wheels are a good way to tell the two apart, so a wheel feature would carry a lot of weight.
How do you Recognize these Features?
Each pixel in an image has an RGB value, which stands for red, green and blue; combined, these three channels can form any colour. To keep things simple here, we will convert images to greyscale, so each pixel becomes one number between 0 and 255. We will then apply a kernel filter, which will locate the edges in an image.
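To make the greyscale step concrete, here is a minimal Python sketch. The article does not say which conversion formula it uses, so the weights below (the common luminance weights 0.299, 0.587 and 0.114) are an assumption, and the names `rgb_to_grey`, `image_to_grey` and `tiny_image` are made up for illustration.

```python
def rgb_to_grey(r, g, b):
    """Turn one RGB pixel (each channel 0-255) into a single greyscale value."""
    # Assumed weights: a common luminance formula that counts green the most
    # and blue the least. Other conversions (like a plain average) also exist.
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def image_to_grey(image):
    """Convert a 2D grid of (r, g, b) tuples into a 2D grid of greyscale numbers."""
    return [[rgb_to_grey(*pixel) for pixel in row] for row in image]


# A made-up 2x2 image: red, green, blue and white pixels.
tiny_image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]

grey_image = image_to_grey(tiny_image)
print(grey_image)
```

Each pixel goes from three numbers down to one, which is what lets a single 3 by 3 kernel work on the image.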
Let’s take this traffic light for example. We start by focusing on an edge with a 3-by-3 grid of pixels, each with a greyscale value. Then we multiply those pixel values by a kernel made up of 0s, 1s and -1s. Finally, we add up the products: if the sum is large, we have found an edge, whereas if the sum is small, it is not an important feature. This makes sense, since if you zoomed in on a flat part of the traffic light, the greyscale values would all be similar (for example 46), and when multiplied and added (46 + (-46) = 0) the value would be low.
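The multiply-and-add step can be sketched in a few lines of Python. The exact kernel is not given in the article, so the vertical-edge kernel below is an assumed example, and the function names and pixel values (other than 46) are hypothetical.

```python
# Assumed example kernel made of -1, 0 and 1. It responds to vertical edges,
# where the left side of the patch is darker or brighter than the right side.
KERNEL = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]


def apply_kernel(patch, kernel):
    """Multiply a 3x3 patch by a 3x3 kernel, value by value, and add up the products."""
    return sum(patch[i][j] * kernel[i][j] for i in range(3) for j in range(3))


def convolve(image, kernel):
    """Slide the kernel over every 3x3 patch of a greyscale image."""
    rows, cols = len(image), len(image[0])
    return [
        [
            apply_kernel([row[c:c + 3] for row in image[r:r + 3]], kernel)
            for c in range(cols - 2)
        ]
        for r in range(rows - 2)
    ]


# A flat patch (every pixel is 46) and a patch that sits on an edge.
flat_patch = [[46, 46, 46] for _ in range(3)]
edge_patch = [[46, 46, 200] for _ in range(3)]

flat_response = apply_kernel(flat_patch, KERNEL)  # each row: -46 + 0 + 46 = 0
edge_response = apply_kernel(edge_patch, KERNEL)  # each row: -46 + 0 + 200 = 154

# A made-up 4x5 image that is dark on the left and bright on the right.
image = [[46, 46, 46, 200, 200] for _ in range(4)]
edge_map = convolve(image, KERNEL)
print(flat_response, edge_response, edge_map)
```

The flat patch cancels out to 0, while the patch on the edge gives a large sum. In a real CNN the kernel values are not picked by hand like this; they are learned during training.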
CNNs in Action
In my project, I decided to use CNNs to classify images from the CIFAR-10 data set, which includes 10 different classes of objects like horses, ships and other animals and vehicles. I decided to train a network on many different subjects since many CNNs today are binary, meaning they are only looking for one thing. If you’re only looking for a patient affected by pneumonia, you may not be able to tell if they have lung cancer or flail chest. That sucks! Classifying 10 different subjects is a start, but what if someone happens to have that 11th disease? Anyway, here is the network.
- Convolutional layers extract features from the image.
- Batch Normalization helps improve the speed and performance of the network by normalizing the inputs to each layer.
- Max Pooling keeps the most important features (the largest values) from each group of 2 by 2 values, in this case.
- Dropout helps to reduce overfitting by randomly setting some inputs to 0 during training.
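To show what these layers do to the numbers, here is a rough pure-Python sketch of max pooling, batch normalization and dropout. It is a simplified illustration and not the code from my model: real batch normalization also learns a scale and shift, and deep learning libraries typically rescale the values that survive dropout. All function names and sample values are hypothetical.

```python
import random


def max_pool_2x2(feature_map):
    """Keep the largest value from each non-overlapping 2x2 block."""
    pooled = []
    for r in range(0, len(feature_map) - 1, 2):
        row = []
        for c in range(0, len(feature_map[0]) - 1, 2):
            block = (
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            row.append(max(block))
        pooled.append(row)
    return pooled


def batch_normalize(values, eps=1e-5):
    """Shift and scale values so they have roughly zero mean and unit variance."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    # eps avoids dividing by zero when all values are identical.
    return [(v - mean) / (variance + eps) ** 0.5 for v in values]


def dropout(values, rate, rng):
    """Set each value to 0 with probability `rate` (used during training only)."""
    # Simplification: surviving values are left unscaled here.
    return [0.0 if rng.random() < rate else v for v in values]


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 9, 2],
    [3, 2, 4, 6],
]
pooled = max_pool_2x2(feature_map)            # 4x4 grid shrinks to 2x2
normalized = batch_normalize([1.0, 2.0, 3.0])
dropped = dropout([0.5, 1.2, 0.8, 2.0], rate=0.25, rng=random.Random(0))
print(pooled, normalized, dropped)
```

Max pooling shrinks the feature map while keeping the strongest responses, normalization keeps values in a similar range from layer to layer, and dropout forces the network not to rely on any single input.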
The rest of the model is made up of more convolutional layers along with some dense layers, which make the final prediction based on the features found in the image. After testing, my model achieved an accuracy of 83%.
So What?
Today, CNNs are being used in autonomous vehicles, along with lasers, to classify the objects around them. There are many applications in healthcare imaging, but most CNNs only answer “does it have this disease or not”, so there is still room for improvement. Microsoft AI is constantly capturing photos of the habitats of endangered species and recording data. When you give computers vision, the possibilities are endless, from affecting genetics to flying us around the world.
And now we have gained our full vision.
TL;DR
- We don’t have perfect vision
- Computers can use machine learning and convolutional neural networks to see images
- Computer vision works by taking pixel values and applying kernel filters
- CNNs have endless possibilities in areas from healthcare and genetics to autonomous vehicles