A convolutional neural network (CNN) augments the [[feed forward neural net|multilayer perceptron]] by adding one or more [[convolution]] steps to the inputs. CNNs constrain the number of parameters in neural networks with large inputs (e.g., images) and preserve the spatial context of each pixel, which a plain multilayer perceptron loses when the image is flattened to a 1-D array; the shared filter weights also give the network a degree of **translational invariance**.
In image processing, convolution layers can be understood as detectors for specific features (like eyebrows and mouths). For this reason, the convolution and [[pooling]] layers are together referred to as the **feature extractor**.
In a convolutional neural network, multiple convolutions are used to create **feature maps** that are then fed into the fully connected part of the network (though you could also use convolution simply to create features for any other classifier, like [[XGBoost]]).
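Putting these pieces together, here is a minimal PyTorch sketch of a conv + pool feature extractor feeding a fully connected classifier; the 28x28 grayscale input and the layer sizes are illustrative assumptions, not something from the text above.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Feature extractor: convolution + pooling layers
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # 28x28 -> 28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 28x28 -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # 14x14 -> 14x14
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 14x14 -> 7x7
        )
        # Classifier: the multilayer perceptron the feature maps are fed into
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 7 * 7, 64),
            nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

x = torch.randn(8, 1, 28, 28)  # batch of 8 fake grayscale images
print(SmallCNN()(x).shape)     # torch.Size([8, 10])
```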
Use **padding** to avoid reducing the dimension of the output matrix. Use **stride** to take larger steps in the moving-window operation; a stride of length s shrinks each output dimension by roughly a factor of s (a stride of 2 returns an output matrix about half the size of the input).
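The standard output-size rule makes the effect of padding and stride concrete: for an input of width n, filter width f, padding p, and stride s, the output width is floor((n + 2p - f) / s) + 1. A small sketch (the example numbers are illustrative):

```python
# Output width of a convolution: floor((n + 2p - f) / s) + 1
# n = input width, f = filter width, p = padding, s = stride
def conv_output_size(n: int, f: int, p: int = 0, s: int = 1) -> int:
    return (n + 2 * p - f) // s + 1

print(conv_output_size(28, 3))            # 26: no padding shrinks the output
print(conv_output_size(28, 3, p=1))       # 28: padding preserves the size
print(conv_output_size(28, 3, p=1, s=2))  # 14: stride 2 halves the output
```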
The weights in the filter (or kernel) are learned through the fitting process.
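For example, in PyTorch the kernel weights of a convolution layer are ordinary trainable parameters, so backpropagation updates them just like the weights of a dense layer (the channel counts here are arbitrary):

```python
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)
# One 3x3 kernel per input channel, for each of the 16 output channels
print(conv.weight.shape)          # torch.Size([16, 3, 3, 3])
print(conv.weight.requires_grad)  # True: learned during fitting
```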
CNN architecture can become quite complicated. A few examples of famous CNN architectures include:
- **VGGNet** (2014) used n repeated blocks, each comprising two convolution layers followed by one pooling layer.
- **GoogLeNet (Inception)** (2014) used repeating inception modules that combine 1x1, 3x3, and 5x5 convolution filters in parallel.
- **ResNet** (2015) used skip connections to avoid unstable training: with many non-linear activation functions in series, gradients tend to vanish or explode (see the sketch after this list).
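A skip connection adds a block's input back onto its output, giving gradients a direct path around the stacked non-linearities. A minimal residual-block sketch in PyTorch (channel counts are illustrative, and this omits the projection ResNet uses when the number of channels changes):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions whose output is added back to the input."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.body(x) + x)  # the skip connection

x = torch.randn(4, 64, 32, 32)
print(ResidualBlock(64)(x).shape)  # torch.Size([4, 64, 32, 32])
```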
## training tips for CNNs
- Use a learning rate of 0.001 to 0.01
- Use Adam or RMSProp optimization method
- Use ReLU or PReLU activation for hidden layers
- Use Sigmoid, Softmax, Tanh, or PReLU for output layers
- Use 3x3 filters
- Use n-layers of Conv-Conv-MaxPool blocks for the architecture (see the sketch after this list)
- Use L2 regularization
- Use batch normalization (and/or dropout, but batch normalization is usually sufficient)
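Tying the tips above together, a hedged PyTorch sketch of one possible training setup: 3x3 filters in Conv-Conv-MaxPool blocks with batch normalization, ReLU hidden activations, softmax handled inside the cross-entropy loss, Adam with a learning rate of 0.001, and L2 regularization via weight decay. The input size, channel counts, and exact hyperparameter values are illustrative, not prescriptions from the list above.

```python
import torch
import torch.nn as nn

def conv_conv_pool(in_ch: int, out_ch: int) -> nn.Sequential:
    """One Conv-Conv-MaxPool block with 3x3 filters and batch normalization."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
        nn.MaxPool2d(2),
    )

model = nn.Sequential(
    conv_conv_pool(3, 32),      # 32x32 -> 16x16 (assuming 32x32 RGB input)
    conv_conv_pool(32, 64),     # 16x16 -> 8x8
    nn.Flatten(),
    nn.Linear(64 * 8 * 8, 10),  # softmax is applied inside the loss below
)

# Adam with a learning rate in the suggested range; weight_decay applies
# an L2 penalty to the weights.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()  # applies log-softmax to the logits

# One illustrative training step on fake data
images = torch.randn(16, 3, 32, 32)
labels = torch.randint(0, 10, (16,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(loss.item())
```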