A convolutional neural network (CNN) augments the [[feed forward neural net|multilayer perceptron]] by adding one or more [[convolution]] steps to the inputs. CNNs constrain the number of parameters in neural networks with large inputs (e.g., images) and preserve the spatial context of each pixel, which a plain multilayer perceptron loses when the image is flattened to a 1-D array; the shared filter weights also give the network a degree of **translational invariance**.
In image processing, convolution layers can be understood as detectors for specific features (like eyebrows and mouths). For this reason, the convolution and [[pooling]] layers are together referred to as the **feature extractor**.
In a convolutional neural network, multiple convolutions are used to create **feature maps** that are then fed into the fully connected part of the network (though you could also use convolution simply to create features for any other classifier, like [[XGBoost]]).
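Putting these pieces together, here is a minimal PyTorch sketch of a conv + pool feature extractor feeding a fully connected classifier; the 28x28 grayscale input and the layer sizes are illustrative assumptions, not something from the text above.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Feature extractor: convolution + pooling layers
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # 28x28 -> 28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 28x28 -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # 14x14 -> 14x14
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 14x14 -> 7x7
        )
        # Classifier: the multilayer perceptron the feature maps are fed into
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 7 * 7, 64),
            nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

x = torch.randn(8, 1, 28, 28)  # batch of 8 fake grayscale images
print(SmallCNN()(x).shape)     # torch.Size([8, 10])
```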
Use **padding** to avoid reducing the dimension of the output matrix. Use **stride** to take larger steps in the moving-window operation; a stride of length s shrinks each output dimension by roughly a factor of s (a stride of 2 returns an output matrix about half the size of the input).
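The standard output-size rule makes the effect of padding and stride concrete: for an input of width n, filter width f, padding p, and stride s, the output width is floor((n + 2p - f) / s) + 1. A small sketch (the example numbers are illustrative):

```python
# Output width of a convolution: floor((n + 2p - f) / s) + 1
# n = input width, f = filter width, p = padding, s = stride
def conv_output_size(n: int, f: int, p: int = 0, s: int = 1) -> int:
    return (n + 2 * p - f) // s + 1

print(conv_output_size(28, 3))            # 26: no padding shrinks the output
print(conv_output_size(28, 3, p=1))       # 28: padding preserves the size
print(conv_output_size(28, 3, p=1, s=2))  # 14: stride 2 halves the output
```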
The weights in the filter (or kernel) are learned through the fitting process.
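For example, in PyTorch the kernel weights of a convolution layer are ordinary trainable parameters, so backpropagation updates them just like the weights of a dense layer (the channel counts here are arbitrary):

```python
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)
# One 3x3 kernel per input channel, for each of the 16 output channels
print(conv.weight.shape)          # torch.Size([16, 3, 3, 3])
print(conv.weight.requires_grad)  # True: learned during fitting
```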
CNN architecture can become quite complicated. A few examples of famous CNN architectures include:
- **VGGNet** (2014) used n repeated blocks, each comprising two convolution layers followed by one pooling layer.
- **GoogLeNet (Inception)** (2014) used repeating inception modules that combine 1x1, 3x3, and 5x5 convolution filters in parallel.
- **ResNet** (2015) used skip connections to avoid unstable training: with many non-linear activation functions in series, gradients tend to vanish or explode (see the sketch after this list).
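A skip connection adds a block's input back onto its output, giving gradients a direct path around the stacked non-linearities. A minimal residual-block sketch in PyTorch (channel counts are illustrative, and this omits the projection ResNet uses when the number of channels changes):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions whose output is added back to the input."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.body(x) + x)  # the skip connection

x = torch.randn(4, 64, 32, 32)
print(ResidualBlock(64)(x).shape)  # torch.Size([4, 64, 32, 32])
```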
## training tips for CNNs
- Use a learning rate of 0.001 to 0.01
- Use Adam or RMSProp optimization method
- Use ReLU or PReLU activation for hidden layers
- Use Sigmoid, Softmax, Tanh, or PReLU for output layers
- Use 3x3 filters
- Use n-layers of Conv-Conv-MaxPool blocks for the architecture (see the sketch after this list)
- Use L2 regularization
- Use batch normalization (and/or dropout, but batch normalization is usually sufficient)
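Tying the tips above together, a hedged PyTorch sketch of one possible training setup: 3x3 filters in Conv-Conv-MaxPool blocks with batch normalization, ReLU hidden activations, softmax handled inside the cross-entropy loss, Adam with a learning rate of 0.001, and L2 regularization via weight decay. The input size, channel counts, and exact hyperparameter values are illustrative, not prescriptions from the list above.

```python
import torch
import torch.nn as nn

def conv_conv_pool(in_ch: int, out_ch: int) -> nn.Sequential:
    """One Conv-Conv-MaxPool block with 3x3 filters and batch normalization."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
        nn.MaxPool2d(2),
    )

model = nn.Sequential(
    conv_conv_pool(3, 32),      # 32x32 -> 16x16 (assuming 32x32 RGB input)
    conv_conv_pool(32, 64),     # 16x16 -> 8x8
    nn.Flatten(),
    nn.Linear(64 * 8 * 8, 10),  # softmax is applied inside the loss below
)

# Adam with a learning rate in the suggested range; weight_decay applies
# an L2 penalty to the weights.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()  # applies log-softmax to the logits

# One illustrative training step on fake data
images = torch.randn(16, 3, 32, 32)
labels = torch.randint(0, 10, (16,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(loss.item())
```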