
A commonly employed pooling approach is max pooling, where the pooling layer extracts sub-regions of the feature map (e.g., 2×2-pixel grids), keeps their maximum value, and discards all other values. Image inputs that are very large and have diffuse (in pixel terms) sub-region edges may warrant larger pooling sizes, particularly in the lower layers of the network. However, this approach reduces the dimension of the signal transmitted to subsequent layers and can consequently result in excessive information loss. On this basis, the max pooling shape was not explored within the hyperparameter optimisation phase and remained static at 2×2. Given this, and its presence within the Brownlee architecture, a 2×2 grid was deemed an appropriate pooling shape.

Number of filters

The filter selection range was distributed around the number of filters observed within the Brownlee model, with two additional values included above and below to provide what was deemed a reasonable, but nevertheless constrained, search space (i.e. 16, 24, 32, 40, 48).

Optimisation

Three different optimisers were examined: Adam, Stochastic Gradient Descent (SGD) and RMSProp. Initial experimentation, supported by research findings (Kingma & Ba 2014), demonstrated that the Adam optimiser consistently outperformed both SGD and RMSProp. On this basis the choice of optimiser was not included within the optimal hyperparameter search, thereby reducing the size of the search space.

Activation Functions

Activation functions with linear components are now commonplace within neural networks, especially deep networks, owing to their superior performance. While the literature (Clevert et al. 2015) suggested that the ELU activation function outperforms the related ReLU function, the ReLU function was also considered within the optimisation given its prevalence.

Regularisation L1 L2

The Keras framework offers three applications of traditional regularisation: kernel, bias and activity regularisation. Initial experimentation suggested that limited benefits could be realised by regularising the kernel and activity functions, and hence they were excluded.

Three further types of regularisation are available within the Keras framework: L1, L2 and a combination of L1 and L2. Neurons with L1 regularisation typically settle on a sparse subset of the most important inputs and become effectively invariant to the other, noisy or less informative, inputs. In contrast, the final weight vectors observed with L2 regularisation are typically small and diffuse, owing to the quadratic penalty arising from large weights. Due to the multiplicative interactions between inputs and weights, this has the appealing property of incentivising the neural network to use all the inputs rather than over-rely on a select few. Finally, a combination of L1 and L2 can also be applied, enjoying the benefits of both methods.
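
As a minimal sketch, assuming the standard tf.keras API, the three regularisation options could be attached to a layer's bias term as follows; the 0.01 penalty factors and the layer width are illustrative placeholders rather than the tuned values used in this work.

```python
from tensorflow.keras import layers, regularizers

# Illustrative only: one dense layer per regularisation option discussed above.
# The penalty factors (0.01) and layer width (64) are placeholder assumptions.
l1_layer = layers.Dense(64, bias_regularizer=regularizers.l1(0.01))
l2_layer = layers.Dense(64, bias_regularizer=regularizers.l2(0.01))
l1_l2_layer = layers.Dense(64, bias_regularizer=regularizers.l1_l2(l1=0.01, l2=0.01))
```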
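
Bringing the other choices discussed above together, the sketch below shows how a single convolutional block with the fixed 2×2 pooling grid, a candidate filter count from the constrained range, a ReLU or ELU activation and the fixed Adam optimiser might be assembled in Keras. The input shape, dense head and loss function are assumptions for illustration only and do not reflect the exact architecture evaluated here.

```python
from tensorflow.keras import layers, models

def build_candidate(n_filters=32, activation="relu"):
    """Illustrative candidate model: 2x2 max pooling held static, filter count
    drawn from the constrained range, ReLU or ELU activation. The 64x64x3 input
    shape and single-unit sigmoid head are placeholder assumptions."""
    model = models.Sequential([
        layers.Conv2D(n_filters, (3, 3), activation=activation,
                      input_shape=(64, 64, 3)),
        layers.MaxPooling2D(pool_size=(2, 2)),  # pooling shape fixed at 2x2
        layers.Flatten(),
        layers.Dense(1, activation="sigmoid"),
    ])
    # Adam consistently outperformed SGD and RMSProp in initial experimentation,
    # so it is fixed here rather than included in the search.
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# Instantiate one candidate per combination in the constrained search space.
candidates = [build_candidate(n_filters=n, activation=act)
              for n in (16, 24, 32, 40, 48)
              for act in ("relu", "elu")]
```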