Designing deep neural networks these days is more art than science. In the deep learning space, any given problem can be addressed by a fairly large number of neural network architectures. In that sense, designing a deep neural network from the ground up for a given problem can be incredibly expensive in terms of time and computational resources. Additionally, given the lack of guidance in the space, we often end up producing neural network architectures that are suboptimal for the task at hand. Recently, artificial intelligence (AI) researchers from Google published a paper proposing a method called MorphNet to optimize the design of deep neural networks.
Automated neural network design is one of the most active areas of research in the deep learning space. The most traditional approach to neural network architecture design involves sparse regularizers such as L1. While this technique has proven effective at reducing the number of connections in a neural network, it quite often ends up producing suboptimal architectures. Another approach involves using search techniques to find an optimal neural network architecture for a given problem. That method has been able to generate highly optimized neural network architectures, but it requires an exorbitant number of trial-and-error attempts, which often makes it computationally prohibitive. As a result, neural network architecture search has only proven effective in very specialized scenarios. Factoring in the limitations of the previous methods, we can arrive at three key characteristics of effective automated neural network design techniques:
a) Scalability: The automated design approach should be scalable to large datasets and models.
b) Multi-Factor Optimization: An automated method should be able to optimize the structure of a deep neural network while targeting specific resources.
c) Optimality: An automated neural network design should produce an architecture that improves performance while reducing the usage of the target resource.
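To make the L1-based sparsification mentioned above concrete, here is a minimal sketch (the function names, weights, and thresholds are invented for illustration): an L1 penalty added to the training loss drives many weights toward zero, after which near-zero connections can be pruned away.

```python
import numpy as np

def l1_penalty(weights, strength=0.01):
    """L1 regularization term: strength * sum of absolute weight values."""
    return strength * np.sum(np.abs(weights))

def prune(weights, threshold=1e-3):
    """Zero out connections whose magnitude fell below the threshold."""
    return np.where(np.abs(weights) < threshold, 0.0, weights)

# A toy weight matrix after L1-regularized training: several entries are tiny.
w = np.array([[0.5, 1e-5, -0.3],
              [1e-6, 0.0, 0.8]])

penalty = l1_penalty(w)        # term that would be added to the loss
pruned = prune(w)              # tiny connections removed
sparsity = np.mean(pruned == 0)  # fraction of connections removed
```

Note that this kind of pruning removes individual connections, which is precisely why, as the article notes, it can leave the overall architecture suboptimal: it says nothing about where the surviving capacity should go.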
Google’s MorphNet approaches the problem of automated neural network architecture design from a slightly different angle. Instead of trying numerous architectures across a large design space, MorphNet starts with an existing architecture for a similar problem and, in one shot, optimizes it for the task at hand.
MorphNet optimizes a deep neural network by iteratively shrinking and expanding its structure. In the shrinking phase, MorphNet identifies inefficient neurons and prunes them from the network by applying a sparsifying regularizer such that the total loss function of the network includes a cost for each neuron. Doing just this typically results in a neural network that consumes less of the targeted resource, but also achieves lower performance. However, MorphNet applies a specific shrinking model that highlights not only which layers of a neural network are over-parameterized, but also which layers are bottlenecked. Instead of applying a uniform cost per neuron, MorphNet calculates each neuron’s cost with respect to the targeted resource. As training progresses, the optimizer is aware of the resource cost when calculating gradients, and thus learns which neurons are resource-efficient and which can be removed.
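The resource-weighted cost idea can be sketched as follows. This is not MorphNet's actual implementation; the gating variables and per-neuron FLOP costs below are invented for illustration. The key point is that each neuron's regularization term is weighted by the resource cost it induces, so the optimizer pushes expensive neurons toward zero more aggressively than cheap ones.

```python
import numpy as np

# Per-neuron gating variables (e.g., batch-norm gammas); a neuron whose
# gamma reaches zero can be removed from the network entirely.
gammas = np.array([0.9, 0.4, 0.05, 0.7])

# Hypothetical per-neuron resource cost (e.g., FLOPs each neuron adds).
# High-resolution neurons in lower layers tend to cost more.
flop_cost = np.array([100.0, 100.0, 10.0, 10.0])

strength = 1e-3

# Resource-weighted sparsifying regularizer: expensive neurons pay more
# than cheap ones for the same gamma magnitude.
reg_term = strength * np.sum(flop_cost * np.abs(gammas))

# The (sub)gradient on each gamma is proportional to its resource cost,
# so the optimizer shrinks costly neurons faster.
grad = strength * flop_cost * np.sign(gammas)
```

With a uniform cost per neuron, all four gradients would be equal; weighting by `flop_cost` is what steers the pruning toward the targeted resource.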
The shrinking phase of MorphNet is useful for producing a neural network that optimizes the cost of a specific resource. However, that optimization could come at the cost of accuracy. That’s precisely why MorphNet uses an expanding phase based on a width multiplier to expand the sizes of all layers. For example, with a 50% expansion, an inefficient layer that started with 100 neurons and shrank to 10 would only expand back to 15, while an important layer that only shrank to 80 neurons might expand to 120 and have more resources with which to work. The net effect is a re-allocation of computational resources from less efficient parts of the network to parts of the network where they might be more useful.
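The shrink-then-expand arithmetic from the example above can be sketched in a few lines (the helper name and layer widths are invented for illustration):

```python
def expand_widths(shrunk_widths, multiplier=1.5):
    """Uniformly expand every surviving layer by the width multiplier."""
    return [round(width * multiplier) for width in shrunk_widths]

# After shrinking: an inefficient layer kept 10 of its 100 neurons,
# while an important layer kept 80 of its 100.
shrunk = [10, 80]

# A 50% expansion (multiplier of 1.5) applied uniformly.
expanded = expand_widths(shrunk, 1.5)  # [15, 120]
```

Because the multiplier is uniform, the relative importance the shrinking phase discovered is preserved: the layer that proved important ends up with more neurons than it started with, and the inefficient one stays small.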
The combination of the shrinking and expanding phases produces a neural network that is more accurate than the original while still being somewhat optimized for a specific resource.
In this initial iteration, there are several areas in which MorphNet can deliver immediate value for neural network architectures.
· Targeted Regularization: MorphNet optimizes the structure of a deep neural network, focusing on the reduction of a specific resource. Conceptually, this model provides a more targeted approach than traditional regularization techniques. The following figure represents a traditional ResNet-101 architecture optimized by MorphNet using two criteria: FLOPs and model size. The structures generated by MorphNet when targeting FLOPs (center, with 40% fewer FLOPs) or model size (right, with 43% fewer weights) are dramatically different. When optimizing for computation cost, higher-resolution neurons in the lower layers of the network tend to be pruned more than lower-resolution neurons in the upper layers. When targeting smaller model size, the pruning tradeoff is the opposite.
· Topology Morphing: Some optimizations created by MorphNet might produce completely new topologies. For instance, when a layer ends up with 0 neurons, MorphNet might effectively change the topology of the network by cutting the affected branch from the network. Let’s look at the following figure, which again shows changes to a ResNet architecture. In that example, MorphNet might keep the skip-connection but remove the residual block, as shown below (left). For Inception-style architectures, MorphNet might remove entire parallel towers, as shown on the right.
· Scalability: One of the greatest advantages of MorphNet is that it can learn a new structure in a single training run, which minimizes the computational resources needed for training and allows it to scale to very complex architectures.
· Portability: The networks produced by MorphNet are technically portable and can be retrained from scratch as the weights are not tied to the learning procedure.
Google applied MorphNet to a variety of scenarios, including Inception V2 trained on ImageNet using FLOP optimizations. In contrast to traditional regularization approaches that focus on scaling down the number of outputs, the MorphNet approach targets FLOPs directly and produces a better trade-off curve when shrinking the model. In this case, FLOP cost is reduced by 11% to 15% with the same accuracy as the baseline.
Google released an open source version of MorphNet on GitHub. In a nutshell, using MorphNet consists of the following steps:
1) Choose a regularizer from morphnet.network_regularizers and initialize it using a specific optimization metric. The current implementation of MorphNet includes several regularization algorithms.
2) Train the target model.
3) Save the proposed model structure with the StructureExporter.
4) Retrain the model from scratch without the MorphNet regularizer.
The following code illustrates those steps:
from morphnet.network_regularizers import flop_regularizer
from morphnet.tools import structure_exporter

logits = build_model()

# Point the regularizer at the output op and set the threshold below
# which a gamma is treated as zero (i.e., the neuron is prunable).
network_regularizer = flop_regularizer.GammaFlopsRegularizer(
    [logits.op], gamma_threshold=1e-3)
regularization_strength = 1e-10
regularizer_loss = (network_regularizer.get_regularization_term() *
                    regularization_strength)

model_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=labels, logits=logits)

optimizer = tf.train.MomentumOptimizer(learning_rate=0.01, momentum=0.9)
train_op = optimizer.minimize(model_loss + regularizer_loss)
Automated neural network architecture design is a key area for making deep learning more mainstream. The best neural network architectures are likely to be those produced by a combination of human designers and machine learning algorithms. MorphNet brings a very innovative angle to this hot new space of the deep learning ecosystem.