\section{Related Works}


Deep Learning is a powerful tool that is able to model complex relationships. One example is image classification, where a model assigns a class label to an image supplied by the user.

\subsection{Low Power Classification}
Deep Learning training and inference often incur significant compute and power costs, making them impractical for edge devices.
Prior work has reduced both parameter counts and multiply-and-accumulate (MAC) operations.
MACs are an accepted computational-cost metric because each MAC captures both the multiply-and-accumulate arithmetic and the associated memory accesses for filter weights, layer input maps, and partial sums of layer output maps.
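For reference, the MAC and parameter counts of a standard convolutional layer follow directly from its dimensions; the symbols below (kernel size $K$, input channels $C_{in}$, output channels $C_{out}$, output resolution $H_{out}\times W_{out}$) are our own shorthand, with bias terms ignored:
\[
\mathrm{MACs} = H_{out} \, W_{out} \, C_{out} \, K^2 \, C_{in},
\qquad
\mathrm{Params} = C_{out} \, K^2 \, C_{in}.
\]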

SqueezeNet \cite{iandola2016squeezenet} introduced Fire modules, a compression method aimed at reducing the number of parameters while maintaining accuracy.
% Reducing $3\times 3$ convolutions to $1\times1$ achieves $9\times$ fewer parameters for a given filter; decreasing the number of input channels to larger $3\times 3$ filters with squeeze layers further lowers the number of parameters.
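As a rough illustration in the same notation (ours, not SqueezeNet's), swapping a $3\times3$ filter for a $1\times1$ filter over the same input and output channels yields the nine-fold parameter reduction:
\[
\frac{3 \cdot 3 \cdot C_{in} \, C_{out}}{1 \cdot 1 \cdot C_{in} \, C_{out}} = 9,
\]
and the squeeze layers further shrink the $C_{in}$ seen by the remaining $3\times3$ filters.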
MobileNetV1 \cite{howard2017mobilenets} replaced standard convolutions with depth-wise separable convolutions, in which a depth-wise convolution performs spatial filtering and a pointwise convolution combines channels to generate features.
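A back-of-the-envelope comparison in the same notation shows why this factorization is cheaper: relative to a standard convolution, the depth-wise plus pointwise pair costs
\[
\frac{H_{out} W_{out} C_{in} K^2 + H_{out} W_{out} C_{in} C_{out}}
     {H_{out} W_{out} C_{out} K^2 C_{in}}
= \frac{1}{C_{out}} + \frac{1}{K^2},
\]
i.e., roughly an $8$--$9\times$ reduction in MACs for $3\times3$ kernels when $C_{out}$ is large.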
Fast Downsampling \cite{qin2018fd} extended MobileNet to extremely computationally constrained tasks: 32$\times$ downsampling within the first 12 layers reduces the computational cost to 12 MFLOPs at a 5\% accuracy loss.
Trained Ternary Quantization \cite{zhu2016trained} reduced weight precision to 2-bit ternary values with learned per-layer scaling factors, with no loss in accuracy.
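Concretely, a ternary scheme of this form maps each full-precision weight $\tilde{w}$ to one of three values using a layer-wise threshold $\Delta$ and learned positive and negative scales $W^{p}$ and $W^{n}$ (our notation, sketching the general form rather than the exact training procedure of \cite{zhu2016trained}):
\[
w \;=\; \left\{
\begin{array}{ll}
 W^{p}  & \tilde{w} > \Delta,\\
 0      & |\tilde{w}| \le \Delta,\\
-W^{n}  & \tilde{w} < -\Delta.
\end{array}
\right.
\]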
MobileNetV3 \cite{howard2019searching} used neural architecture search with an efficiency-aware objective to design its model.
Other improvements include `hard'~activation functions (h-swish and h-sigmoid) \cite{ramachandran2017searching}, inverted residuals and linear bottlenecks \cite{sandler2018mobilenetv2}, and squeeze-and-excite layers~\cite{hu2018squeeze} that extract spatial and channel-wise information.
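For completeness, the `hard' activations replace the sigmoid and swish with piecewise-linear approximations built from $\mathrm{ReLU6}$; the commonly used forms are
\[
\mbox{h-sigmoid}(x) = \frac{\mathrm{ReLU6}(x+3)}{6},
\qquad
\mbox{h-swish}(x) = x \cdot \frac{\mathrm{ReLU6}(x+3)}{6}.
\]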
% In a 45\si{\nano\meter} process, a 32-bit integer multiplication is 3.1\si{\pico\joule}, a 32-bit integer addition is 0.1\si{\pico\joule}, encapsulating most of the energy cost in a MAC \cite{horowitzenergy}. 
Based on benchmarks from a 45\si{\nano\meter} process \cite{horowitzenergy}, shrinking process nodes and reduced bit precision bring the cost of a MAC toward 1\si{\pico\joule}.
Targeting 1\si{\micro\joule} per forward pass, we combine these advancements into a new network with $<$1 million MACs.
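The MAC budget follows from straightforward arithmetic under the (optimistic) assumption of roughly 1\si{\pico\joule} per MAC:
\[
10^{6}\ \mathrm{MACs} \times 1\ \si{\pico\joule}\,/\,\mathrm{MAC} = 1\ \si{\micro\joule}.
\]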
