- mixed-signal binary convolutional neural network
- 3.8 uJ/classification (forward pass)
- 86% accuracy (CIFAR-10)
- BinaryNet: weights and activations constrained to {+1, -1}
- multiplication reduces to XNOR
- weight stationary
- data-parallel (all multiplies in parallel (?))
- input reuse
- wide vector sum as energy bottleneck
- 28 nm CMOS
- 328 kB on-chip SRAM
- 237 frame/s
- 0.9 mW from 0.6 V, meaning 0.9 mW / 237 frame/s ≈ 3.8 uJ per classification
- problem: DNNs have to do millions to billions of MACs per inference
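A quick sketch of the multiplication-to-XNOR reduction noted above (my own illustration, not the chip's logic): with the encoding +1 → 1 and -1 → 0, the product of two {+1, -1} values corresponds exactly to XNOR of the encoded bits, so each binary "multiply" costs one gate.

```python
def encode(v):
    # +1 -> 1, -1 -> 0
    return (v + 1) // 2

def decode(b):
    # 1 -> +1, 0 -> -1
    return 2 * b - 1

def xnor(a, b):
    # XNOR of two single bits
    return 1 - (a ^ b)

# the product of any two {+1, -1} values equals XNOR in the encoded domain
for x in (+1, -1):
    for w in (+1, -1):
        assert x * w == decode(xnor(encode(x), encode(w)))
```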

- weight stationary
- computing in memory (CIM) (?)
- CMOS-inspired, hardware specialization

- output image pixels are binarized
- every CNN layer uses 2x2 filters, with 256 input channels and 256 filters
- low fan-out de-multiplexers
- BinaryNet required 1.67 MB:
  - 558 kB for 6 CNN layers
  - 1.13 MB for 3 FC layers
- this network instead needs only 261.5 kB:
  - 256 kB for 8 CNN layers
  - 5.5 kB for 1 FC layer
- since activations are +1/-1 and the nonlinearity is sign, batch norm is simplified (it folds into a per-neuron threshold)
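The batch-norm simplification can be sketched like this (my own formulation, under the usual BN definition, not necessarily the paper's exact derivation): since only the sign of the BN output survives, the whole affine transform collapses into one threshold comparison per neuron.

```python
def bn_then_sign(x, gamma, beta, mu, sigma):
    # standard batch norm followed by the sign nonlinearity
    y = gamma * (x - mu) / sigma + beta
    return 1 if y >= 0 else -1

def folded_threshold(x, gamma, beta, mu, sigma):
    # equivalent single comparison; assumes gamma > 0
    theta = mu - beta * sigma / gamma
    return 1 if x >= theta else -1

# the two agree on every input (example parameters are arbitrary)
for x in range(-10, 11):
    assert bn_then_sign(x, 0.5, 0.25, 1.5, 2.0) == folded_threshold(x, 0.5, 0.25, 1.5, 2.0)
```

(If gamma < 0 the comparison direction flips; hardware just stores the threshold and a sign per neuron.)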
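A back-of-envelope check of the CNN weight footprint above (my own arithmetic, assuming all eight conv layers use 2x2 filters with 256 input channels and 256 filters at 1 bit per weight, which matches the 256 kB figure):

```python
# bits per conv layer: filter height x width x in-channels x filters
bits_per_layer = 2 * 2 * 256 * 256          # 262,144 bits
kib_per_layer = bits_per_layer / 8 / 1024   # 32 KiB per layer
total_kib = 8 * kib_per_layer               # all 8 conv layers
print(total_kib)  # 256.0
```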

- input pixels are quantized to 7 bits

- BinaryNet with XNOR operations
- network architecture designed to work well with CMOS hardware
- low weight memory
- memory access cost is amortized: weight-stationary, data-parallel, input reuse
- energy-efficient switched-capacitor (SC) neuron?
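A digital-equivalent sketch of what one binary neuron computes (my own model; the chip evaluates the wide vector sum in the analog/SC domain rather than with a digital popcount): elementwise XNOR of inputs and weights, a wide sum, then a threshold to rebinarize.

```python
def binary_neuron(x_bits, w_bits, threshold=0):
    # x_bits, w_bits: lists of 0/1 bits encoding -1/+1
    # count positions where input and weight agree (XNOR = 1)
    matches = sum(1 - (x ^ w) for x, w in zip(x_bits, w_bits))
    # dot product in the +/-1 domain: matches minus mismatches
    dot = 2 * matches - len(x_bits)
    # rebinarized activation (the sign/threshold stage)
    return 1 if dot >= threshold else 0

x = [1, 0, 1, 1]   # encodes +1, -1, +1, +1
w = [1, 1, 0, 1]   # encodes +1, +1, -1, +1
print(binary_neuron(x, w))  # 1 (dot product is 0, which meets threshold 0)
```

The wide sum (`matches`) is the energy bottleneck the notes mention; the SC neuron replaces it with passive charge sharing.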