MSc thesis project proposal

CNN accelerator: Multipliers vs. shifters!

CNNs (Convolutional Neural Networks) are brain-inspired algorithms that have revolutionized AI (Artificial Intelligence). Thanks to them, objects in images can be detected and classified in real time with superhuman accuracy. However, they require massively parallel computations, which take an extremely long time on CPUs, so they are usually executed on power-hungry GPUs. To bring these algorithms to edge devices, a great deal of effort is being devoted to optimizing hardware accelerators.

There are many quantization techniques for reducing the bit-width of the weights (the key learnable parameters that determine how a CNN behaves). Some of them incur low accuracy loss, while others are efficient to implement in hardware. Dynamic fixed-point, stochastic, and lookup-table quantization are some of these methods. In addition, weight sharing can reduce the bit length of the CNN weights with little or no accuracy loss.
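As an illustration of one of these methods, the following is a minimal sketch of dynamic fixed-point quantization, under the assumption that all weights in a group (e.g. one layer) share a single fractional length chosen from the largest absolute weight. The function name `dynamic_fixed_point` and the 8-bit default are hypothetical choices for this example, not part of the proposal.

```python
import math

def dynamic_fixed_point(weights, bits=8):
    """Quantize a group of weights to signed `bits`-bit integers that
    share one fractional length (assumed scheme, for illustration)."""
    max_abs = max(abs(w) for w in weights)
    # Bits needed for the sign and the integer part of the largest weight.
    int_bits = math.floor(math.log2(max_abs)) + 2 if max_abs > 0 else 1
    frac_bits = bits - int_bits          # remaining bits hold the fraction
    scale = 2.0 ** frac_bits
    qmin, qmax = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    # Round to the nearest code and saturate to the representable range.
    codes = [max(qmin, min(qmax, round(w * scale))) for w in weights]
    return codes, frac_bits

codes, frac_bits = dynamic_fixed_point([0.5, -0.25, 0.75])
# Each weight is recovered (approximately) as code / 2**frac_bits.
restored = [c / 2.0 ** frac_bits for c in codes]
```

A group with small weights gets more fractional bits, and a group with large weights gets fewer, which is what makes the format "dynamic" in this sketch.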

Another proposed method is logarithmic quantization, in which the weights are restricted to powers of two (1, 2, 4, 8, 16, etc.), i.e. the bits encode the exponent x of 2^x. E.g. “101” (-3 in two’s complement) represents 2^(-3) = 0.125 in logarithmic quantization. Encoding the weights in this way not only gives a wider range for the same number of bits (although with “bigger” steps), but also allows the multiplications in CNNs to be replaced by bit-shifts!
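The idea can be sketched in a few lines of Python. This is only an illustrative model, assuming a separate sign, an exponent rounded to the nearest integer and clipped to a signed 3-bit range, and integer activations; the function names `log2_quantize` and `shift_multiply` are hypothetical.

```python
import math

def log2_quantize(w, bits=3):
    """Return (sign, exponent) so that w is approximated by sign * 2**exponent.
    The exponent is clipped to the signed two's-complement range of `bits` bits."""
    if w == 0:
        raise ValueError("zero has no power-of-two encoding in this sketch")
    sign = -1 if w < 0 else 1
    exp = round(math.log2(abs(w)))
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return sign, max(lo, min(hi, exp))

def shift_multiply(x, sign, exp):
    """Multiply the integer activation x by sign * 2**exp using only shifts."""
    y = x << exp if exp >= 0 else x >> -exp   # right shift drops fraction bits
    return sign * y

sign, exp = log2_quantize(0.125)      # exponent -3, i.e. "101" in 3 bits
product = shift_multiply(64, sign, exp)
```

Here 64 multiplied by 0.125 is computed as a right shift by three positions. Note that a right shift discards the low-order bits, so a hardware design would need to decide how to handle that precision loss.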

However, which circuit is smaller: a multiplier or a shifter? Which one is faster? And which one consumes less power?

Current FPGAs come with highly optimized multipliers, whereas shifters are usually implemented with flip-flops and muxes. But what about an ASIC in a particular technology? Will an optimized shifter outperform a multiplier?

Assignment

The first aim of this MSc thesis project is to analyze different quantization techniques for different CNN models in terms of accuracy and hardware cost. The second aim is to implement two CNN accelerators, one using multipliers and the other using shifters, and then to profile and compare them.

Requirements

For a first, high-level analysis and profiling of the CNNs, the student should use MATLAB or Python. For the implementation and a more accurate analysis, the student should use VHDL or Verilog. Knowledge of SystemC is also recommended. Prior knowledge of and interest in deep learning and CNNs are also necessary!

Contact

dr. David Aledo

Signal Processing Systems Group

Department of Microelectronics

Last modified: 2023-11-04