When deploying pretrained neural networks (NNs) for inference on Edge AI devices, we have to reduce the NN's memory footprint (size) to fit the edge device's memory. We also have to reduce the number of computations in the NN to achieve higher throughput, while keeping the required accuracy. To that end, we can use the following NN compression methodologies.

  • Pruning redundant weights, filters, or layers of the NN (see the first sketch after this list).
  • Quantizing the weights of the NN, e.g., from float64 to float32, float16, or int8 (also covered in the first sketch).
  • Developing an efficient NN architecture.
  • Knowledge distillation (see the second sketch after this list).
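
To make the first two items concrete, here is a minimal PyTorch sketch (PyTorch and the toy model are my assumptions; the article does not name a framework) that prunes 30% of each linear layer's weights by L1 magnitude and then applies post-training dynamic quantization to int8:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Stand-in for a pretrained model (assumption: any model with nn.Linear
# layers can be treated the same way).
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# 1. Pruning: zero out the 30% of weights with the smallest L1 magnitude.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # make the pruning permanent

# 2. Quantization: convert the Linear weights to int8 (post-training, dynamic).
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

print(quantized_model)
```

Unstructured pruning like this mainly shrinks the stored model (the zeroed weights compress well); structured pruning of whole filters or layers is what actually cuts computation on edge hardware.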
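
Knowledge distillation can likewise be sketched as a loss function: a small student network is trained to match the temperature-softened output distribution of a large teacher. The temperature T and weighting alpha below are illustrative hyperparameters, not values from this article:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Soft targets: student mimics the teacher's softened distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # scale gradients back up after dividing logits by T
    # Hard targets: standard cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```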

If you are not familiar with the above methodologies, it's better to read the…
