Draft:Embedded Machine Learning

Embedded Machine Learning is the field of study at the intersection of embedded systems and machine learning. Machine learning models typically consume substantial resources (processing power, memory, inference speed) during both the training and inference phases. In contrast, embedded systems such as microcontrollers, ECUs, wearable devices and edge devices have limited computing resources (memory, processor speed, etc.). Enabling such large models to run (perform inference) on these devices is the main goal of this field. Various techniques, such as hardware acceleration and model optimization, are used to achieve this goal.

ML models can be trained on larger computing systems such as cloud or server hardware, but it is challenging to deploy (download/flash) those models on embedded devices. A further challenge is to run those models (the inference phase) on embedded systems to make predictions without a significant loss in accuracy.

Hardware based methods
Hardware acceleration techniques leverage specialized hardware components, such as Digital Signal Processors (DSPs), Graphics Processing Units (GPUs), Field-Programmable Gate Arrays (FPGAs), and dedicated Neural Network Accelerators (NNAs), to speed up the inference process and improve the efficiency of embedded machine learning algorithms.

Software based methods
Some model optimization (compression) techniques are described below. They compress or alter an ML model so that it takes less space and makes predictions faster without a significant loss in accuracy.

Pruning
Removes less important connections and parameters from the model, resulting in reduced size and complexity.
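A minimal sketch of unstructured magnitude pruning using NumPy (the example matrix and the 50% sparsity target are arbitrary; production toolchains typically also fine-tune the model afterwards to recover accuracy):

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Zero out the given fraction of smallest-magnitude weights."""
    flat = np.abs(weights).flatten()
    k = int(len(flat) * sparsity)  # number of weights to remove
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    mask = np.abs(weights) > threshold            # keep only larger weights
    return weights * mask

w = np.array([[0.9, -0.05],
              [0.01, -1.2]])
pruned = magnitude_prune(w, sparsity=0.5)
# the two smallest-magnitude entries (-0.05 and 0.01) become zero
```

The zeroed weights can then be stored in a sparse format or exploited by hardware that skips multiplications by zero.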

Quantization
Reduces the precision of parameters by using lower-bit representation (e.g., from 32 bits to 8 bits), leading to a smaller model size and faster inference.

Quantization can be applied during the training phase (quantization-aware training) or after training (post-training quantization).
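A simplified sketch of symmetric post-training quantization from float32 to int8 (real toolchains such as TensorFlow Lite choose per-tensor or per-channel scales and may calibrate on sample data; the single per-tensor scale here is an illustrative assumption):

```python
import numpy as np

def quantize_int8(weights):
    """Map float32 weights into the signed 8-bit range [-127, 127]."""
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights for computation or inspection."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# q needs 1 byte per weight instead of 4; w_hat approximates w
# to within half a quantization step
```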

Knowledge Distillation
Transfers knowledge from a large, pre-trained teacher network to a smaller student network, resulting in a smaller, more efficient model with comparable accuracy.
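The soft-target term of distillation can be sketched in NumPy as the KL divergence between temperature-softened teacher and student outputs (the logits and temperature below are illustrative; in practice this loss is combined with ordinary cross-entropy on the true labels and minimized by gradient descent):

```python
import numpy as np

def softmax(z, T=1.0):
    """Softmax with temperature T; higher T gives softer distributions."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Mean KL divergence from teacher to student soft distributions."""
    p = softmax(teacher_logits, T)  # soft targets from the teacher
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q))) / len(p))

teacher = np.array([[10.0, 2.0, 1.0]])
good_student = np.array([[9.0, 2.0, 1.0]])   # mimics the teacher
bad_student = np.array([[1.0, 9.0, 2.0]])    # disagrees with the teacher
# the loss is lower for the student that matches the teacher's distribution
```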

Low-Rank Factorization
Decomposes weight matrices into lower-rank factors, reducing the number of parameters without significant loss of accuracy.
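A minimal sketch using truncated SVD (the matrix size and rank are arbitrary): a dense layer's weight matrix W is replaced by the product of two thin factors, which is the best rank-r approximation in the Frobenius norm.

```python
import numpy as np

def low_rank_factorize(W, rank):
    """Approximate W (m x n) as A @ B with A (m x rank), B (rank x n)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]  # absorb singular values into A
    B = Vt[:rank, :]
    return A, B

W = np.random.randn(64, 64)
A, B = low_rank_factorize(W, rank=8)
# parameter count drops from 64*64 = 4096 to 8*(64 + 64) = 1024,
# and y = W @ x can be computed as A @ (B @ x)
```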

Neural Architecture Search (NAS)
Searches for network architectures that are optimized for both accuracy and efficiency, potentially leading to smaller models with high performance.
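A toy random-search sketch of the NAS idea (everything here is an illustrative assumption: the input/output dimensions of 16 and 4, the candidate layer widths, the bias-free parameter count, and the stand-in scoring function; real NAS methods use evolutionary or gradient-based search and evaluate candidates by actual validation accuracy):

```python
import random

def random_search_nas(evaluate, param_budget, n_trials=20, seed=0):
    """Sample architectures, discard those over a parameter budget,
    and keep the best according to a caller-supplied evaluate()."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(n_trials):
        depth = rng.randint(1, 3)
        arch = [rng.choice([8, 16, 32, 64]) for _ in range(depth)]
        # weight count for a 16-input, 4-output MLP (biases ignored)
        n_params = sum(a * b for a, b in zip([16] + arch, arch + [4]))
        if n_params > param_budget:
            continue  # too large for the target device
        score = evaluate(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score

# stand-in objective in place of real validation accuracy
arch, score = random_search_nas(lambda a: sum(a) / (1 + len(a)), param_budget=2000)
```

The parameter budget makes the search hardware-aware: only architectures that fit the target device are ever scored.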

Parameter Sharing
To-do

Example: Weight Sharing