Model Compression
- GitHub - open-mmlab/mmrazor: OpenMMLab Model Compression Toolbox and Benchmark.
- GitHub - microsoft/VPTQ: VPTQ, A Flexible and Extreme low-bit quantization algorithm
Quantization
https://github.com/666DZY666/micronet
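As a quick illustration of the kind of low-bit conversion these toolkits automate, here is a minimal sketch using PyTorch's built-in post-training dynamic quantization (not micronet's own API); the model and layer choices are illustrative only.

```python
import torch
import torch.nn as nn

# Small placeholder model; the architecture is illustrative only.
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)
model.eval()

# Post-training dynamic quantization: weights of the listed module types
# are stored as int8, activations are quantized on the fly at inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 128)
print(quantized(x).shape)  # same call interface as the float model
print(quantized)           # Linear layers replaced by dynamically quantized variants
```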
Lower Precision
GitHub - TimDettmers/bitsandbytes: 8-bit CUDA functions for PyTorch
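A minimal sketch of how bitsandbytes is commonly used to cut training memory: swapping a standard PyTorch optimizer for its 8-bit counterpart. The model and hyperparameters are placeholders; check the bitsandbytes README for the current API.

```python
import torch
import torch.nn as nn
import bitsandbytes as bnb

# Placeholder model; any nn.Module works the same way (CUDA required).
model = nn.Linear(1024, 1024).cuda()

# 8-bit Adam keeps optimizer state in int8 blocks instead of fp32,
# roughly quartering optimizer memory for large models.
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-4)

x = torch.randn(16, 1024, device="cuda")
loss = model(x).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```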
Knowledge Distillation
For Knowledge Distillation, see this note.
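For quick reference, a minimal sketch of the standard soft-target distillation loss (temperature-scaled KL divergence between teacher and student logits plus the usual cross-entropy, after Hinton et al.); the temperature, weighting, and tensors below are illustrative.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft-target term: KL divergence between temperature-softened
    # teacher and student distributions, scaled by T^2.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Illustrative usage with random tensors standing in for real model outputs.
student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```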