# mcunet **Repository Path**: cuishaowei/mcunet ## Basic Information - **Project Name**: mcunet - **Description**: No description available - **Primary Language**: Unknown - **License**: MIT - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2025-07-03 - **Last Updated**: 2025-07-03 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # MCUNet: Tiny Deep Learning on IoT Devices This is the official implementation of the MCUNet series. ### [TinyML Project Website](https://hanlab.mit.edu/projects/tinyml) | [MCUNetV1](https://hanlab.mit.edu/projects/mcunet) | [MCUNetV2](https://hanlab.mit.edu/projects/mcunetv2) | [MCUNetV3](https://hanlab.mit.edu/projects/mcunetv3) ![demo](assets/figures/mcunet_demo.gif) ## News **If you are interested in getting updates, please sign up [here](https://forms.gle/UW1uUmnfk1k6UJPPA) to get notified!** - **(2024/03)** We release a [new demo video](https://www.youtube.com/watch?v=0pUFZYdoMY8) of [On-Device Training Under 256KB Memory](https://arxiv.org/abs/2206.15472). - **(2023/10)** [Tiny Machine Learning: Progress and Futures \[Feature\]](https://hanlab.mit.edu/projects/tinyml-magazine) appears at IEEE CAS Magazine. - **(2022/12)** We simplified the `net_id` of models (new version: `mcunet-in0`, `mcunet-vww1`, etc.) for an upcoming review paper (stay tuned!). - **(2022/10)** Our new work [On-Device Training Under 256KB Memory](https://arxiv.org/abs/2206.15472) is highlighted on the [MIT homepage](http://web.mit.edu/spotlight/learning-edge/)! - **(2022/09)** Our new work [On-Device Training Under 256KB Memory](https://arxiv.org/abs/2206.15472) is accepted to NeurIPS 2022! It enables tiny on-device training for IoT devices. - **(2022/08)** We release the source code of **TinyEngine** in [this repo](https://github.com/mit-han-lab/tinyengine). Please take a look! - **(2022/08)** Our new course on **TinyML and Efficient Deep Learning** will be released soon in September 2022: [efficientml.ai](https://efficientml.ai/). - **(2022/07)** We also include the person detection model used in the video demo above. We will also include the deployment code in TinyEngine release. - **(2022/06)** We refactor the MCUNet repo as a standalone repo (previous repo: https://github.com/mit-han-lab/tinyml) - **(2021/10)** **MCUNetV2** is accepted to NeurIPS 2021: https://arxiv.org/abs/2110.15352 ! - **(2020/10)** **MCUNet** is accepted to NeurIPS 2020 as **spotlight**: https://arxiv.org/abs/2007.10319 ! - Our projects are covered by: [MIT News](https://news.mit.edu/2020/iot-deep-learning-1113), [MIT News (v2)](https://news.mit.edu/2021/tiny-machine-learning-design-alleviates-bottleneck-memory-usage-iot-devices-1208), [WIRED](https://www.wired.com/story/ai-algorithms-slimming-fit-fridge/), [Morning Brew](https://www.morningbrew.com/emerging-tech/stories/2020/12/07/researchers-figured-fit-ai-ever-onto-internet-things-microchips), [Stacey on IoT](https://staceyoniot.com/researchers-take-a-3-pronged-approach-to-edge-ai/), [Analytics Insight](https://www.analyticsinsight.net/amalgamating-ml-and-iot-in-smart-home-devices/), [Techable](https://techable.jp/archives/142462), etc. ## Overview Microcontrollers are low-cost, low-power hardware. They are widely deployed and have wide applications. ![teaser](assets/figures/applications.png) But the tight memory budget (50,000x smaller than GPUs) makes deep learning deployment difficult. ![teaser](assets/figures/memory_size.png) MCUNet is a **system-algorithm co-design** framework for tiny deep learning on microcontrollers. It consists of **TinyNAS** and **TinyEngine**. They are co-designed to fit the tight memory budgets. With system-algorithm co-design, we can significantly improve the deep learning performance on the same tiny memory budget. ![teaser](assets/figures/overview.png) Our **TinyEngine** inference engine could be a useful infrastructure for MCU-based AI applications. It significantly **improves the inference speed and reduces the memory usage** compared to existing libraries like [TF-Lite Micro](https://www.tensorflow.org/lite/microcontrollers), [CMSIS-NN](https://arxiv.org/abs/1801.06601), [MicroTVM](https://tvm.apache.org/2020/06/04/tinyml-how-tvm-is-taming-tiny), etc. It improves the inference speed by **1.5-3x**, and reduces the peak memory by **2.7-4.8x**. ![teaser](assets/figures/latency_mem.png) ## Model Zoo ### Usage You can build the pre-trained PyTorch `fp32` model or the `int8` quantized model in TF-Lite format. ```python from mcunet.model_zoo import net_id_list, build_model, download_tflite print(net_id_list) # the list of models in the model zoo # pytorch fp32 model model, image_size, description = build_model(net_id="mcunet-in3", pretrained=True) # you can replace net_id with any other option from net_id_list # download tflite file to tflite_path tflite_path = download_tflite(net_id="mcunet-in3") ``` ### Evaluate To evaluate the accuracy of PyTorch `fp32` models, run: ```bash python eval_torch.py --net_id mcunet-in2 --dataset {imagenet/vww} --data-dir PATH/TO/DATA/val ``` To evaluate the accuracy of TF-Lite `int8` models, run: ```bash python eval_tflite.py --net_id mcunet-in2 --dataset {imagenet/vww} --data-dir PATH/TO/DATA/val ``` ### Model List - Note that all the **latency**, **SRAM**, and **Flash** usage are profiled with **TinyEngine** on STM32F746. - Here we only provide the `int8` quantized modes. `int4` quantized models (as shown in the paper) can further push the accuracy-memory trade-off, but lacking a general format support. - For accuracy (top1, top-5), we report the accuracy of `fp32`/`int8` models respectively The **ImageNet** model list: | net_id | MACs | #Params | SRAM | Flash | Res. | Top-1
(fp32/int8) | Top-5
(fp32/int8) | | ------------------- | ------ | ------- | ----- | ------ | ---- | ---------------------- | ---------------------- | | *# baseline models* | | | | | | | | | mbv2-w0.35 | 23.5M | 0.75M | 308kB | 862kB | 144 | 49.7%/49.0% | 74.6%/73.8% | | proxyless-w0.3 | 38.3M | 0.75M | 292kB | 892kB | 176 | 57.0%/56.2% | 80.2%/79.7% | | *# mcunet models* | | | | | | | | | mcunet-in0 | 6.4M | 0.75M | 266kB | 889kB | 48 | 41.5%/40.4% | 66.3%/65.2% | | mcunet-in1 | 12.8M | 0.64M | 307kB | 992kB | 96 | 51.5%/49.9% | 75.5%/74.1% | | mcunet-in2 | 67.3M | 0.73M | 242kB | 878kB | 160 | 60.9%/60.3% | 83.3%/82.6% | | mcunet-in3 | 81.8M | 0.74M | 293kB | 897kB | 176 | 62.2%/61.8% | 84.5%/84.2% | | mcunet-in4 | 125.9M | 1.73M | 456kB | 1876kB | 160 | 68.4%/68.0% | 88.4%/88.1% | The **VWW** model list: *Note that the VWW dataset might be hard to prepare. You can download our pre-built `minival` set from [here](https://www.dropbox.com/s/bc7qi89ezra9711/vww-minival.tar?dl=0), around 380MB.* | net_id | MACs | #Params | SRAM | Flash | Resolution | Top-1
(fp32/int8) | | ----------- | ----- | ------- | ----- | ----- | ---------- | ---------------------- | | mcunet-vww0 | 6.0M | 0.37M | 146kB | 617kB | 64 | 87.4%/87.3% | | mcunet-vww1 | 11.6M | 0.43M | 162kB | 689kB | 80 | 88.9%/88.9% | | mcunet-vww2 | 55.8M | 0.64M | 311kB | 897kB | 144 | 91.7%/91.8% | For TF-Lite `int8` models, we do not use quantization-aware training (QAT), so some results is slightly lower than paper numbers. ### Detection Model We also share the person detection model used in the [demo](https://www.youtube.com/watch?v=F4XKn0iDfxg). To visualize the model's prediction on a sample image, please run the following command: ```bash python eval_det.py ``` It will visualize the prediction here: `assets/sample_images/person_det_vis.jpg`. The model takes in a small input resolution of 128x160 to reduce memory usage. It does not achieve state-of-the-art performance due to the limited image and model size but should provide decent performance for tinyML applications (please check the demo for a video recording). We will also release the deployment code in the upcoming TinyEngine release. ## Requirement - Python 3.6+ - PyTorch 1.4.0+ - Tensorflow 1.15 (if you want to test TF-Lite models; CPU support only) ## Acknowledgement We thank MIT-IBM Watson AI Lab, Intel, Amazon, SONY, Qualcomm, NSF for supporting this research. ## Citation If you find the project helpful, please consider citing our paper: ``` @article{lin2020mcunet, title={Mcunet: Tiny deep learning on iot devices}, author={Lin, Ji and Chen, Wei-Ming and Lin, Yujun and Gan, Chuang and Han, Song}, journal={Advances in Neural Information Processing Systems}, volume={33}, year={2020} } @inproceedings{ lin2021mcunetv2, title={MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning}, author={Lin, Ji and Chen, Wei-Ming and Cai, Han and Gan, Chuang and Han, Song}, booktitle={Annual Conference on Neural Information Processing Systems (NeurIPS)}, year={2021} } @article{ lin2022ondevice, title = {On-Device Training Under 256KB Memory}, author = {Lin, Ji and Zhu, Ligeng and Chen, Wei-Ming and Wang, Wei-Chen and Gan, Chuang and Han, Song}, journal = {arXiv:2206.15472 [cs]}, url = {https://arxiv.org/abs/2206.15472}, year = {2022} } ``` ## Related Projects [On-Device Training Under 256KB Memory](https://hanlab.mit.edu/projects/mcunetv3) (NeurIPS'22) [TinyTL: Reduce Memory, Not Parameters for Efficient On-Device Learning](https://arxiv.org/abs/2007.11622) (NeurIPS'20) [Once for All: Train One Network and Specialize it for Efficient Deployment](https://arxiv.org/abs/1908.09791) (ICLR'20) [ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware](https://arxiv.org/pdf/1812.00332.pdf) (ICLR'19) [AutoML for Architecting Efficient and Specialized Neural Networks](https://ieeexplore.ieee.org/abstract/document/8897011) (IEEE Micro) [AMC: AutoML for Model Compression and Acceleration on Mobile Devices](https://arxiv.org/pdf/1802.03494.pdf) (ECCV'18) [HAQ: Hardware-Aware Automated Quantization](https://arxiv.org/pdf/1811.08886.pdf) (CVPR'19, oral)