This repository contains the official implementation of the MDP method introduced in our CVPR 2025 paper:
MDP: Multidimensional Vision Model Pruning with Latency Constraint
Xinglong Sun, Barath Lakshmanan, Maying Shen, Shiyi Lan, Jingde Chen, Jose M. Alvarez
Please check the LICENSE file. HALP may be used non-commercially, meaning for research or evaluation purposes only. For business inquiries, please contact [email protected].
- [2025/09] Release license obtained. ResNet50 and ablation study code are now available; remaining code will be cleaned up and released soon.
- [2025/06] I presented MDP in a CVPR 2025 tutorial on Full-Stack, GPU-based Acceleration of Deep Learning and Foundation Models. You can watch the tutorial video here!
Current structural pruning methods face two significant limitations:
- They often limit pruning to finer-grained levels like channels, making aggressive parameter reduction challenging
- They focus heavily on parameter and FLOP reduction, with existing latency-aware methods frequently relying on simplistic, suboptimal linear models that fail to generalize well to transformers
In this paper, we address both limitations by introducing Multi-Dimensional Pruning (MDP), a novel paradigm that:
- Jointly optimizes across various pruning granularities (channels, query, key, heads, embeddings, and blocks)
- Employs advanced latency modeling to accurately capture latency variations
- Reformulates pruning as a Mixed-Integer Nonlinear Program (MINLP)
- Supports both CNNs and transformers
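To make the formulation above concrete, here is a minimal toy sketch of pruning posed as an integer program with a non-linear latency constraint. It is not the code in this repository: the importance and latency tables, the function name `solve_toy_pruning`, and the budget are all hypothetical values chosen for illustration, and exhaustive enumeration stands in for a real MINLP solver.

```python
from itertools import product

# Hypothetical per-block tables, indexed by the number of channel groups kept.
# Index 0 means the whole block is removed (zero importance, zero latency),
# so a single integer per block covers both block-level and channel-level pruning.
IMPORTANCE = [
    [0.0, 4.0, 6.0, 7.0],
    [0.0, 3.0, 5.0, 6.5],
    [0.0, 1.0, 1.5, 1.8],
]

# Hypothetical latency lookup in milliseconds. It is deliberately non-linear
# (stair-step) in the channel count, which a linear latency model cannot capture.
LATENCY_MS = [
    [0.0, 1.0, 1.1, 2.0],
    [0.0, 1.0, 1.2, 2.2],
    [0.0, 0.9, 1.0, 1.9],
]


def solve_toy_pruning(importance, latency_ms, budget_ms):
    """Enumerate all integer keep-counts and return the feasible choice
    with the highest total importance under the latency budget."""
    best_score, best_choice = float("-inf"), None
    ranges = [range(len(row)) for row in importance]
    for choice in product(*ranges):
        total_latency = sum(latency_ms[i][k] for i, k in enumerate(choice))
        if total_latency > budget_ms:
            continue  # violates the latency constraint
        score = sum(importance[i][k] for i, k in enumerate(choice))
        if score > best_score:
            best_score, best_choice = score, choice
    return best_choice, best_score


if __name__ == "__main__":
    choice, score = solve_toy_pruning(IMPORTANCE, LATENCY_MS, budget_ms=2.5)
    print(choice, score)
```

With these made-up tables and a 2.5 ms budget, the search keeps two channel groups in each of the first two blocks and removes the third block entirely, which shows how block-level and channel-level decisions interact under a single latency constraint. Enumeration only works at toy scale; a real model needs a dedicated solver.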
Our extensive experiments demonstrate MDP's superior performance:
- ResNet50: 28% speed increase with +1.4 Top-1 accuracy improvement over prior art
- DeiT-Base: 37% additional acceleration with +0.7 Top-1 accuracy improvement over prior art
- 3D object detection: higher speed (×1.18) and mAP (0.451 vs. 0.449) compared to the dense baseline
Please check the README within the folder for the task you want to run.
Some of the infrastructure, data loading, and foundational code is adapted from the HALP and NViT works. We sincerely thank the authors of these works for their contributions.
If you find this repository useful for your research, please cite our paper:
@misc{sun2025mdpmultidimensionalvisionmodel,
title={MDP: Multidimensional Vision Model Pruning with Latency Constraint},
author={Xinglong Sun and Barath Lakshmanan and Maying Shen and Shiyi Lan and Jingde Chen and Jose M. Alvarez},
year={2025},
eprint={2504.02168},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2504.02168}
}