🎉 Announcing LightLLM v1.1.0: More Efficient, More Powerful!
We are thrilled to introduce LightLLM v1.1.0, featuring major architectural and optimization upgrades for higher performance and broader applicability.
✨ Key Highlights
🚀 CPU-GPU Unified Folding Architecture
- Drastically reduces system-level CPU overhead by overlapping CPU-side scheduling with GPU execution (see the sketch below)
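Conceptually, folding means the CPU prepares step N+1 while the GPU executes step N, so scheduling and batch-assembly cost is hidden behind GPU time. Below is a minimal, self-contained Python sketch of that overlap; `cpu_prepare`, `gpu_execute`, and the timings are hypothetical stand-ins, not LightLLM internals.

```python
import threading
import queue
import time

def cpu_prepare(step):
    # Hypothetical stand-in for CPU-side work: scheduling,
    # tokenization, batch assembly for the next model step.
    time.sleep(0.002)
    return f"batch-{step}"

def gpu_execute(batch):
    # Hypothetical stand-in for the GPU forward pass.
    time.sleep(0.010)

def run_serial(num_steps):
    # Baseline: CPU and GPU work strictly alternate.
    start = time.perf_counter()
    for step in range(num_steps):
        gpu_execute(cpu_prepare(step))
    print(f"serial: {time.perf_counter() - start:.3f}s")

def run_folded(num_steps):
    # A producer thread keeps a small queue of prepared batches,
    # so the GPU loop never waits on CPU-side overhead.
    batches = queue.Queue(maxsize=2)

    def producer():
        for step in range(num_steps):
            batches.put(cpu_prepare(step))

    threading.Thread(target=producer, daemon=True).start()
    start = time.perf_counter()
    for _ in range(num_steps):
        gpu_execute(batches.get())
    print(f"folded: {time.perf_counter() - start:.3f}s")

if __name__ == "__main__":
    run_serial(100)   # ~num_steps * (cpu + gpu) time
    run_folded(100)   # ~num_steps * gpu time
```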
⚡ Deep Model Optimizations
- Enhanced support for DeepSeek and Qwen3-MoE
- Integration of DeepEP / DeepGEMM and fused MoE Triton optimizations
- New balanced DP (data-parallel) request scheduler (a toy sketch of the balancing idea follows this list)
- Added support for MTP (Multi-Token Prediction)
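For intuition on the balanced DP scheduler: the core idea is to route each incoming request to the data-parallel rank carrying the least outstanding work, so ranks stay evenly loaded. The toy sketch below assumes pending token count as the load metric; the names (`BalancedDPScheduler`, `dispatch`, `complete`) are illustrative and do not reflect LightLLM's actual scheduler.

```python
import heapq

class BalancedDPScheduler:
    """Toy balanced scheduler: route each request to the
    data-parallel rank with the fewest pending tokens.
    (Illustrative only; not LightLLM's implementation.)"""

    def __init__(self, num_ranks):
        # Min-heap of (pending_tokens, rank_id) pairs.
        self.heap = [(0, rank) for rank in range(num_ranks)]
        heapq.heapify(self.heap)

    def dispatch(self, request_tokens):
        # Pick the least-loaded rank and charge it the new work.
        pending, rank = heapq.heappop(self.heap)
        heapq.heappush(self.heap, (pending + request_tokens, rank))
        return rank

    def complete(self, rank, request_tokens):
        # Credit the rank when a request finishes.
        self.heap = [(p - request_tokens if r == rank else p, r)
                     for p, r in self.heap]
        heapq.heapify(self.heap)

sched = BalancedDPScheduler(num_ranks=4)
for tokens in [512, 128, 2048, 256, 1024]:
    print(f"{tokens}-token request -> DP rank {sched.dispatch(tokens)}")
```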
⚙️ Autotuner for Triton Kernels
- Automatically tunes the model's Triton kernel operators for the target hardware at service startup (see the example below)
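The general mechanism is the one Triton itself exposes via `@triton.autotune`: benchmark a set of candidate launch configurations once, then cache the winner per input shape. The element-wise kernel below demonstrates that decorator; LightLLM's startup autotuner is its own machinery, so treat this purely as an illustration of the concept.

```python
import torch
import triton
import triton.language as tl

@triton.autotune(
    configs=[
        triton.Config({"BLOCK_SIZE": 256}, num_warps=4),
        triton.Config({"BLOCK_SIZE": 512}, num_warps=4),
        triton.Config({"BLOCK_SIZE": 1024}, num_warps=8),
    ],
    key=["n_elements"],  # re-benchmark when the problem size changes
)
@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n_elements = x.numel()
    # BLOCK_SIZE is chosen by the autotuner, so it is not passed here.
    grid = lambda meta: (triton.cdiv(n_elements, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n_elements)
    return out

x = torch.randn(1 << 20, device="cuda")
print(torch.allclose(add(x, x), x + x))  # True
```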
🏆 ACL Outstanding Paper Award: Pre³
- Our paper Pre³ (deterministic pushdown automata for faster structured LLM generation) received an Outstanding Paper Award at ACL 2025
🖼️ Improved Multimodal Inference
- Further optimizations make inference on multimodal models faster and more reliable
📖 Learn More
More details can be found in the LightLLM v1.1.0 blog post.