LightLLM v1.1.0 Release!

@shihaobai released this 03 Sep 07:54
· 19 commits to main since this release
0f5d0cc

🎉 Announcing LightLLM v1.1.0: More Efficient, More Powerful!

We are thrilled to introduce LightLLM v1.1.0, featuring major architectural and optimization upgrades for higher performance and broader applicability.

✨ Key Highlights

🚀 CPU-GPU Unified Folding Architecture

  • Drastically reduces system-level CPU overhead

⚡ Deep Model Optimizations

  • Enhanced support for DeepSeek and Qwen3-MoE
  • Integration of DeepEP / DeepGEMM and fused MoE Triton optimizations
  • New balanced data-parallel (DP) request scheduler
  • Added support for multi-token prediction (MTP)
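
To give a feel for what "balanced" request scheduling across data-parallel ranks can mean, here is a minimal sketch. The release notes do not describe the actual policy, so this is a generic least-loaded assignment and not LightLLM's implementation; the function name `assign_requests` and the use of token count as the load metric are assumptions made for illustration.

```python
import heapq
from typing import List


def assign_requests(request_tokens: List[int], num_ranks: int) -> List[int]:
    """Assign each request to the DP rank with the smallest load so far.

    Hypothetical helper: `request_tokens[i]` is the estimated token count of
    request i, and the returned list holds the chosen rank for each request.
    """
    if num_ranks <= 0:
        raise ValueError("num_ranks must be positive")

    # Min-heap of (accumulated_load, rank); ties go to the lower rank index.
    heap = [(0, rank) for rank in range(num_ranks)]
    assignment = []
    for tokens in request_tokens:
        load, rank = heapq.heappop(heap)
        assignment.append(rank)
        heapq.heappush(heap, (load + tokens, rank))
    return assignment


if __name__ == "__main__":
    # One long request followed by three short ones, spread over two ranks.
    print(assign_requests([10, 1, 1, 1], num_ranks=2))
```

A round-robin policy would send two requests to each rank here regardless of length; the least-loaded rule instead keeps the short requests away from the rank that is busy with the long one.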

⚙️ Autotuner for Triton Kernels

  • Automatically tunes kernel operators used by the model at service startup
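
The general idea of startup-time autotuning can be sketched as follows: benchmark each candidate configuration of a kernel once when the service starts, then reuse the fastest one. This is an illustrative sketch under stated assumptions, not LightLLM's autotuner; `time_call`, `tune_at_startup`, and the `block` parameter are hypothetical names, and a plain Python function stands in for a Triton kernel.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

Config = Dict[str, Any]


def time_call(fn: Callable[..., Any], cfg: Config,
              warmup: int = 1, repeats: int = 3) -> float:
    """Average wall-clock seconds of fn(**cfg), measured after warmup runs."""
    for _ in range(warmup):
        fn(**cfg)
    start = time.perf_counter()
    for _ in range(repeats):
        fn(**cfg)
    return (time.perf_counter() - start) / repeats


def tune_at_startup(
    kernels: Dict[str, Tuple[Callable[..., Any], List[Config]]],
    measure: Callable[[Callable[..., Any], Config], float] = time_call,
) -> Dict[str, Config]:
    """Pick the lowest-cost config for every registered kernel."""
    best: Dict[str, Config] = {}
    for name, (fn, configs) in kernels.items():
        if not configs:
            raise ValueError(f"no candidate configs for kernel {name!r}")
        best[name] = min(configs, key=lambda cfg: measure(fn, cfg))
    return best


if __name__ == "__main__":
    # Stand-in "kernel": sums a range in chunks of size `block`.
    def chunked_sum(block: int) -> int:
        return sum(sum(range(i, min(i + block, 10_000)))
                   for i in range(0, 10_000, block))

    candidates = [{"block": 16}, {"block": 256}, {"block": 4096}]
    print(tune_at_startup({"chunked_sum": (chunked_sum, candidates)}))
```

Passing the measurement function as a parameter keeps the selection logic testable with a deterministic cost function, while the default measures real wall-clock time.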

🏆 ACL Outstanding Award: Pre^3

🖼️ Improved Multimodal Inference

  • Further optimizations for faster and more reliable multimodal model inference

📖 Learn More

More details can be found in the LightLLM v1.1.0 blog post.