DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
DeepSpeed-MoE is an end-to-end Mixture-of-Experts (MoE) training and inference solution provided as part of the DeepSpeed library.
B) Features
- Provides novel MoE architecture designs and model compression techniques that reduce MoE model size by up to 3.7x.
- Includes a highly optimized inference system that provides 7.3x better latency and cost compared to existing MoE inference solutions.