Add Naive Training Moe Example Code on Single GPU or Multi GPUs #10

xhx1022 · 2025-08-03T09:59:33Z

Added a new training example function demonstrating how to train the MoE model on a single GPU or on multiple GPUs using dummy data. #9

skydoorkai

Is this example running with 4 GPUs?
If so, the title "Single-GPU Training" is not correct.
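
One way to keep the example's title and logs accurate is to report the process count at startup. The sketch below assumes the script is launched with `torchrun`, which exports `WORLD_SIZE`, and that each process drives one GPU. `describe_run` is a hypothetical helper, not part of this PR.

```python
import os


def describe_run(env=None):
    """Return a label for the launch mode based on WORLD_SIZE.

    Assumes a torchrun-style launcher that exports WORLD_SIZE and one
    process per GPU. When the variable is absent, the run is treated as
    a single process.
    """
    env = os.environ if env is None else env
    world_size = int(env.get("WORLD_SIZE", "1"))
    if world_size <= 1:
        return "single-GPU"
    return f"multi-GPU ({world_size} processes)"


if __name__ == "__main__":
    print(f"Running MoE example in {describe_run()} mode")
```

Printing this label once per run makes it obvious whether a given log came from the single-GPU or the multi-GPU path.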

skydoorkai · 2025-08-07T05:03:44Z

examples/moe_dualpipe/dualpipe/__init__.py

@@ -0,0 +1,17 @@
+__version__ = "1.0.0"


These files are DualPipe code and do not need to be included in this PR.
In the README.md, explain how to clone the DualPipe code and set up PYTHONPATH.
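
As a rough illustration of the setup requested here: once DualPipe is cloned separately, the example only needs that checkout on the import path. Setting `PYTHONPATH` in the shell is the usual route; the sketch below shows an equivalent from inside Python. The `DUALPIPE_HOME` variable and the `add_checkout_to_path` helper are hypothetical names, not something this PR or DualPipe defines.

```python
import os
import sys


def add_checkout_to_path(checkout_dir):
    """Prepend a cloned repository directory to sys.path if it exists."""
    path = os.path.abspath(checkout_dir)
    if not os.path.isdir(path):
        raise FileNotFoundError(f"DualPipe checkout not found: {path}")
    if path not in sys.path:
        sys.path.insert(0, path)
    return path


if __name__ == "__main__":
    # DUALPIPE_HOME is an assumed, user-defined variable pointing at the clone.
    checkout = os.environ.get("DUALPIPE_HOME")
    if checkout:
        add_checkout_to_path(checkout)
        import dualpipe  # resolved from the external checkout, not a vendored copy

        print(dualpipe.__version__)
```

Either approach lets the example import `dualpipe` without copying its sources into `examples/moe_dualpipe/`.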

skydoorkai · 2025-08-07T05:04:27Z

examples/moe_dualpipe/examples/moe_train_basic.py

+
+    def apply_load_balancing_loss(self, router_probs, tokens_per_expert):
+        if self.moe_aux_loss_coeff > 0 and self.training:
+            # 计算每个专家的负载


Use English for comments.
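
For readers unfamiliar with the hunk above, a load-balancing auxiliary loss penalizes routers that send most tokens to a few experts. The sketch below is a framework-free illustration of one common formulation, the Switch Transformer style product of per-expert token fraction and mean router probability, with top-1 routing assumed. It is an assumption about what `apply_load_balancing_loss` computes, not the PR's implementation, and `load_balancing_loss` and `aux_loss_coeff` are illustrative names.

```python
def load_balancing_loss(router_probs, tokens_per_expert, aux_loss_coeff):
    """Switch-style auxiliary loss: coeff * E * sum_i(f_i * P_i).

    router_probs: list of per-token probability rows, one entry per expert.
    tokens_per_expert: number of tokens dispatched to each expert.
    """
    num_experts = len(tokens_per_expert)
    num_tokens = len(router_probs)
    total_routed = sum(tokens_per_expert)
    if num_tokens == 0 or total_routed == 0:
        return 0.0

    # f_i: fraction of routed tokens handled by expert i (the per-expert load).
    load_fraction = [count / total_routed for count in tokens_per_expert]
    # P_i: mean router probability assigned to expert i across all tokens.
    mean_prob = [
        sum(row[i] for row in router_probs) / num_tokens
        for i in range(num_experts)
    ]
    balance = sum(f * p for f, p in zip(load_fraction, mean_prob))
    return aux_loss_coeff * num_experts * balance
```

Under this formulation, perfectly uniform routing evaluates to `aux_loss_coeff`, and the value grows as routing concentrates on fewer experts.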

skydoorkai · 2025-08-07T05:09:45Z

examples/moe_dualpipe/examples/moe_train_basic.py

+        self.moe_z_loss_coeff = z_loss_coeff
+        self.initializer_range = 0.02
+
+class MoEAuxLossAutoScaler(torch.autograd.Function):


If this MoE model definition code is copied or modified from another repository, add comments citing the original source.

implement a training code to train it on single-GPU with dummy data

c14d3e7

xhx1022 requested review from skydoorkai, adamantboy, hxdtest and nash635 as code owners August 3, 2025 09:59

skydoorkai reviewed Aug 7, 2025


implement a training code to train it on multi-GPU with dummy data

a45f990

xhx1022 changed the title ~~Add Single-GPU Training Moe Example Code~~ Add Naive Training Moe Example Code on Single GPU or Multi GPUs Aug 17, 2025
