Skip to content

Conversation

xlab
Copy link
Contributor

@xlab xlab commented Jul 2, 2025

Implementation of Gemma 3n model for MLXLLM, text only. Based on the reference implementation in mlx-lm:
ml-explore/mlx-lm#258

This code can actually help building the VLM version there #340
cc @DePasqualeOrg

Models

The original MLX weights from mlx-vlm are not supported, only weights converted by mlx-lm are supported.

I've made a new collection with Text Only MLX models, i.e. bf16 and 4bit quantized using this new support.
https://huggingface.co/collections/mlx-community/gemma-3n-text-only-lm-6861cf66ddc9a13102996308

Naive benchmarks

Apple M4 Max

Model Peak Memory Generation Speed Generation Time
mlx-community/gemma-3n-E4B-it-lm-bf16 13097M 39.983666 tokens/s 7.503064s
mlx-community/gemma-3n-E2B-it-lm-bf16 8535M 63.440184 tokens/s 4.728864s
mlx-community/gemma-3n-E4B-it-lm-4bit 3684M 81.048619 tokens/s 3.701482s
mlx-community/gemma-3n-E2B-it-lm-4bit 2391M 113.438366 tokens/s 2.644608s

iPhone 16 Pro

Model Generation Speed
mlx-community/gemma-3n-E4B-it-lm-4bit 10-15 tokens/s
mlx-community/gemma-3n-E2B-it-lm-4bit 25-30 tokens/s

Notes

  • Some operations can be compiled (e.g. gelu_topk, logit_softcap) to improve performane
  • RMSNoScale can be improved when MLXFast.rmsNorm is fixed (allows nil weights)

Misc

  • added to LLMModelFactory
  • added to MLXService
  • added to MLXChatExample
  • 4 model references from HF

Demos

ios-15toks
macos-60toks

xlab and others added 2 commits July 2, 2025 04:01
* added to LLMModelFactory
* added to MLXService
* added to MLXChatExample
* 4 model references from HF
@DePasqualeOrg
Copy link
Contributor

Nice! Did you base this on #340, or did you start from scratch based on the Python implementation?

@xlab
Copy link
Contributor Author

xlab commented Jul 2, 2025

Hey, good question. This implementation is like 3rd attempt and it is made from scratch based on the python source from mlx-lm.

#340 was a great inspiration, since I am new to this, but sometimes it was misleading. Also, in my initial attempt I was using mlx-vlm language but it also wasn't a good reference. It all worked out once mlx-lm reference was ready.

The key to a successful transpilation is to prompt it piece by piece and verify, also feeding this at the end a re-verifying the whole thing https://swiftpackageindex.com/ml-explore/mlx-swift/main/documentation/mlx/converting-python

@davidkoski
Copy link
Collaborator

It looks like it needs swift-format run:

swift-format.............................................................Failed
- hook id: swift-format
- files were modified by this hook

@xlab
Copy link
Contributor Author

xlab commented Jul 20, 2025

@davidkoski please take another look, I've run swift-format and pre-commit checks are good.
If everything else is ok, let's merge this PR otherwise it might get overrun lol.

Copy link
Collaborator

@davidkoski davidkoski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good, thank you!

@davidkoski
Copy link
Collaborator

Xcode 16 fails with:

/Users/distiller/project/Libraries/MLXLLM/Models/Gemma3nText.swift:387:9: error: unexpected ',' separator
        )
        ^
/Users/distiller/project/Libraries/MLXLLM/Models/Gemma3nText.swift:508:9: error: unexpected ',' separator
        )
        ^
/Users/distiller/project/Libraries/MLXLLM/Models/Gemma3nText.swift:513:9: error: unexpected ',' separator
        )
        ^
/Users/distiller/project/Libraries/MLXLLM/Models/Gemma3nText.swift:517:9: error: unexpected ',' separator
        )
        ^
/Users/distiller/project/Libraries/MLXLLM/Models/Gemma3nText.swift:521:9: error: unexpected ',' separator
        )
        ^
/Users/distiller/project/Libraries/MLXLLM/Models/Gemma3nText.swift:539:9: error: unexpected ',' separator
        )
        ^
/Users/distiller/project/Libraries/MLXLLM/Models/Gemma3nText.swift:717:9: error: unexpected ',' separator
        )
        ^
/Users/distiller/project/Libraries/MLXLLM/Models/Gemma3nText.swift:727:9: error: unexpected ',' separator
        )
        ^
/Users/distiller/project/Libraries/MLXLLM/Models/Gemma3nText.swift:739:9: error: unexpected ',' separator
        )
        ^
/Users/distiller/project/Libraries/MLXLLM/Models/Gemma3nText.swift:751:9: error: unexpected ',' separator
        )
        ^

This is caused by a trailing comma in a call, e.g.

        self._routerNorm.wrappedValue = RMSNorm(
            dimensions: config.hiddenSize,
            eps: config.rmsNormEps, // <--- here
        )

@tseylerd
Copy link

Hey, I have one question after trying mlx-community/gemma-3n-E2B-it-lm-4bit with your implementation (huge thanks for it). How do you resolve missing chat template in tokenizer_config.json?

@xlab
Copy link
Contributor Author

xlab commented Jul 22, 2025

@davidkoski very interesting, I was confused why this happens until I found the proposal SE-0439 that enables trailing commas in Swift, according to release page of Apple Swift version 6.1.2, it's available in Xcode 16.3+

Anyways, for the sake of compatibility I've removed commas in the latest commit. Cannot test with Xcode 16 but it must be fine now!

@davidkoski
Copy link
Collaborator

Yeah, I am on 16.3 myself -- the 16.0 CI builder has been very useful :-)

@xlab
Copy link
Contributor Author

xlab commented Jul 22, 2025

@tseylerd I don't have use cases where missing chat template is a problem. If you have an example how it must look like, you can attach here and I'll update the repos on HF with new tokenizer_config.json

@davidkoski davidkoski merged commit 505c86f into ml-explore:main Jul 22, 2025
3 checks passed
@davidkoski
Copy link
Collaborator

Thank you for the contribution!

@xlab xlab deleted the gemma3n-lm branch July 22, 2025 21:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants