feat: implement gemma3n text model in MLXLLM #346
Conversation
* added to LLMModelFactory (see the sketch below)
* added to MLXService
* added to MLXChatExample
* 4 model references from HF
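For context, a minimal sketch of what one of those model references might look like, assuming the `ModelConfiguration` / `LLMRegistry` registration pattern already used in MLXLLM (the property name here is illustrative, not necessarily the one from the commit):

```swift
import MLXLLM
import MLXLMCommon

// Illustrative sketch only: registering one of the converted HF checkpoints so it
// can be picked up by LLMModelFactory (and surfaced in MLXService / MLXChatExample).
// The property name is hypothetical; the repo id is one of the four references above.
extension LLMRegistry {
    static public let gemma3n_E2B_it_lm_4bit = ModelConfiguration(
        id: "mlx-community/gemma-3n-E2B-it-lm-4bit",
        defaultPrompt: "Why is the sky blue?"
    )
}
```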
Nice! Did you base this on #340, or did you start from scratch based on the Python implementation?
Hey, good question. This implementation is my third attempt, and it was made from scratch based on the Python source from mlx-lm. #340 was a great inspiration, since I am new to this, but it was sometimes misleading. In my initial attempt I also used the mlx-vlm language model, but it wasn't a good reference either. It all worked out once the mlx-lm reference was ready. The key to a successful transpilation is to prompt it piece by piece and verify, then feed it this guide at the end and re-verify the whole thing: https://swiftpackageindex.com/ml-explore/mlx-swift/main/documentation/mlx/converting-python
It looks like it needs swift-format run:
@davidkoski please take another look, I've run swift-format.
Changes look good, thank you!
Xcode 16 fails with a build error. This is caused by a trailing comma in a call, e.g.

```swift
self._routerNorm.wrappedValue = RMSNorm(
    dimensions: config.hiddenSize,
    eps: config.rmsNormEps,  // <--- here
)
```
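For reference, the Xcode 16-compatible form simply drops that trailing comma (this is what the follow-up commit mentioned below does):

```swift
// Same call as above, with the trailing comma removed so older Xcode/Swift versions accept it.
self._routerNorm.wrappedValue = RMSNorm(
    dimensions: config.hiddenSize,
    eps: config.rmsNormEps
)
```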
Hey, I have one question after trying mlx-community/gemma-3n-E2B-it-lm-4bit with your implementation (huge thanks for it). How do you resolve the missing chat template?
@davidkoski very interesting, I was confused about why this happens until I found proposal SE-0439, which enables trailing commas in Swift. According to the release page for Apple Swift version 6.1.2, it's available in Xcode 16.3+. Anyway, for the sake of compatibility I've removed the commas in the latest commit. I cannot test with Xcode 16, but it should be fine now!
Yeah, I am on 16.3 myself -- the 16.0 CI builder has been very useful :-)
@tseylerd I don't have use cases where a missing chat template is a problem. If you have an example of how it should look, you can attach it here and I'll update the repos on HF accordingly.
Thank you for the contribution!
Implementation of the Gemma 3n model for MLXLLM, text only. Based on the reference implementation in mlx-lm:
ml-explore/mlx-lm#258
This code could also help with building the VLM version in #340
cc @DePasqualeOrg
Models
The original MLX weights from `mlx-vlm` are not supported; only weights converted by `mlx-lm` are supported. I've made a new collection of text-only MLX models, i.e. `bf16` and `4bit`, quantized using this new support: https://huggingface.co/collections/mlx-community/gemma-3n-text-only-lm-6861cf66ddc9a13102996308
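For example, a model from the collection can be exercised with a few lines (a sketch assuming the `loadModel(id:)` / `ChatSession` conveniences from MLXLMCommon are available; the `LLMModelFactory.loadContainer` path used elsewhere in the repo works as well):

```swift
import MLXLMCommon

// Sketch: load one of the converted text-only Gemma 3n checkpoints and ask a question.
// Assumes the loadModel(id:) / ChatSession convenience API from MLXLMCommon.
func runGemma3n() async throws {
    let model = try await loadModel(id: "mlx-community/gemma-3n-E2B-it-lm-4bit")
    let session = ChatSession(model)
    print(try await session.respond(to: "Why is the sky blue?"))
}
```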
Naive benchmarks
Apple M4 Max
* mlx-community/gemma-3n-E4B-it-lm-bf16
* mlx-community/gemma-3n-E2B-it-lm-bf16
* mlx-community/gemma-3n-E4B-it-lm-4bit
* mlx-community/gemma-3n-E2B-it-lm-4bit

iPhone 16 Pro
* mlx-community/gemma-3n-E4B-it-lm-4bit
* mlx-community/gemma-3n-E2B-it-lm-4bit
Notes
* Compiled functions are used in a few places (`gelu_topk`, `logit_softcap`) to improve performance
* `RMSNoScale` can be improved when `MLXFast.rmsNorm` is fixed (allows nil weights); see the sketch below
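For reference, a `RMSNoScale` layer of the kind described above could look roughly like this (a sketch of the no-weight RMS norm math, not the exact code from the PR):

```swift
import MLX
import MLXNN

// Sketch only: an RMS norm with no learned scale, i.e. x / sqrt(mean(x^2) + eps).
// Once MLXFast.rmsNorm accepts nil weights, this could delegate to the fast kernel instead.
final class RMSNoScale: Module {
    let eps: Float

    init(eps: Float = 1e-6) {
        self.eps = eps
        super.init()
    }

    func callAsFunction(_ x: MLXArray) -> MLXArray {
        // Mean of squares over the last (feature) dimension, then scale by its inverse square root.
        let variance = mean(square(x), axis: -1, keepDims: true)
        return x * rsqrt(variance + eps)
    }
}
```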
Misc
Demos