feat: implement gemma3n text model in MLXLLM #346

xlab · 2025-07-02T08:51:58Z

Implementation of Gemma 3n model for MLXLLM, text only. Based on the reference implementation in mlx-lm:
ml-explore/mlx-lm#258

This code may also help with building the VLM version in #340.
cc @DePasqualeOrg

Models

The original MLX weights from mlx-vlm are not supported; only weights converted by mlx-lm are.

I've made a new collection of text-only MLX models (bf16 and 4-bit quantized) converted using this new support:
https://huggingface.co/collections/mlx-community/gemma-3n-text-only-lm-6861cf66ddc9a13102996308

Naive benchmarks

Apple M4 Max

| Model | Peak Memory | Generation Speed | Generation Time |
| --- | --- | --- | --- |
| `mlx-community/gemma-3n-E4B-it-lm-bf16` | 13097M | 39.983666 tokens/s | 7.503064s |
| `mlx-community/gemma-3n-E2B-it-lm-bf16` | 8535M | 63.440184 tokens/s | 4.728864s |
| `mlx-community/gemma-3n-E4B-it-lm-4bit` | 3684M | 81.048619 tokens/s | 3.701482s |
| `mlx-community/gemma-3n-E2B-it-lm-4bit` | 2391M | 113.438366 tokens/s | 2.644608s |
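
As a quick sanity check on the M4 Max figures, multiplying generation speed by generation time gives the number of tokens produced in each run. The sketch below only reuses the numbers from the table; the variable names are illustrative. Each row works out to roughly 300 tokens, which suggests the runs used the same generation length, though the PR does not state this.

```swift
// Figures copied from the Apple M4 Max table: (model, tokens/s, seconds).
let runs: [(model: String, tokensPerSecond: Double, seconds: Double)] = [
    ("gemma-3n-E4B-it-lm-bf16", 39.983666, 7.503064),
    ("gemma-3n-E2B-it-lm-bf16", 63.440184, 4.728864),
    ("gemma-3n-E4B-it-lm-4bit", 81.048619, 3.701482),
    ("gemma-3n-E2B-it-lm-4bit", 113.438366, 2.644608),
]

// tokens generated = speed * time, rounded to the nearest whole token.
let tokenCounts = runs.map { ($0.tokensPerSecond * $0.seconds).rounded() }

for (run, count) in zip(runs, tokenCounts) {
    print("\(run.model): \(Int(count)) tokens")
}
```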

iPhone 16 Pro

| Model | Generation Speed |
| --- | --- |
| `mlx-community/gemma-3n-E4B-it-lm-4bit` | 10-15 tokens/s |
| `mlx-community/gemma-3n-E2B-it-lm-4bit` | 25-30 tokens/s |

Notes

* Some operations (e.g. gelu_topk, logit_softcap) can be compiled to improve performance.
* RMSNoScale can be improved once MLXFast.rmsNorm is fixed to allow nil weights.
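
For readers unfamiliar with logit_softcap, here is a scalar sketch of what the operation computes, assuming the common Gemma-style definition `cap * tanh(x / cap)`. The PR itself operates on MLX arrays; the function name, the plain `[Double]` input, and the cap value of 30 are illustrative assumptions, not code from the PR.

```swift
import Foundation

/// Squashes each logit smoothly into the range [-cap, cap].
/// Small values pass through almost unchanged; large values saturate at the cap.
func logitSoftcap(_ logits: [Double], cap: Double) -> [Double] {
    logits.map { tanh($0 / cap) * cap }
}

// The cap of 30 is a placeholder chosen for illustration only.
let capped = logitSoftcap([-1000, 0, 5, 1000], cap: 30)
print(capped)
```

Because the function is a pure element-wise expression, it is the kind of small operation that benefits from being fused by a compiler rather than evaluated step by step.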

Misc

* added to LLMModelFactory
* added to MLXService
* added to MLXChatExample
* 4 model references from HF


DePasqualeOrg · 2025-07-02T09:04:38Z

Nice! Did you base this on #340, or did you start from scratch based on the Python implementation?

xlab · 2025-07-02T15:10:25Z

Hey, good question. This implementation is my third attempt or so, and it was written from scratch based on the Python source from mlx-lm.

#340 was a great inspiration, since I am new to this, but it was sometimes misleading. In my initial attempt I also used the mlx-vlm language model as a reference, but that wasn't a good one either. It all worked out once the mlx-lm reference was ready.

The key to a successful transpilation is to prompt it piece by piece and verify each piece, then at the end feed in this guide and re-verify the whole thing: https://swiftpackageindex.com/ml-explore/mlx-swift/main/documentation/mlx/converting-python

davidkoski · 2025-07-16T22:50:08Z

It looks like it needs swift-format run:

swift-format.............................................................Failed
- hook id: swift-format
- files were modified by this hook

xlab · 2025-07-20T06:34:01Z

@davidkoski please take another look. I've run swift-format, and the pre-commit checks pass.
If everything else is OK, let's merge this PR, otherwise it might get overrun lol.

davidkoski

Changes look good, thank you!

davidkoski · 2025-07-21T16:51:05Z

Xcode 16 fails with:

/Users/distiller/project/Libraries/MLXLLM/Models/Gemma3nText.swift:387:9: error: unexpected ',' separator
        )
        ^
/Users/distiller/project/Libraries/MLXLLM/Models/Gemma3nText.swift:508:9: error: unexpected ',' separator
        )
        ^
/Users/distiller/project/Libraries/MLXLLM/Models/Gemma3nText.swift:513:9: error: unexpected ',' separator
        )
        ^
/Users/distiller/project/Libraries/MLXLLM/Models/Gemma3nText.swift:517:9: error: unexpected ',' separator
        )
        ^
/Users/distiller/project/Libraries/MLXLLM/Models/Gemma3nText.swift:521:9: error: unexpected ',' separator
        )
        ^
/Users/distiller/project/Libraries/MLXLLM/Models/Gemma3nText.swift:539:9: error: unexpected ',' separator
        )
        ^
/Users/distiller/project/Libraries/MLXLLM/Models/Gemma3nText.swift:717:9: error: unexpected ',' separator
        )
        ^
/Users/distiller/project/Libraries/MLXLLM/Models/Gemma3nText.swift:727:9: error: unexpected ',' separator
        )
        ^
/Users/distiller/project/Libraries/MLXLLM/Models/Gemma3nText.swift:739:9: error: unexpected ',' separator
        )
        ^
/Users/distiller/project/Libraries/MLXLLM/Models/Gemma3nText.swift:751:9: error: unexpected ',' separator
        )
        ^

This is caused by a trailing comma in a call, e.g.

        self._routerNorm.wrappedValue = RMSNorm(
            dimensions: config.hiddenSize,
            eps: config.rmsNormEps, // <--- here
        )
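
A minimal sketch of the form that older compilers accept, assuming a hypothetical stand-in type (`RMSNormSketch`) and placeholder argument values instead of the real `RMSNorm` layer and config: drop the comma after the last argument of the call. Trailing commas in array literals, by contrast, have long been accepted.

```swift
// Hypothetical stand-in for the real RMSNorm layer; only the call syntax matters here.
struct RMSNormSketch {
    let dimensions: Int
    let eps: Double
}

// Compiles on older toolchains: no comma after the last argument.
// The values 2048 and 1e-6 are placeholders, not taken from the model config.
let routerNorm = RMSNormSketch(
    dimensions: 2048,
    eps: 1e-6
)

// A trailing comma inside an array literal is fine on older compilers too.
let modelIds = [
    "mlx-community/gemma-3n-E2B-it-lm-4bit",
    "mlx-community/gemma-3n-E4B-it-lm-4bit",
]

print(routerNorm.dimensions, modelIds.count)
```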

tseylerd · 2025-07-22T10:48:02Z

Hey, I have one question after trying mlx-community/gemma-3n-E2B-it-lm-4bit with your implementation (huge thanks for it). How do you resolve the missing chat template in tokenizer_config.json?
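
To check whether a given tokenizer_config.json carries a template at all, one option is to look for the `chat_template` key that Hugging Face tokenizer configs conventionally use. The sketch below is an illustration, not part of the PR: the helper name is hypothetical, and the two JSON samples are made-up placeholders rather than real Gemma 3n configs.

```swift
import Foundation

/// Returns true when the JSON object has a non-null "chat_template" entry.
func hasChatTemplate(tokenizerConfig data: Data) -> Bool {
    guard
        let object = try? JSONSerialization.jsonObject(with: data),
        let config = object as? [String: Any],
        let template = config["chat_template"]
    else { return false }
    // An explicit JSON null is treated the same as a missing key.
    return !(template is NSNull)
}

// Placeholder configs for illustration only.
let withTemplate = Data(#"{"chat_template": "{{ messages }}"}"#.utf8)
let withoutTemplate = Data(#"{"model_max_length": 32768}"#.utf8)

print(hasChatTemplate(tokenizerConfig: withTemplate))
print(hasChatTemplate(tokenizerConfig: withoutTemplate))
```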

xlab · 2025-07-22T16:37:52Z

@davidkoski very interesting. I was confused about why this happens until I found proposal SE-0439, which enables trailing commas in Swift. According to the release page for Apple Swift version 6.1.2, it's available in Xcode 16.3+.

Anyway, for the sake of compatibility I've removed the trailing commas in the latest commit. I can't test with Xcode 16, but it should be fine now!

davidkoski · 2025-07-22T16:40:08Z

Yeah, I am on 16.3 myself -- the 16.0 CI builder has been very useful :-)

xlab · 2025-07-22T16:40:51Z

@tseylerd I don't have any use cases where the missing chat template is a problem. If you have an example of what it should look like, you can attach it here and I'll update the repos on HF with a new tokenizer_config.json.

davidkoski · 2025-07-22T16:49:29Z

Thank you for the contribution!

xlab and others added 2 commits July 2, 2025 04:01

feat: implement gemma3n text model in MLXLLM

22eb13c

* added to LLMModelFactory
* added to MLXService
* added to MLXChatExample
* 4 model references from HF

Merge branch 'main' into gemma3n-lm

410a18d

fix: resolved issue with RMSNorm using MLXArray.mlxNone

c03e5fd

secret-ai-dev mentioned this pull request Jul 19, 2025

Add Gemma3n text model support (rebased from #346) #361

Closed

xlab added 2 commits July 19, 2025 23:27

Merge branch 'main' into gemma3n-lm

2e3105e

chore: run swift-format on Gemma3nText.swift

61b305a

davidkoski approved these changes Jul 21, 2025

fix: remove trailing commas, fix build for Xcode 16

9508a47

davidkoski merged commit 505c86f into ml-explore:main Jul 22, 2025
3 checks passed

xlab deleted the gemma3n-lm branch July 22, 2025 21:19
