
Commit 23ff243

Merge branch 'main' into kv-cache
2 parents 25c8120 + c8c7003

19 files changed: +1087 −19 lines

Applications/MLXChatExample/Services/MLXService.swift

Lines changed: 1 addition & 0 deletions

@@ -29,6 +29,7 @@ class MLXService {
             name: "qwen2.5VL:3b", configuration: VLMRegistry.qwen2_5VL3BInstruct4Bit, type: .vlm),
         LMModel(name: "qwen2VL:2b", configuration: VLMRegistry.qwen2VL2BInstruct4Bit, type: .vlm),
         LMModel(name: "smolVLM", configuration: VLMRegistry.smolvlminstruct4bit, type: .vlm),
+        LMModel(name: "acereason:7B", configuration: LLMRegistry.acereason_7b_4bit, type: .llm),
     ]

     /// Cache to store loaded model containers to avoid reloading.
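The cache itself is outside this hunk; purely to illustrate the pattern the comment above describes, here is a minimal sketch (the `ModelContainerCache` type and its method are hypothetical; `ModelContainer` is the MLXLMCommon type):

```swift
import MLXLMCommon

// Hypothetical sketch of a container cache: loaded models are kept in a
// dictionary keyed by name, so a repeated request skips the expensive reload.
actor ModelContainerCache {
    private var containers: [String: ModelContainer] = [:]

    func container(
        named name: String,
        loadIfMissing load: () async throws -> ModelContainer
    ) async throws -> ModelContainer {
        if let cached = containers[name] { return cached }
        let loaded = try await load()
        containers[name] = loaded
        return loaded
    }
}
```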

Libraries/MLXLLM/Documentation.docc/Documentation.md

Lines changed: 17 additions & 0 deletions

@@ -11,8 +11,24 @@ Example implementations of various Large Language Models (LLMs).
 - [MLXVLM](MLXVLM)
 - [StableDiffusion](StableDiffusion)

+## Quick Start
+
+See <doc:evaluation>.
+
+Using LLMs and VLMs is as easy as this:
+
+```swift
+let model = try await loadModel(id: "mlx-community/Qwen3-4B-4bit")
+let session = ChatSession(model)
+print(try await session.respond(to: "What are two things to see in San Francisco?"))
+print(try await session.respond(to: "How about a great place to eat?"))
+```
+
+More advanced APIs are available for those who need them; see <doc:using-model>.
+
 ## Topics

+- <doc:evaluation>
 - <doc:adding-model>
 - <doc:using-model>

@@ -32,3 +48,4 @@ Example implementations of various Large Language Models (LLMs).
 - ``Starcoder2Model``
 - ``MiMoModel``
 - ``GLM4Model``
+- ``AceReason``
Libraries/MLXLLM/Documentation.docc/evaluation.md

Lines changed: 70 additions & 0 deletions

@@ -0,0 +1,70 @@ (new file)

# Evaluation

The simplified LLM/VLM API allows you to load a model and evaluate prompts with only a few lines of code.

For example, this loads a model and asks a question and a follow-on question:

```swift
let model = try await loadModel(id: "mlx-community/Qwen3-4B-4bit")
let session = ChatSession(model)
print(try await session.respond(to: "What are two things to see in San Francisco?"))
print(try await session.respond(to: "How about a great place to eat?"))
```

The second question refers to information (the location) from the first
question -- this context is maintained inside the ``ChatSession`` object.

If you need a one-shot prompt/response, simply create a ``ChatSession``, evaluate
the prompt, and discard it. Multiple ``ChatSession`` instances can also be used
(at the cost of the memory in the `KVCache`) to handle multiple streams of
context.
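As a quick illustration of the multiple-session pattern just described, two sessions can share one loaded model while keeping independent histories (a sketch using only the `loadModel`/`ChatSession`/`respond(to:)` calls shown above; the prompts are arbitrary):

```swift
let model = try await loadModel(id: "mlx-community/Qwen3-4B-4bit")

// Two independent conversations over the same loaded model; each
// ChatSession keeps its own context (and its own KVCache memory).
let travel = ChatSession(model)
let cooking = ChatSession(model)

print(try await travel.respond(to: "Plan one day of sightseeing in Kyoto."))
print(try await cooking.respond(to: "How do I make a simple miso soup?"))

// A follow-up resolves against its own session's history only.
print(try await travel.respond(to: "Where should I eat dinner there?"))
```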
## Streaming Output

The previous example produced the entire response in one call. Often
users want to see the text as it is generated -- you can do this with
a stream:

```swift
let model = try await loadModel(id: "mlx-community/Qwen3-4B-4bit")
let session = ChatSession(model)

for try await item in session.streamResponse(to: "Why is the sky blue?") {
    print(item, terminator: "")
}
print()
```
## VLMs (Vision Language Models)

This same API supports VLMs as well. Simply present the image or video
to the ``ChatSession``:

```swift
let model = try await loadModel(id: "mlx-community/Qwen2.5-VL-3B-Instruct-4bit")
let session = ChatSession(model)

let answer1 = try await session.respond(
    to: "What kind of creature is in the picture?",
    image: .url(URL(fileURLWithPath: "support/test.jpg"))
)
print(answer1)

// we can ask a follow-up question referring back to the previous image
let answer2 = try await session.respond(
    to: "What is behind the dog?"
)
print(answer2)
```
## Advanced Usage

``ChatSession`` takes a number of parameters you can supply when creating it:

- **instructions**: optional instructions for the chat session, e.g. describing what type of responses to give
    - for example, you might instruct the language model to respond in rhyme or
      to talk like a famous character from a movie
    - or that the responses should be very brief
- **generateParameters**: parameters that control the generation of output, e.g. token limits and temperature
    - see ``GenerateParameters``
- **processing**: optional media processing instructions

A sketch combining these parameters appears below.
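A minimal sketch, assuming the argument labels match the parameter names listed above and that ``GenerateParameters`` accepts a `temperature` argument (the values are only examples):

```swift
let model = try await loadModel(id: "mlx-community/Qwen3-4B-4bit")

// Sketch: labels follow the parameter list above; the temperature value
// is arbitrary.
let session = ChatSession(
    model,
    instructions: "You are terse. Answer in at most two sentences.",
    generateParameters: GenerateParameters(temperature: 0.6)
)
print(try await session.respond(to: "Why is the sky blue?"))
```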

Libraries/MLXLLM/Documentation.docc/using-model.md

Lines changed: 3 additions & 0 deletions

@@ -2,6 +2,9 @@

 Using a model is easy: load the weights, tokenize and evaluate.

+There is a high-level API described in <doc:evaluation>; this document
+describes the lower-level API for when you need more control.
+
 ## Loading a Model

 A model is typically loaded by using a `ModelFactory` and a `ModelConfiguration`:
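The hunk ends at that colon, before the file's original example; as a rough sketch of the pattern it names (the `loadContainer(configuration:)` entry point on the shared factory is an assumption):

```swift
import MLXLLM
import MLXLMCommon

// Sketch: a ModelConfiguration identifies the model; the factory downloads
// the weights and instantiates it. loadContainer(configuration:) is assumed.
let configuration = ModelConfiguration(id: "mlx-community/Qwen3-4B-4bit")
let container = try await LLMModelFactory.shared.loadContainer(
    configuration: configuration
)
```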

Libraries/MLXLLM/LLMModelFactory.swift

Lines changed: 14 additions & 1 deletion

@@ -45,6 +45,7 @@ public class LLMTypeRegistry: ModelTypeRegistry, @unchecked Sendable {
         "granite": create(GraniteConfiguration.self, GraniteModel.init),
         "mimo": create(MiMoConfiguration.self, MiMoModel.init),
         "glm4": create(GLM4Configuration.self, GLM4Model.init),
+        "acereason": create(Qwen2Configuration.self, Qwen2Model.init),
     ]
 }

@@ -211,6 +212,11 @@ public class LLMRegistry: AbstractModelRegistry, @unchecked Sendable {
         defaultPrompt: "Why is the sky blue?"
     )

+    static public let acereason_7b_4bit = ModelConfiguration(
+        id: "mlx-community/AceReason-Nemotron-7B-4bit",
+        defaultPrompt: ""
+    )
+
     private static func all() -> [ModelConfiguration] {
         [
             codeLlama13b4bit,

@@ -240,6 +246,7 @@ public class LLMRegistry: AbstractModelRegistry, @unchecked Sendable {
             smolLM_135M_4bit,
             mimo_7b_sft_4bit,
             glm4_9b_4bit,
+            acereason_7b_4bit,
         ]
     }

@@ -312,7 +319,7 @@ public class LLMModelFactory: ModelFactory {
     public func _load(
         hub: HubApi, configuration: ModelConfiguration,
         progressHandler: @Sendable @escaping (Progress) -> Void
-    ) async throws -> ModelContext {
+    ) async throws -> sending ModelContext {
         // download weights and config
         let modelDirectory = try await downloadModel(
             hub: hub, configuration: configuration, progressHandler: progressHandler)

@@ -361,3 +368,9 @@ public class LLMModelFactory: ModelFactory {
     }
 }
+
+public class TrampolineModelFactory: NSObject, ModelFactoryTrampoline {
+    public static func modelFactory() -> (any MLXLMCommon.ModelFactory)? {
+        LLMModelFactory.shared
+    }
+}
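With the registrations above in place, the new model should load like any other Qwen2-family entry. A sketch (passing the loaded container to `ChatSession` and the `loadContainer(configuration:)` call are assumptions based on APIs named elsewhere in this commit):

```swift
import MLXLLM
import MLXLMCommon

// Sketch: "acereason" maps to Qwen2Configuration/Qwen2Model, so the new
// configuration loads through the same path as other Qwen2-family models.
// Note the registry entry's defaultPrompt is empty, so supply your own.
let container = try await LLMModelFactory.shared.loadContainer(
    configuration: LLMRegistry.acereason_7b_4bit
)
let session = ChatSession(container)
print(try await session.respond(to: "Why is the sky blue?"))
```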

Libraries/MLXLLM/README.md

Lines changed: 17 additions & 0 deletions

@@ -58,9 +58,26 @@ Currently supported model types are:
 - Starcoder2
 - MiMo
 - GLM4
+- AceReason

 See [llm-tool](../../Tools/llm-tool)

+# Quick Start
+
+Using LLMs and VLMs from MLXLMCommon is as easy as:
+
+```swift
+let model = try await loadModel(id: "mlx-community/Qwen3-4B-4bit")
+let session = ChatSession(model)
+print(try await session.respond(to: "What are two things to see in San Francisco?"))
+print(try await session.respond(to: "How about a great place to eat?"))
+```
+
+See [Evaluation](https://swiftpackageindex.com/ml-explore/mlx-swift-examples/main/documentation/mlxlmcommon/evaluation)
+or [Using Models](https://swiftpackageindex.com/ml-explore/mlx-swift-examples/main/documentation/mlxlmcommon/using-model)
+for the more advanced API.
+
 # Adding a Model

 If the model follows the typical LLM pattern:
