
Commit 7b09318

Minor tweaks to newer models (#50)

* add deepseek-ai path, fix Smol gsm8k value
* Qwen3 model on quad, deepseek limit gpu memory
* try Qwen3 gpu_memory_utilization
* try max-model-len

1 parent 95c3742 commit 7b09318


5 files changed, 13 additions, 1 deletion


HuggingFaceTB/SmolLM3-3B/accuracy/tasks.yml

Lines changed: 1 addition & 1 deletion
@@ -2,4 +2,4 @@ tasks:
 - name: gsm8k
   metrics:
   - name: exact_match,strict-match
-    value: 0
+    value: 0.4708
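The `value` field reads like the expected score for the named metric. Below is a minimal sketch of how a CI check might compare a measured gsm8k `exact_match,strict-match` score against it. It assumes a relative tolerance: the `RTOL = 0.05` figure, the `check_task` helper, and the measured scores are hypothetical and not part of this commit. The config is written as a Python dict that mirrors the YAML above, so the sketch does not need a YAML parser.

```python
import math

# Mirrors the updated tasks.yml (assumption: a real harness would load it with a YAML parser).
TASKS_CONFIG = {
    "tasks": [
        {
            "name": "gsm8k",
            "metrics": [{"name": "exact_match,strict-match", "value": 0.4708}],
        }
    ]
}

# Hypothetical relative tolerance; the commit does not specify one.
RTOL = 0.05


def check_task(config: dict, task: str, metric: str, measured: float, rtol: float = RTOL) -> bool:
    """Return True if `measured` is within `rtol` (relative) of the expected value in `config`."""
    for entry in config["tasks"]:
        if entry["name"] != task:
            continue
        for m in entry["metrics"]:
            if m["name"] == metric:
                return math.isclose(measured, m["value"], rel_tol=rtol)
    raise KeyError(f"{task}/{metric} not found in config")


if __name__ == "__main__":
    # 0.46 is within 5% of 0.4708, so this prints True.
    print(check_task(TASKS_CONFIG, "gsm8k", "exact_match,strict-match", 0.46))
```

In this sketch, a relative check against the previous `value: 0` would reject every nonzero score.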

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+trust-remote-code: true
+tensor-parallel-size: 4
+max-model-len: 4096
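These keys share their names with vLLM engine arguments (`--trust-remote-code`, `--tensor-parallel-size`, `--max-model-len`). Assuming the file is passed through to vLLM as command-line flags, the following minimal sketch shows that translation. The `to_cli_flags` helper is hypothetical because the commit does not show how these files are consumed. Booleans become bare flags when true, and underscores are normalized to hyphens, which is also an assumption; this commit mixes both spellings.

```python
# Hypothetical helper: the commit does not show how these YAML files are consumed.
# Assumes each key maps to a vLLM-style command-line flag of the same name.


def to_cli_flags(args: dict) -> list[str]:
    """Turn a flat engine-args mapping into a list of command-line flags."""
    flags: list[str] = []
    for key, value in args.items():
        # Normalize e.g. gpu_memory_utilization -> gpu-memory-utilization (assumption).
        flag = "--" + key.replace("_", "-")
        if isinstance(value, bool):
            if value:  # store_true-style flag: emitted only when True
                flags.append(flag)
        else:
            flags.extend([flag, str(value)])
    return flags


if __name__ == "__main__":
    quad_config = {
        "trust-remote-code": True,
        "tensor-parallel-size": 4,
        "max-model-len": 4096,
    }
    print(" ".join(to_cli_flags(quad_config)))
    # prints: --trust-remote-code --tensor-parallel-size 4 --max-model-len 4096
```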

Lines changed: 1 addition & 0 deletions
@@ -1,3 +1,4 @@
 max-model-len: 4096
 tensor-parallel-size: 8
 trust-remote-code: true
+gpu_memory_utilization: 0.8
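In vLLM, `gpu_memory_utilization` is the fraction of each GPU's memory that the engine may claim, and the documented default is 0.9. Lowering it to 0.8 therefore leaves more headroom on every device. The cap applies to each GPU independently, so with `tensor-parallel-size: 8` the aggregate budget is eight times the per-GPU figure. The worked example below assumes a hypothetical 80 GiB GPU. The commit does not state the hardware.

```python
# Illustrative arithmetic only: the 80 GiB per-GPU capacity is a hypothetical example,
# not a value taken from this commit.


def memory_budget_gib(total_gib_per_gpu: float, utilization: float, num_gpus: int) -> tuple[float, float]:
    """Return (per-GPU budget, aggregate budget) in GiB for a fractional utilization cap."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    per_gpu = total_gib_per_gpu * utilization
    return per_gpu, per_gpu * num_gpus


if __name__ == "__main__":
    per_gpu, total = memory_budget_gib(80.0, 0.8, 8)
    print(f"per GPU: {per_gpu:.1f} GiB, across 8 GPUs: {total:.1f} GiB")
    # prints: per GPU: 64.0 GiB, across 8 GPUs: 512.0 GiB
```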

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+tasks:
+- name: gsm8k
+  metrics:
+  - name: exact_match,strict-match
+    value: 0

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+# storage configs for https://huggingface.co/deepseek-ai/DeepSeek-R1-0528
+model: hf
+data: hf
