TRL Jobs is a simple wrapper around Hugging Face Jobs that makes it easy to run TRL (Transformer Reinforcement Learning) workflows directly on 🤗 Hugging Face infrastructure.
Think of it as the quickest way to kick off Supervised Fine-Tuning (SFT) and more, without worrying about all the boilerplate setup.
Get started with a single command:
```sh
pip install trl-jobs
```
Run your first supervised fine-tuning job in just one line:
```sh
trl-jobs sft --model_name Qwen/Qwen3-0.6B --dataset_name trl-lib/Capybara
```
The training is tracked with Trackio and the fine-tuned model is automatically pushed to the 🤗 Hub.
Right now, SFT (Supervised Fine-Tuning) is supported. More workflows will be added soon!
```sh
trl-jobs sft --model_name Qwen/Qwen3-0.6B --dataset_name trl-lib/Capybara
```
- `--model_name` → Model to fine-tune (e.g. `Qwen/Qwen3-0.6B`)
- `--dataset_name` → Dataset to train on (e.g. `trl-lib/Capybara`)
- `--peft` → Use PEFT (LoRA) (default: `False`)
- `--flavor` → Hardware flavor (default: `a100-large`, only option for now)
- `--timeout` → Max runtime (`1h` by default). Supports `s`, `m`, `h`, `d`
- `-d, --detach` → Run in background and print job ID
- `--namespace` → Namespace where the job will run (default: your user namespace)
- `--token` → Hugging Face token (only needed if not logged in)
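These options can be combined freely. For example, a detached LoRA run with a two-hour limit, reusing the model and dataset from the quickstart above:

```sh
# LoRA (PEFT) fine-tuning, capped at 2 hours, detached so the command
# returns immediately and prints the job ID
trl-jobs sft \
    --model_name Qwen/Qwen3-0.6B \
    --dataset_name trl-lib/Capybara \
    --peft \
    --timeout 2h \
    --detach
```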
➡️ You can also pass any arguments supported by `trl sft`, for example:

```sh
trl-jobs sft --model_name Qwen/Qwen3-0.6B --dataset_name trl-lib/Capybara --learning_rate 3e-5
```
For the full list, see the TRL CLI docs.
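For instance, a run that also tweaks a few standard training arguments might look like the sketch below; the extra argument names come from the underlying `SFTConfig` / `TrainingArguments`, so double-check them against the TRL CLI docs for your installed TRL version:

```sh
trl-jobs sft \
    --model_name Qwen/Qwen3-0.6B \
    --dataset_name trl-lib/Capybara \
    --learning_rate 3e-5 \
    --num_train_epochs 2 \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 8
```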
SFT supports 4 dataset formats:
- Standard language modeling

  ```python
  example = {"text": "The sky is blue."}
  ```

- Standard prompt-completion

  ```python
  example = {"prompt": "The sky is", "completion": " blue."}
  ```

- Conversational language modeling

  ```python
  example = {"messages": [{"role": "user", "content": "What color is the sky?"},
                          {"role": "assistant", "content": "It is blue."}]}
  ```

- Conversational prompt-completion

  ```python
  example = {"prompt": [{"role": "user", "content": "What color is the sky?"}],
             "completion": [{"role": "assistant", "content": "It is blue."}]}
  ```
**Important:** When using a conversational dataset, ensure that the model has a chat template.

**Note:** When using a prompt-completion dataset, the loss is only computed on the completion part.
For more details, see the TRL docs - Dataset formats.
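If your data is not on the Hub yet, one way to get a small prompt-completion dataset up there is sketched below. The repo name `your-username/my-sft-data` and the example contents are placeholders, and this assumes the Hub's automatic JSONL loading works for your layout; adapt as needed:

```sh
# Write a couple of prompt-completion examples to a JSONL file (placeholder content)
cat > train.jsonl << 'EOF'
{"prompt": "The sky is", "completion": " blue."}
{"prompt": "Grass is", "completion": " green."}
EOF

# Upload it as a dataset repo, then pass that repo name to --dataset_name
huggingface-cli upload your-username/my-sft-data train.jsonl train.jsonl --repo-type dataset
```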
Here are some ready-to-go setups you can use out of the box.
Llama 3 (full fine-tuning)

Model | Max context length | Tokens / batch | Example command |
---|---|---|---|
Meta-Llama-3-8B | 4,096 | 262,144 | trl-jobs sft --model_name meta-llama/Meta-Llama-3-8B --dataset_name ... |
Meta-Llama-3-8B-Instruct | 4,096 | 262,144 | trl-jobs sft --model_name meta-llama/Meta-Llama-3-8B-Instruct --dataset_name ... |
Llama 3 (PEFT / LoRA)

Model | Max context length | Tokens / batch | Example command |
---|---|---|---|
Meta-Llama-3-8B | 24,576 | 196,608 | trl-jobs sft --model_name meta-llama/Meta-Llama-3-8B --peft --dataset_name ... |
Meta-Llama-3-8B-Instruct | 24,576 | 196,608 | trl-jobs sft --model_name meta-llama/Meta-Llama-3-8B-Instruct --peft --dataset_name ... |
Qwen3 (full fine-tuning)

Model | Max context length | Tokens / batch | Example command |
---|---|---|---|
Qwen3-0.6B-Base | 32,768 | 65,536 | trl-jobs sft --model_name Qwen/Qwen3-0.6B-Base --dataset_name ... |
Qwen3-0.6B | 32,768 | 65,536 | trl-jobs sft --model_name Qwen/Qwen3-0.6B --dataset_name ... |
Qwen3-1.7B-Base | 24,576 | 98,304 | trl-jobs sft --model_name Qwen/Qwen3-1.7B-Base --dataset_name ... |
Qwen3-1.7B | 24,576 | 98,304 | trl-jobs sft --model_name Qwen/Qwen3-1.7B --dataset_name ... |
Qwen3-4B-Base | 20,480 | 163,840 | trl-jobs sft --model_name Qwen/Qwen3-4B-Base --dataset_name ... |
Qwen3-4B | 20,480 | 163,840 | trl-jobs sft --model_name Qwen/Qwen3-4B --dataset_name ... |
Qwen3-8B-Base | 4,096 | 262,144 | trl-jobs sft --model_name Qwen/Qwen3-8B-Base --dataset_name ... |
Qwen3-8B | 4,096 | 262,144 | trl-jobs sft --model_name Qwen/Qwen3-8B --dataset_name ... |
Qwen3 (PEFT / LoRA)

Model | Max context length | Tokens / batch | Example command |
---|---|---|---|
Qwen3-8B-Base | 24,576 | 196,608 | trl-jobs sft --model_name Qwen/Qwen3-8B-Base --peft --dataset_name ... |
Qwen3-8B | 24,576 | 196,608 | trl-jobs sft --model_name Qwen/Qwen3-8B --peft --dataset_name ... |
Qwen3-14B-Base | 20,480 | 163,840 | trl-jobs sft --model_name Qwen/Qwen3-14B-Base --peft --dataset_name ... |
Qwen3-14B | 20,480 | 163,840 | trl-jobs sft --model_name Qwen/Qwen3-14B --peft --dataset_name ... |
Qwen3-32B | 4,096 | 131,072 | trl-jobs sft --model_name Qwen/Qwen3-32B --peft --dataset_name ... |
SmolLM3
Model | Max context length | Tokens / batch | Example command |
---|---|---|---|
HuggingFaceTB/SmolLM3-3B-Base | 28,672 | 114,688 | trl-jobs sft --model_name HuggingFaceTB/SmolLM3-3B-Base --dataset_name ... |
HuggingFaceTB/SmolLM3-3B | 28,672 | 114,688 | trl-jobs sft --model_name HuggingFaceTB/SmolLM3-3B --dataset_name ... |
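As a concrete example, the SmolLM3-3B row above expands to the command below, with `trl-lib/Capybara` standing in for the `...` dataset placeholder and a four-hour timeout added purely for illustration:

```sh
trl-jobs sft \
    --model_name HuggingFaceTB/SmolLM3-3B \
    --dataset_name trl-lib/Capybara \
    --timeout 4h
```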
🚧 Coming soon!
Open an issue or submit a PR—we’d love to hear from you!
You’ll need a Hugging Face token to run jobs. You can provide it in any of these ways:
- Log in with `huggingface-cli login`
- Set the `HF_TOKEN` environment variable
- Pass it directly with `--token`
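For example, either of the following works (the token value is a placeholder):

```sh
# Option 1: export the token for the current shell session
export HF_TOKEN=hf_xxxxxxxxxxxxxxxx
trl-jobs sft --model_name Qwen/Qwen3-0.6B --dataset_name trl-lib/Capybara

# Option 2: pass the token explicitly for a single run
trl-jobs sft --model_name Qwen/Qwen3-0.6B --dataset_name trl-lib/Capybara --token hf_xxxxxxxxxxxxxxxx
```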
This project is under the MIT License. See the LICENSE file for details.
We welcome contributions! Please open an issue or a PR on GitHub.
Before committing, run formatting checks:
```sh
ruff check . --fix && ruff format . --line-length 119
```