20 changes: 10 additions & 10 deletions docs/README.md
@@ -23,7 +23,7 @@ PDL provides the following features:

The PDL interpreter takes a PDL program as input and generates data by executing its instructions (calling out to models, code, etc...).

See below for a quick reference, followed by [installation notes](#interpreter-installation) and an [overview](#overview) of the language. A more detailed description of the language features can be found in this [tutorial](https://ibm.github.io/prompt-declaration-language/tutorial).


## Quick Reference
@@ -50,13 +50,13 @@ pip install 'prompt-declaration-language[examples]'

The Live Explorer can be installed as follows (MacOS):
```
brew install pdl
```

For other platforms, see the [installation notes](#interpreter-installation).

You can run PDL with LLM models locally using [Ollama](https://ollama.com), or with a cloud service.
See [here](https://ibm.github.io/prompt-declaration-language/tutorial/#using-ollama-models) for
instructions on how to install an Ollama model locally.

Most examples in this repository use IBM Granite models on [Ollama](https://ollama.com) and some are on [Replicate](https://replicate.com/). In order to run these examples, you need to create a free account
@@ -172,7 +172,7 @@ text:
temperature: 0
```

Notice the syntactic differences. Model ids on watsonx start with `watsonx`.

Watsonx also provides a text completion endpoint as shown in the following example. A text completion endpoint does not take chat
templates into account:
@@ -299,10 +299,10 @@ When we execute this program with the PDL interpreter, we obtain the following t
@SuppressWarnings("unchecked")
public static Map<String, String> deserializeOffsetMap(String lastSourceOffset) throws IOException {
Map<String, String> offsetMap;
if (lastSourceOffset == null || lastSourceOffset.isEmpty()) {
offsetMap = new HashMap<>();
} else {
offsetMap = JSON_MAPPER.readValue(lastSourceOffset, Map.class);
}
return offsetMap;
}
@@ -364,10 +364,10 @@ When we execute this new program, we obtain the following:
@SuppressWarnings("unchecked")
public static Map<String, String> deserializeOffsetMap(String lastSourceOffset) throws IOException {
Map<String, String> offsetMap;
if (lastSourceOffset == null || lastSourceOffset.isEmpty()) {
offsetMap = new HashMap<>();
} else {
offsetMap = JSON_MAPPER.readValue(lastSourceOffset, Map.class);
}
return offsetMap;
}
146 changes: 110 additions & 36 deletions docs/autopdl.md
@@ -7,7 +7,15 @@ hide:

# AutoPDL Tutorial

The following sections show how to use the AutoPDL optimizer, introduced in "AutoPDL: Automatic Prompt Optimization for LLM Agents" by [Spiess et al. (2025)](https://openreview.net/forum?id=CAeISyE3aR) ([arXiv](https://arxiv.org/abs/2504.04365)), to produce optimized PDL programs for specific tasks. Please ensure PDL was installed with the necessary extras, e.g.:

``` { .bash .copy .annotate linenums="1" }
pip install 'prompt-declaration-language[all]'
# or from source
git clone git@github.com:IBM/prompt-declaration-language.git
cd prompt-declaration-language
pip install -e '.[all]'
```

To optimize a PDL program, we need the program, an optimizer configuration, a dataset, and an _evaluator_. An evaluator is a Python subclass of `OptimizerEvaluator` that evaluates a candidate, which is a generated configuration instance consisting of e.g. fewshot examples. The evaluator class follows this structure:

@@ -52,41 +60,15 @@ class OptimizerEvaluator(Thread):

Let's go through an example for `GSM8K`. Our PDL program uses different prompt patterns from the prompt library, and the variables `prompt_pattern`, `question`, `model`, and `demonstrations` are inserted at runtime by the evaluator.


```yaml title="examples/optimizer/gsm8k.pdl" linenums="1"
--8<-- "./examples/optimizer/gsm8k.pdl"
```

We write a configuration file for the optimizer, and save it as `gsm8k_optimizer_config.yml`. See `src/pdl/optimize/config_parser.py` for all fields. Please note that this example uses the `watsonx` inference service, so an API key is required, although you can also use a local model or any other inference service.

``` { .yaml .copy .annotate title="examples/optimizer/gsm8k_optimizer_config.yml" linenums="1" }
--8<-- "./examples/optimizer/gsm8k_optimizer_config.yml"
```

```python title="examples/optimizer/gsm8k_evaluator.py" linenums="1"
--8<-- "./examples/optimizer/gsm8k_evaluator.py"
@@ -95,20 +77,112 @@ variables: # define discrete options to sample from
We can see an example of a script to run the optimization process in `examples/optimizer/optimize.py`.
Usage:

```text
python optimize.py optimize -h
usage: optimize.py optimize [-h] --config CONFIG --dataset-path DATASET_PATH [--experiments-path EXPERIMENTS_PATH]
[--yield_output | --no-yield_output] [--dry | --no-dry]
pdl_file
```

We also need a dataset to optimize against, with `train`, `test`, and `validation` splits. To produce such a dataset, we can use HuggingFace Datasets `load_dataset` and `save_to_disk`. This example requires the dataset to have columns `question`, `reasoning`, and `answer`, which can be created from the original `openai/gsm8k` dataset.
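As a sketch of that preparation (the provided scripts below are the recommended route; this is an illustrative, simplified version that omits the agentic trajectory columns): GSM8K gold answers have the form `<reasoning> #### <final answer>`, so the `reasoning` and `answer` columns can be split off, and a `validation` split carved out of `train` since the original dataset only ships `train` and `test`. The save path and carve size here are assumptions.

```python
def split_answer(example: dict) -> dict:
    # GSM8K gold answers look like "<step-by-step reasoning> #### <final answer>"
    reasoning, _, final = example["answer"].partition("####")
    return {"reasoning": reasoning.strip(), "answer": final.strip()}

def prepare_gsm8k(save_path: str = "var/gsm8k_manual") -> None:
    # Requires `pip install datasets`; downloads the dataset on first run.
    from datasets import DatasetDict, load_dataset

    ds = load_dataset("openai/gsm8k", "main").map(split_answer)  # train/test only
    carved = ds["train"].train_test_split(test_size=1024, seed=42)
    DatasetDict(
        train=carved["train"],      # remaining train rows
        test=ds["test"],            # 1,319 rows
        validation=carved["test"],  # 1,024 rows, matching the split sizes above
    ).save_to_disk(save_path)
```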

We provide three scripts in `examples/optimizer` to create datasets, including the rule based agentic trajectories. These are `process_gsm8k.py`, `process_fever.py`, and `process_mbpp.py`. They load the original datasets, process them, and save them to disk in the required format. Dataset specific instructions may be found in the respective script files. Note that the scripts create a folder named `var` in the current directory, which contains the processed dataset in a format that can be used by the optimizer. Therefore, they should be run in the root of the PDL repository.

We can run an example like so:
Let's run the GSM8K dataset processing script:

``` { .bash .copy .annotate linenums="1" }
python examples/optimizer/process_gsm8k.py
```

This should save the processed dataset in `var/gsm8k_trajectified` and print output like:

```text
Saving the dataset (1/1 shards): 100%|█████████████████████████████████████████████████████████████████| 6449/6449 [00:00<00:00, 557195.73 examples/s]
Saving the dataset (1/1 shards): 100%|█████████████████████████████████████████████████████████████████| 1319/1319 [00:00<00:00, 363559.64 examples/s]
Saving the dataset (1/1 shards): 100%|█████████████████████████████████████████████████████████████████| 1024/1024 [00:00<00:00, 271472.56 examples/s]
Map: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 6449/6449 [00:00<00:00, 71242.31 examples/s]
Map: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 1024/1024 [00:00<00:00, 68826.30 examples/s]
Map: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 6449/6449 [00:00<00:00, 22520.85 examples/s]
Map: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 6449/6449 [00:00<00:00, 18186.53 examples/s]
Saving the dataset (1/1 shards): 100%|█████████████████████████████████████████████████████████████████| 6449/6449 [00:00<00:00, 698328.77 examples/s]
Saving the dataset (1/1 shards): 100%|█████████████████████████████████████████████████████████████████| 1319/1319 [00:00<00:00, 232468.57 examples/s]
Saving the dataset (1/1 shards): 100%|█████████████████████████████████████████████████████████████████| 1024/1024 [00:00<00:00, 413375.10 examples/s]
DatasetDict({
train: Dataset({
features: ['question', 'answer', 'reasoning', 'raw_answer', 'answer_part', 'traj_keys', 'traj_values', 'rewoo_traj_keys', 'rewoo_traj_values'],
num_rows: 6449
})
test: Dataset({
features: ['question', 'answer', 'reasoning', 'raw_answer', 'answer_part'],
num_rows: 1319
})
validation: Dataset({
features: ['question', 'answer', 'reasoning', 'raw_answer', 'answer_part'],
num_rows: 1024
})
})
```

Finally, we can run the example like so:

``` { .bash .copy .annotate linenums="1" }
cd examples/optimizer
python optimize.py optimize --config gsm8k_optimizer_config.yml --dataset-path ../../var/gsm8k_trajectified gsm8k.pdl
```

This will report details about the optimization process, such as the number of candidates evaluated. The output will look something like this:

```text
PDL Optimizer pdl_optimizer.py:336
┌──────────────────────────────┬─────────────────────────────────────────────┐
│ Config combinations │ 9 │
│ Max candidates │ 100 │
│ Num. candidates │ 100 │
│ Starting validation set size │ 2 │
│ Max validation set size │ 10 │
│ Num. iterations │ 7 │
│ Total evaluations │ 1,200 │
│ Num. threads │ 1 │
│ Validation set multiplier │ 2 │
│ Shuffle validation set │ False │
│ Budget policy │ None │
├──────────────────────────────┼─────────────────────────────────────────────┤
│ model │ ['watsonx/meta-llama/llama-3-2-3b-instruct… │
│ prompt_pattern │ ['cot', 'react', 'rewoo'] │
│ num_demonstrations │ [0, 3, 5] │
└──────────────────────────────┴─────────────────────────────────────────────┘
Iteration pdl_optimizer.py:419
┌─────────────────────┬─────┐
│ Index │ 0 │
│ Validation set size │ 2 │
│ Num. candidates │ 100 │
└─────────────────────┴─────┘
Evaluation pdl_optimizer.py:601
┌────────────────────────┬──────────────────────────────────────────┐
│ Test set size │ 2 │
├────────────────────────┼──────────────────────────────────────────┤
│ model │ watsonx/meta-llama/llama-3-2-3b-instruct │
│ prompt_pattern │ cot │
│ num_demonstrations │ 0 │
│ uuid │ enl0ertp │
│ demonstrations_indices │ 0 │
│ demonstrations │ 0 │
└────────────────────────┴──────────────────────────────────────────┘
Running without parallelism util.py:74
0% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/1,200 [ 0:00:01 < -:--:-- , ? it/s ]
```
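The "Config combinations" figure in the table above is simply the product of the option counts in the config's `variables` section, which is a quick sanity check on a configuration:

```python
from itertools import product

# The discrete options from gsm8k_optimizer_config.yml
variables = {
    "model": ["watsonx/meta-llama/llama-3-2-3b-instruct"],
    "prompt_pattern": ["cot", "react", "rewoo"],
    "num_demonstrations": [0, 3, 5],
}
combos = list(product(*variables.values()))
print(len(combos))  # 1 * 3 * 3 = 9, the "Config combinations" reported above
```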

Note that it is not unusual to observe PDL exceptions during the optimization process.

```text
[15:44:14] Type errors during spec checking:
../../contrib/prompt_library/ReAct.pdl:0 - should be an object
../../contrib/prompt_library/ReAct.pdl:0 - Type errors during spec checking:
../../contrib/prompt_library/ReAct.pdl:0 - should be an object
Retrying: False
Runtime FAILED and took seconds: 10.21
```

Such exceptions, here for example in `ReAct.pdl`, are caused by the _typed_ model call in `ReAct.pdl:98`. If the model output does not result in a parsable JSON that matches the expected type `{ name: string, arguments: object }`, the PDL interpreter raises an exception.
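The shape of that check can be illustrated with a small helper (illustrative only, not PDL's actual checker): the raw model output must parse as JSON and match the expected object type, otherwise an error is raised, exactly as in the log above.

```python
import json

def check_tool_call(raw: str) -> dict:
    """Mimic a spec check for the type { name: string, arguments: object }."""
    data = json.loads(raw)  # raises ValueError if the output is not valid JSON
    if not isinstance(data, dict):
        raise TypeError("should be an object")
    if not isinstance(data.get("name"), str):
        raise TypeError("name: should be a string")
    if not isinstance(data.get("arguments"), dict):
        raise TypeError("arguments: should be an object")
    return data
```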

Once the process is complete, a file `optimized_gsm8k.pdl` is written in the same directory as the source PDL file. This file contains the optimal configuration and is directly executable by the standard PDL interpreter. A log of the optimization process is written to `experiments/` by default.
25 changes: 25 additions & 0 deletions examples/optimizer/gsm8k_optimizer_config.yml
@@ -0,0 +1,25 @@
benchmark: gsm8k # Name our benchmark
budget: null # Set a budget, can be number of iterations, or a duration string e.g. "2h"
budget_growth: double # double validation set size each iteration
# or to_max: reach max_test_set_size by final iteration
initial_test_set_size: 2 # size of test set in first iteration
max_test_set_size: 10 # maximum test set size
num_candidates: 100 # how many candidates to evaluate
num_demonstrations: 5 # how many demonstrations to include per candidate
parallelism: 1 # how many threads to run evaluations across
shuffle_test: false # shuffling of test set
test_set_name: test # name of test set
train_set_name: train # name of train set
validation_set_name: validation # name of validation set
demonstrations_variable_name: demonstrations # variable name to insert demonstrations into
variables: # define discrete options to sample from
model: # set ${ model } variable
- watsonx/meta-llama/llama-3-2-3b-instruct
prompt_pattern: # set ${ prompt_pattern } variable to one of these
- cot
- react
- rewoo
num_demonstrations: # overrides num demonstrations above
- 0
- 3
- 5
2 changes: 1 addition & 1 deletion examples/optimizer/mbpp_dataset.py
@@ -3,7 +3,7 @@

from copy import deepcopy

from datasets.load import load_from_disk
from evalplus.data import get_mbpp_plus, get_mbpp_plus_hash
from evalplus.evaluate import MBPP_OUTPUT_NOT_NONE_TASKS, get_groundtruth

2 changes: 1 addition & 1 deletion examples/optimizer/optimize.py
@@ -5,7 +5,7 @@
from typing import Any

import yaml
from datasets.load import load_from_disk
from fever_evaluator import FEVEREvaluator
from gsm8k_evaluator import Gsm8kEvaluator
from gsmhard_evaluator import GsmHardEvaluator