OpenAI Adapter

Overview

The project implements the AI DIAL API for language models from Azure OpenAI and the OpenAI Platform.

Developer environment

This project uses Python>=3.11 and Poetry>=2.1.1 as a dependency manager.

Check out Poetry's documentation on how to install it on your system before proceeding.

To install requirements:

poetry install

This will install all requirements for running the package, linting, formatting and tests.

IDE configuration

The recommended IDE is VS Code. Open the project in VS Code and install the recommended extensions.

VS Code is configured to use PEP-8 compatible formatter Black.

Alternatively you can use PyCharm.

Set up Black in PyCharm manually or install PyCharm>=2023.2, which has built-in Black support.

Run

Run the development server locally:

make serve

Run the server from a Docker container:

make docker_serve

Make on Windows

As of now, Windows distributions do not include the make tool. On Windows 10 and later, it can be installed with the following command:

winget install GnuWin32.Make

For convenience, add the tool folder (C:\Program Files (x86)\GnuWin32\bin) to the PATH environment variable. The command definitions inside the Makefile should be cross-platform to keep the development environment setup simple.

Chat completions deployments

The adapter is able to convert certain upstream APIs to the DIAL Chat Completions API (which is an extension of the Azure OpenAI Chat Completions API).

Chat Completions deployments are exposed via the endpoint:

POST ${ADAPTER_ORIGIN}/openai/deployments/${ADAPTER_DEPLOYMENT_ID}/chat/completions

Supported upstream chat APIs

Azure OpenAI Chat Completions API (Last generation API)

DIAL Core Config
{
  "models": {
    "${DIAL_DEPLOYMENT_ID}": {
      "type": "chat",
      "endpoint": "${ADAPTER_ORIGIN}/deployments/${ADAPTER_DEPLOYMENT_ID}/chat/completions",
      "upstreams": [
        {
          "endpoint": "https://${AZURE_OPENAI_SERVICE_NAME}.openai.azure.com/openai/deployments/${AZURE_OPENAI_DEPLOYMENT_ID}/chat/completions",
          "key": "${OPTIONAL_API_KEY}"
        }
      ]
    }
  }
}

There are three free variables in the config related to deployment ids. Each corresponds to one hop in the chain of HTTP requests initiated by the DIAL client:

  1. DIAL_DEPLOYMENT_ID - the deployment id visible to the DIAL client via the DIAL deployment listing. The client uses this id to call the model by sending POST ${DIAL_CORE_ORIGIN}/openai/deployments/${DIAL_DEPLOYMENT_ID}/chat/completions
  2. ADAPTER_DEPLOYMENT_ID - the deployment id the OpenAI adapter receives when DIAL Core calls POST ${ADAPTER_ORIGIN}/openai/deployments/${ADAPTER_DEPLOYMENT_ID}/chat/completions. Use this identifier in environment variables that define deployment categories.
  3. AZURE_OPENAI_DEPLOYMENT_ID - the Azure OpenAI deployment called by the OpenAI adapter.

The following sequence diagram illustrates these three hops:
sequenceDiagram
    autonumber
    actor U as DIAL Client
    participant C as DIAL Core
    participant A as OpenAI Adapter
    participant AZ as Azure OpenAI
    participant OP as OpenAI Platform

    Note over U,C: DIAL_DEPLOYMENT_ID
    U->>C: POST /openai/deployments/<br>${DIAL_DEPLOYMENT_ID}/chat/completions

    Note over C,A: ADAPTER_DEPLOYMENT_ID
    C->>A: POST ${ADAPTER_ORIGIN}/openai/deployments/<br>${ADAPTER_DEPLOYMENT_ID}/chat/completions

    alt Azure OpenAI upstream
        Note over A,AZ: AZURE_OPENAI_DEPLOYMENT_ID
        A->>AZ: POST https://${AZURE_OPENAI_SERVICE_NAME}.openai.azure.com/<br>openai/deployments/${AZURE_OPENAI_DEPLOYMENT_ID}/<br>chat/completions
        Note right of A: Auth: api-key (if provided) or Azure AD via DefaultAzureCredential
        AZ-->>A: JSON or SSE stream
    else OpenAI Platform upstream
        A->>OP: POST https://api.openai.com/v1/chat/completions<br>(with "model"=${OPENAI_MODEL_NAME}, api-key)
        OP-->>A: JSON or SSE stream
    end

    A-->>C: Normalized response (headers/stream)
    C-->>U: Response to client

Typically these three variables share the same value (the Azure OpenAI deployment name). They may differ if you expose multiple DIAL deployments that call the same Azure OpenAI endpoint but are configured differently.

The DefaultAzureCredential is used to authenticate requests to Azure when an API key is not provided in the upstream configuration.
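
For illustration, here is how a DIAL client might call this deployment through DIAL Core, in the same style as the embeddings example later in this document (a hedged sketch; DIAL_CORE_ORIGIN and DIAL_API_KEY are placeholders for your installation):

curl -X POST "${DIAL_CORE_ORIGIN}/openai/deployments/${DIAL_DEPLOYMENT_ID}/chat/completions" \
  -H "api-key:${DIAL_API_KEY}" \
  -H "content-type:application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'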

Azure OpenAI Chat Completions API (Next generation API)

The Next generation API (aka v1 API) doesn't include the deployment id in the URL:

  • Last generation API: POST https://SERVICE_NAME.openai.azure.com/openai/deployments/gpt-4o/chat/completions
  • Next generation API: POST https://SERVICE_NAME.openai.azure.com/openai/v1/chat/completions

The DIAL configuration changes accordingly:

DIAL Core Config
{
  "models": {
    "${DIAL_DEPLOYMENT_ID}": {
      "type": "chat",
      "overrideName": "${AZURE_OPENAI_DEPLOYMENT_ID}",
      "endpoint": "${ADAPTER_ORIGIN}/deployments/${ADAPTER_DEPLOYMENT_ID}/chat/completions",
      "upstreams": [
        {
          "endpoint": "https://${AZURE_OPENAI_SERVICE_NAME}.openai.azure.com/openai/v1/chat/completions",
          "key": "${OPTIONAL_API_KEY}"
        }
      ]
    }
  }
}

Because the deployment ID is not included in the upstream URL, specify it in the overrideName field. If this field is missing, the model name takes the value of the model field from the original chat completion request (if present), otherwise ${ADAPTER_DEPLOYMENT_ID}.
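
For example, with overrideName omitted from the config above, a request that carries an explicit model field determines the upstream model name (a hedged illustration):

Request
{
  "model": "${AZURE_OPENAI_DEPLOYMENT_ID}",
  "messages": [
    {
      "role": "user",
      "content": "Hello!"
    }
  ]
}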

OpenAI Platform Chat Completions API

DIAL Core Config
{
  "models": {
    "${DIAL_DEPLOYMENT_ID}": {
      "type": "chat",
      "overrideName": "${OPENAI_MODEL_NAME}",
      "endpoint": "${ADAPTER_ORIGIN}/deployments/${ADAPTER_DEPLOYMENT_ID}/chat/completions",
      "upstreams": [
        {
          "endpoint": "https://api.openai.com/v1/chat/completions",
          "key": "${API_KEY}"
        }
      ]
    }
  }
}

Note the difference from the Azure OpenAI configuration:

  • The API key is required.
  • overrideName is added to specify the upstream OpenAI model name. Unlike Azure, the upstream URL does not include the model name, so it is passed via overrideName. If this field is missing, the model name takes the value of the model field from the original chat completion request (if present), otherwise ${ADAPTER_DEPLOYMENT_ID}.

Azure OpenAI Responses API (Next generation API)

Certain advanced features of OpenAI models, such as reasoning summary, are only accessible via the Responses API, not via the Chat Completions API.

DIAL Core Config
{
  "models": {
    "${DIAL_DEPLOYMENT_ID}": {
      "type": "chat",
      "overrideName": "${AZURE_OPENAI_DEPLOYMENT_ID}",
      "endpoint": "${ADAPTER_ORIGIN}/deployments/${ADAPTER_DEPLOYMENT_ID}/chat/completions",
      "upstreams": [
        {
          "endpoint": "https://${AZURE_OPENAI_SERVICE_NAME}.openai.azure.com/openai/v1/responses",
          "key": "${API_KEY}"
        }
      ]
    }
  }
}

As in other cases where the upstream URL omits a deployment id, specify it in the overrideName field.

The last generation API is also supported via URLs of the following format:

"endpoint": "https://${AZURE_OPENAI_SERVICE_NAME}.openai.azure.com/openai/responses"

Azure AI Foundry Chat Completions API

Certain LLM models like gpt-oss-120b or Mistral-Large-2411 can only be deployed to an Azure AI Foundry service. They are accessible via:

  • an Azure AI model inference endpoint, or
  • an Azure OpenAI endpoint

DIAL Core Config (Azure AI model inference endpoint)
{
  "models": {
    "${DIAL_DEPLOYMENT_ID}": {
      "type": "chat",
      "overrideName": "${AZURE_AI_FOUNDRY_DEPLOYMENT_ID}",
      "endpoint": "${ADAPTER_ORIGIN}/deployments/${ADAPTER_DEPLOYMENT_ID}/chat/completions",
      "upstreams": [
        {
          "endpoint": "https://${AZURE_AI_FOUNDRY_SERVICE_NAME}.services.ai.azure.com/models/chat/completions",
          "key": "${OPTIONAL_API_KEY}"
        }
      ]
    }
  }
}
DIAL Core Config (Azure OpenAI endpoint)
{
  "models": {
    "${DIAL_DEPLOYMENT_ID}": {
      "type": "chat",
      "overrideName": "${AZURE_AI_FOUNDRY_DEPLOYMENT_ID}",
      "endpoint": "${ADAPTER_ORIGIN}/deployments/${ADAPTER_DEPLOYMENT_ID}/chat/completions",
      "upstreams": [
        {
          "endpoint": "https://${AZURE_AI_FOUNDRY_SERVICE_NAME}.openai.azure.com/openai/deployments/gpt-oss-120b/chat/completions",
          "key": "${OPTIONAL_API_KEY}"
        }
      ]
    }
  }
}
Azure OpenAI Images API

DIAL Core Config
{
  "models": {
    "${DIAL_DEPLOYMENT_ID}": {
      "type": "chat",
      "endpoint": "${ADAPTER_ORIGIN}/deployments/${ADAPTER_DEPLOYMENT_ID}/chat/completions",
      "upstreams": [
        {
          "endpoint": "https://${AZURE_OPENAI_SERVICE_NAME}.openai.azure.com/openai/deployments/${AZURE_OPENAI_DEPLOYMENT_ID}/images/generations",
          "key": "${OPTIONAL_API_KEY}"
        }
      ]
    }
  }
}

The supported upstream models are dall-e-3 and gpt-image-1. These are the values that the AZURE_OPENAI_DEPLOYMENT_ID variable can take.

Important

The DALL·E 3 adapter deployments must be declared in the DALLE3_DEPLOYMENTS env variable, and GPT-Image 1 deployments in GPT_IMAGE_1_DEPLOYMENTS.
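
For example, a minimal .env fragment registering one deployment of each kind (the deployment names are illustrative and must match the adapter deployment ids used in the DIAL Core config):

DALLE3_DEPLOYMENTS=dall-e-3
GPT_IMAGE_1_DEPLOYMENTS=gpt-image-1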

OpenAI Completions API

The adapter also supports the legacy Completions API for both Azure-style and OpenAI Platform-style upstream endpoints:

DIAL Core Config
{
  "models": {
    "${DIAL_DEPLOYMENT_ID}": {
      "type": "chat",
      "overrideName": "${OPENAI_MODEL_NAME}",
      "endpoint": "${ADAPTER_ORIGIN}/deployments/${ADAPTER_DEPLOYMENT_ID}/chat/completions",
      "upstreams": [
        {
          "endpoint": "https://api.openai.com/v1/completions",
          "key": "${API_KEY}"
        }
      ]
    }
  }
}

Tokenization of chat completion requests/responses

The adapter guarantees that all chat completion responses include token-usage information (the number of prompt and completion tokens consumed).

However, by default neither Azure OpenAI nor OpenAI Platform returns token usage for streaming requests (those with stream=true).

Therefore, the adapter tokenizes both the request and the response when the upstream doesn’t provide usage. Adapter-side tokenization is also required when the request includes max_prompt_tokens - the maximum number of tokens to which the incoming request is truncated before being sent upstream.
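
For example, the following streaming request forces adapter-side tokenization, since the prompt has to be tokenized before it can be truncated to 4000 tokens (a hedged sketch; the limit is illustrative):

Request
{
  "messages": [
    {
      "role": "user",
      "content": "Summarize the plot of Hamlet."
    }
  ],
  "max_prompt_tokens": 4000,
  "stream": true
}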

How to minimize adapter-side tokenization

The tokenization algorithm is CPU-heavy and may throttle requests under high load. Therefore, it’s important to minimize cases where tokenization is required.

Azure OpenAI and OpenAI Platform return token usage for streaming requests when the include_usage option is enabled in the chat completion request. We recommend setting this option in the DIAL Core configuration via the defaults field to reduce the adapter’s CPU usage:

{
  "models": {
    "${DIAL_DEPLOYMENT_ID}": {
      "type": "chat",
      "endpoint": "...",
      "upstreams": ["..."],
      "defaults": {
        "stream_options": {
          "include_usage": true
        }
      }
    }
  }
}
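
With this option enabled, the upstream emits a final streaming chunk that carries usage and an empty list of choices, roughly like this (a hedged sketch of the OpenAI streaming format; the token counts are made up):

data: {"id": "chatcmpl-abc123", "object": "chat.completion.chunk", "choices": [], "usage": {"prompt_tokens": 12, "completion_tokens": 42, "total_tokens": 54}}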

Tokenization algorithm

How does the adapter know which deployment requires which tokenization algorithm?

The adapter does not perform tokenization for:

  1. deployments registered in the DATABRICKS_DEPLOYMENTS and MISTRAL_DEPLOYMENTS env vars. The upstreams for these deployments are expected to return token usage themselves.
  2. deployments supported by the following APIs:
    1. legacy Completions API
    2. Images API
    3. Responses API

For other deployments, tokenization is determined as follows.

Important

Adapter-side tokenization of documents, audio, and video files isn’t currently supported. Such multimodal content is counted as zero tokens.

Text tokenization

The adapter uses the tiktoken library as a tokenizer for OpenAI models.

The TIKTOKEN_MODEL_MAPPING env variable defines a mapping from adapter deployment ids to model identifiers known to tiktoken.

If an adapter deployment id cannot be resolved by tiktoken, the adapter returns an internal server error explaining the issue.
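
For example, a mapping for two hypothetical deployment ids (the values must be model names that tiktoken recognizes):

TIKTOKEN_MODEL_MAPPING={"my-gpt-deployment":"gpt-4o","my-o3-deployment":"o3"}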

Image tokenization

If a deployment is registered in GPT4O_DEPLOYMENTS or GPT4O_MINI_DEPLOYMENTS, the corresponding image-tokenization algorithm described in the Azure documentation is used.

Otherwise, images aren’t tokenized — the image tokens are assumed to be 0.

Embedding deployments

The adapter is able to convert certain upstream APIs to the DIAL Embeddings API (which is an extension of the Azure OpenAI Embeddings API).

Embeddings deployments are exposed via the endpoint:

POST ${ADAPTER_ORIGIN}/openai/deployments/${ADAPTER_DEPLOYMENT_ID}/embeddings

Supported upstream embedding APIs

Azure OpenAI Embeddings API (Last generation API)

DIAL Core Config
{
  "models": {
    "${DIAL_DEPLOYMENT_ID}": {
      "type": "embedding",
      "endpoint": "${ADAPTER_ORIGIN}/deployments/${ADAPTER_DEPLOYMENT_ID}/embeddings",
      "upstreams": [
        {
          "endpoint": "https://${AZURE_OPENAI_SERVICE_NAME}.openai.azure.com/openai/deployments/${AZURE_OPENAI_DEPLOYMENT_ID}/embeddings",
          "key": "${OPTIONAL_API_KEY}"
        }
      ]
    }
  }
}

Azure OpenAI Embeddings API (Next generation API)

DIAL Core Config
{
  "models": {
    "${DIAL_DEPLOYMENT_ID}": {
      "type": "embedding",
      "overrideName": "${AZURE_OPENAI_DEPLOYMENT_ID}",
      "endpoint": "${ADAPTER_ORIGIN}/deployments/${ADAPTER_DEPLOYMENT_ID}/embeddings",
      "upstreams": [
        {
          "endpoint": "https://${AZURE_OPENAI_SERVICE_NAME}.openai.azure.com/openai/v1/embeddings",
          "key": "${OPTIONAL_API_KEY}"
        }
      ]
    }
  }
}

OpenAI Platform Embeddings API

DIAL Core Config
{
  "models": {
    "${DIAL_DEPLOYMENT_ID}": {
      "type": "embedding",
      "overrideName": "${OPENAI_MODEL_NAME}",
      "endpoint": "${ADAPTER_ORIGIN}/deployments/${ADAPTER_DEPLOYMENT_ID}/embeddings",
      "upstreams": [
        {
          "endpoint": "https://api.openai.com/v1/embeddings",
          "key": "${API_KEY}"
        }
      ]
    }
  }
}

Azure multimodal embeddings

The adapter supports Azure Multimodal embeddings.

DIAL Core Config
{
  "models": {
    "${DIAL_DEPLOYMENT_ID}": {
      "type": "embedding",
      "endpoint": "${ADAPTER_ORIGIN}/deployments/${ADAPTER_DEPLOYMENT_ID}/embeddings",
      "upstreams": [
        {
          "endpoint": "https://${COMPUTER_VISION_SERVICE_NAME}.cognitiveservices.azure.com",
          "key": "${OPTIONAL_API_KEY}"
        }
      ]
    }
  }
}

Important

${ADAPTER_DEPLOYMENT_ID} must be added to the env variable AZURE_AI_VISION_DEPLOYMENTS to enable the embeddings deployment.

The multimodal embeddings model supports text and images as inputs.

Since the original OpenAI embeddings API only supports text inputs, image inputs should be passed in the custom_input request field as a URL or in base64-encoded format:

curl -X POST "${DIAL_CORE_ORIGIN}/deployments/${DIAL_DEPLOYMENT_ID}/embeddings" -v \
  -H "api-key:${DIAL_API_KEY}" \
  -H "content-type:application/json" \
  -d '{"input": ["cat", "fish"], "custom_input": [{"type": "image/png", "url": "https://learn.microsoft.com/azure/ai-services/computer-vision/media/quickstarts/presentation.png"}]}'

The response will contain three embedding vectors, each corresponding to one of the inputs in the original request.
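
For reference, the response follows the standard OpenAI embeddings format, roughly like this (a hedged sketch; the vector values are made up and shortened):

{
  "object": "list",
  "data": [
    {"object": "embedding", "index": 0, "embedding": [0.011, -0.024]},
    {"object": "embedding", "index": 1, "embedding": [0.007, 0.031]},
    {"object": "embedding", "index": 2, "embedding": [-0.015, 0.042]}
  ]
}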

Environment Variables

Copy .env.example to .env and customize it for your environment.

Categories of deployments

The following variables cluster all deployments into groups that share the same API and the same tokenization algorithm.

| Variable | Default | Description |
| --- | --- | --- |
| DALLE3_DEPLOYMENTS | `` | Comma-separated list of deployments that support DALL-E 3 API. Example: `dall-e-3,dalle3,dall-e` |
| DALLE3_AZURE_API_VERSION | 2024-02-01 | The API version for requests to the Azure DALL·E 3 API |
| GPT_IMAGE_1_DEPLOYMENTS | `` | Comma-separated list of deployments that support GPT-Image 1 API. Example: `gpt-image-1` |
| GPT_IMAGE_1_AZURE_API_VERSION | 2024-02-01 | The API version for requests to the Azure GPT-Image 1 API |
| MISTRAL_DEPLOYMENTS | `` | Comma-separated list of deployments that support Mistral Large Azure API. Example: `mistral-large-azure,mistral-large` |
| DATABRICKS_DEPLOYMENTS | `` | Comma-separated list of Databricks chat completion deployments. Example: `databricks-dbrx-instruct,databricks-mixtral-8x7b-instruct,databricks-llama-2-70b-chat` |
| GPT4O_DEPLOYMENTS | `` | Comma-separated list of GPT-4o chat completion deployments. Example: `gpt-4o-2024-05-13` |
| GPT4O_MINI_DEPLOYMENTS | `` | Comma-separated list of GPT-4o mini chat completion deployments. Example: `gpt-4o-mini-2024-07-18` |
| AZURE_AI_VISION_DEPLOYMENTS | `` | Comma-separated list of Azure AI Vision embedding deployments. The endpoint of the deployment is expected to point to the Azure service: `https://<service-name>.cognitiveservices.azure.com/` |

Deployments that do not fall into any of these categories are assumed to support the text-to-text OpenAI chat completions API or the OpenAI text embeddings API.

Other variables

| Variable | Default | Description |
| --- | --- | --- |
| LOG_LEVEL | INFO | Log level. Use DEBUG for dev purposes and INFO in prod |
| WEB_CONCURRENCY | 1 | Number of workers for the server |
| TIKTOKEN_MODEL_MAPPING | {} | Mapping from the request deployment id to a tiktoken model name. Required when the upstream model does not return usage. Example: `{"my-gpt-deployment":"gpt-3.5-turbo","my-gpt-o3-deployment":"o3"}`. You don't need a mapping if the deployment name already matches a tiktoken model (check with `python -c "from tiktoken.model import encoding_name_for_model as e; print(e('my-deployment-name'))"`). All chat-completion models require tokenization via tiktoken except those declared in DATABRICKS_DEPLOYMENTS, MISTRAL_DEPLOYMENTS, GPT_IMAGE_1_DEPLOYMENTS, and DALLE3_DEPLOYMENTS |
| DIAL_USE_FILE_STORAGE | False | Save image model artifacts to DIAL File storage (DALL-E images are uploaded to the DIAL file storage and their base64 encodings are replaced with links to the storage) |
| DIAL_URL | | URL of the core DIAL server (required when DIAL_USE_FILE_STORAGE=True) |
| NON_STREAMING_DEPLOYMENTS | `` | Comma-separated list of deployments that do not support streaming. The adapter will emulate streaming by calling the model and converting its response into a single-chunk stream. Example: `o1-mini,o1-preview` |
| ACCESS_TOKEN_EXPIRATION_WINDOW | 10 | The Azure access token is renewed this many seconds before its actual expiration time. The buffer ensures that the token does not expire in the middle of an operation due to processing time and potential network delays |
| AZURE_OPEN_AI_SCOPE | https://cognitiveservices.azure.com/.default | The scope of the access token for Azure OpenAI services |
| API_VERSIONS_MAPPING | {} | Mapping of API versions for requests to the Azure OpenAI API. Example: `{"2023-03-15-preview": "2023-05-15", "": "2024-02-15-preview"}`. An empty key sets the default API version when the user does not pass one in the request |
| ELIMINATE_EMPTY_CHOICES | False | When enabled, the response stream is guaranteed to exclude chunks with an empty list of choices. This is useful when a DIAL client doesn't support such chunks. Azure OpenAI generates an empty list of choices in at least two cases: (1) when the content filter is not disabled, the first chunk carries prompt filter results and no choices; (2) when stream_options.include_usage is enabled, the last chunk carries usage data and no choices |
| THREAD_POOL_SIZE | `min(32, (os.cpu_count() or 1) + 4)` | The size of a thread pool for CPU-heavy tasks such as tokenization and image analysis |

Configurable models

Certain models support configuration via the $ADAPTER_ORIGIN/openai/deployments/$DEPLOYMENT_NAME/configuration endpoint.

A GET request to this endpoint returns the schema of the model configuration in JSON Schema format.

Such models expect the custom_fields.configuration field of the chat/completions request to contain a JSON value conforming to that schema. The custom_fields.configuration field is optional if and only if every field in the schema is also optional.
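
For example, the schema can be fetched with a plain GET request (a hedged sketch; authentication requirements depend on how the adapter is deployed):

curl "$ADAPTER_ORIGIN/openai/deployments/$DEPLOYMENT_NAME/configuration"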

The configuration can be preset in the DIAL Core config via the defaults parameter:

DIAL Core Config
{
  "models": {
    "my-deployment-id": {
      "type": "chat",
      "endpoint": "$ADAPTER_ORIGIN/openai/deployments/my-deployment-id/chat/completions",
      "upstreams": [
        {
          "endpoint": "$AZURE_OPENAI_SERVICE_ORIGIN/openai/deployments/openai-deployment-id/chat/completions"
        }
      ],
      "defaults": {
        "custom_fields": {
            "configuration": $MODEL_CONFIGURATION_OBJECT
        }
      }
    }
  }
}

This is convenient when major model features can be enabled via configuration (e.g., web search or reasoning) and you want a deployment where these features are permanently enabled.

DIAL Core will enrich requests with the configuration specified in defaults, so the client doesn’t need to provide it with each chat completion request.

DALL-E / GPT Image 1

OpenAI image generation models accept configurations with parameters specific for image generation such as image size, style, and quality.

The latest supported parameters can be found in the official OpenAI documentation for models capable of image generation or in the Azure OpenAI API documentation.

Alternatively, the configuration schema can be retrieved programmatically from the /configuration endpoint. However, this schema may lag behind the official one (see Forward compatibility).

An example of a DALL-E 3 request with configured style and image size:

Request
{
  "model": "dall-e-3",
  "messages": [
    {
      "role": "user",
      "content": "forest meadow"
    }
  ],
  "custom_fields": {
    "configuration": {
      "size": "1024x1024",
      "style": "vivid"
    }
  }
}

Similarly, the configuration can be preset on a per-deployment basis in the DIAL Core config:

DIAL Core Config
{
  "models": {
    "dial-dall-e-3": {
      "type": "chat",
      "description": "...",
      "endpoint": "...",
      "defaults": {
        "custom_fields": {
          "configuration": {
            "size": "1024x1024",
            "style": "vivid"
          }
        }
      }
    }
  }
}

This way the end user doesn't have to attach the configuration to each chat completion request; DIAL Core applies it automatically (when missing) to all incoming requests to this deployment.

Forward compatibility

The configuration schema in the adapter isn't fixed: it allows extra fields and arbitrary parameter values. This enables forward compatibility with future versions of the image generation API.

Let's say the next version of the GPT Image model introduces support for a negative prompt (which isn't currently supported). Thanks to the permissive configuration schema, it will still be possible to use a version of the OpenAI adapter that is unaware of this addition to the GPT Image API.

Request
{
  "model": "gpt-image-1",
  "messages": [
    {
      "role": "user",
      "content": "forest meadow"
    }
  ],
  "custom_fields": {
    "configuration": {
      "negative_prompt": "trees"
    }
  }
}

Models based on Responses API

The Responses API provides more features than the Chat Completions API. Some of these features can be enabled via configuration fields in the chat completions request.

The JSON schema of the configuration is open, which enables forward compatibility with future developments in the Responses API.

Note

Such configuration is only possible for models that are configured in the DIAL Core config to use Responses API upstream endpoints.

Reasoning configuration

Reasoning and reasoning summaries can be enabled via a configuration like this one:

Request
{
  "model": "gpt-5-2025-08-07",
  "messages": [
    {
      "role": "user",
      "content": "Write a bash script that takes a matrix represented as a string with format \"[1,2],[3,4],[5,6]\" and prints the transpose in the same format."
    }
  ],
  "custom_fields": {
    "configuration": {
      "reasoning": {
        "effort": "medium",
        "summary": "auto"
      }
    }
  }
}

Here custom_fields.configuration.reasoning is an object that is passed to the Responses API as the reasoning parameter.

Important

Not all models support reasoning. Consult the model documentation before enabling it.

Load balancing

The adapter supports multiple upstream definitions in the DIAL Core config:

{
    "models": {
        "gpt-4o-2024-11-20": {
            "type": "chat",
            "endpoint": "http://$OPENAI_ADAPTER_ORIGIN/openai/deployments/gpt-4o-2024-11-20/chat/completions",
            "displayName": "GPT-4o",
            "upstreams": [
                {
                    "endpoint": "https://$AZURE_OPENAI_SERVICE_NAME1.openai.azure.com/openai/deployments/gpt-4o-2024-11-20/chat/completions"
                },
                {
                    "endpoint": "https://$AZURE_OPENAI_SERVICE_NAME2.openai.azure.com/openai/deployments/gpt-4o-2024-11-20/chat/completions"
                },
                {
                    "endpoint": "https://$AZURE_OPENAI_SERVICE_NAME3.openai.azure.com/openai/deployments/gpt-4o-2024-11-20/chat/completions"
                }
            ]
        }
    }
}

Prompt caching

Prompt caching can be enabled via the autoCachingSupported flag in the DIAL Core config.

{
    "models": {
        "gpt-4o-2024-11-20": {
            "type": "chat",
            "endpoint": "http://$OPENAI_ADAPTER_ORIGIN/openai/deployments/gpt-4o-2024-11-20/chat/completions",
            "displayName": "GPT-4o",
            "upstreams": [
                {
                    "endpoint": "https://$AZURE_OPENAI_SERVICE_NAME1.openai.azure.com/openai/deployments/gpt-4o-2024-11-20/chat/completions"
                },
                {
                    "endpoint": "https://$AZURE_OPENAI_SERVICE_NAME2.openai.azure.com/openai/deployments/gpt-4o-2024-11-20/chat/completions"
                },
                {
                    "endpoint": "https://$AZURE_OPENAI_SERVICE_NAME3.openai.azure.com/openai/deployments/gpt-4o-2024-11-20/chat/completions"
                }
            ],
            "features": {
                "autoCachingSupported": true
            }
        }
    }
}
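
When caching applies, the upstream reports cache hits in the usage block of the response; in the OpenAI API this shows up as prompt_tokens_details.cached_tokens (a hedged sketch with made-up numbers):

"usage": {
  "prompt_tokens": 2006,
  "completion_tokens": 300,
  "total_tokens": 2306,
  "prompt_tokens_details": {
    "cached_tokens": 1920
  }
}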

Important

Verify that the deployment actually supports prompt caching before enabling it.

Lint

Run the linting before committing:

make lint

To auto-fix formatting issues run:

make format

Test

Run unit tests locally:

make test

Clean

To remove the virtual environment and build artifacts:

make clean
