Description
Is your feature request related to a problem? Please describe.
Many generative models are limited to a maximum number of tokens. In some workflows, prompts are built dynamically to use as much of the context as possible, which means tokenizing the responses first to ensure they will fit within the context window.
Currently, this requires a local tokenization scheme, which prevents a fully API-driven workflow (see the sketch below).
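For illustration, a minimal sketch of the kind of local tokenization step this currently forces, assuming the client has a matching Hugging Face tokenizer installed locally (the model name and limits below are placeholders, not tied to any specific deployment):

```python
# Sketch of the local tokenization currently needed to budget a prompt.
# Assumes the client has the matching Hugging Face tokenizer available
# locally; the model name and limits below are placeholders only.
from transformers import AutoTokenizer

MAX_CONTEXT = 4096       # model's context window (placeholder value)
RESPONSE_BUDGET = 512    # tokens reserved for the generated response

tokenizer = AutoTokenizer.from_pretrained("some-org/some-model")

def fits_in_context(prompt: str) -> bool:
    """Check locally whether the prompt plus the reserved response budget fits."""
    n_tokens = len(tokenizer.encode(prompt))
    return n_tokens + RESPONSE_BUDGET <= MAX_CONTEXT
```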
Describe the solution you'd like
Backends like transformers and llama.cpp both offer tokenization methods that tokenize text without generating a response. Exposing these methods through a tokenization API endpoint would remove the need for local processing; a sketch of what a client call could look like follows.
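A hypothetical client call against such an endpoint might look like the sketch below. The `/v1/tokenize` route, the request payload, and the `tokens` response field are assumptions made purely for illustration; none of this exists in the API today.

```python
# Hypothetical client-side use of a server-side tokenization endpoint.
# The /v1/tokenize route, request body, and "tokens" response field are
# assumptions for illustration only; this endpoint does not exist yet.
import requests

resp = requests.post(
    "http://localhost:8080/v1/tokenize",
    json={"model": "some-model", "content": "Prompt text to measure"},
    timeout=30,
)
resp.raise_for_status()
token_count = len(resp.json()["tokens"])
print(f"Prompt uses {token_count} tokens")
```

With a response like this, a client could budget its prompt entirely over the API, with no local tokenizer dependency.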
Describe alternatives you've considered
Additional context
JackBekket
Labels
area/api, area/backends, enhancement (New feature or request), roadmap, up for grabs (Tickets that no-one is currently working on)