- Add `MongoDBCache` and `MongoDBAtlasSemanticCache` classes to `@langchain/mongodb` for key-value and semantic LLM caching.
- Implement integration tests for both cache types.
- Update documentation:
  - Add a guide for using MongoDB as a key-value cache.
  - Add a guide for using MongoDB Atlas as a semantic cache.
  - Add additional information and clarification on semantic vs. key-value caching.
- Fix test issues:
  - Consistently use the `langchain_test` collection for all MongoDB tests.
  - Use the namespace for database naming in all MongoDB tests.
  - Improve the test error message when Docker is not available.
  - Always create a search index for the test vector store, since the test may otherwise fail under some setups.
**docs/core_docs/docs/how_to/chat_model_caching.mdx** (37 additions & 4 deletions)
@@ -17,13 +17,21 @@ LangChain provides an optional caching layer for chat models. This is useful for

It can save you money by reducing the number of API calls you make to the LLM provider, if you're often requesting the same completion multiple times.

It can speed up your application by reducing the number of API calls you make to the LLM provider.

There are two methods for caching LLM responses: **semantic caching** and **key-value caching**.

- **Key-value caching** stores responses based on exact query (LLM prompt) matches. When the same request is made again, the cached LLM response is retrieved. Key-value caching is fast, but its shortcoming is that even small changes in the prompt, such as punctuation differences or slight wording variations (e.g., _"yes"_ vs. _"yeah"_), can cause a cache miss, leading to a fresh LLM call.
- **[Semantic caching](/docs/integrations/semantic_caching)** improves upon key-value caching by relying on the meaning of the prompt rather than exact matches. If a new LLM prompt is semantically similar to a previously cached one, the stored response is retrieved and reused, reducing LLM usage costs. A typical implementation of semantic caching involves storing prompts as embeddings and using similarity search to identify a cache hit.

This page goes over key-value caching. To use semantic caching, see [LLM Semantic Caching](/docs/integrations/semantic_caching).

NOTE: The caching integrations in LangChain do not expire old or unused values. Based on your use case and application, decide whether you need an eviction process and, if so, what kind, and implement it directly against the underlying storage.

import CodeBlock from "@theme/CodeBlock";

```typescript
import { ChatOpenAI } from "@langchain/openai";

// To make the caching really obvious, let's use a slower model.
const model = new ChatOpenAI({
  model: "gpt-4",
  cache: true,
  // ...
});
```
@@ -122,13 +130,13 @@ import AdvancedUpstashRedisCacheExample from "@examples/cache/chat_models/upstas

## Caching with Vercel KV

LangChain provides a Vercel KV-based cache. Like the Redis-based cache, this cache is useful if you want to share the cache across multiple processes or servers. The Vercel KV client uses HTTP and supports edge environments. To use it, you'll need to install the `@vercel/kv` package:

```bash npm2yarn
npm install @vercel/kv
```

You'll also need a Vercel account and a [KV database](https://vercel.com/docs/storage/vercel-kv/kv-reference) to connect to. Once you've done that, retrieve your REST URL and REST token.

Then, you can pass a `cache` option when you instantiate the LLM. For example:
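The published page pulls its example in from the `@examples` directory; as a rough sketch of what passing a Vercel KV-backed `cache` looks like (assuming the `VercelKVCache` export from `@langchain/community/caches/vercel_kv` and its `{ client, ttl }` options, with placeholder credentials), it might be:

```typescript
import { ChatOpenAI } from "@langchain/openai";
import { VercelKVCache } from "@langchain/community/caches/vercel_kv";
import { createClient } from "@vercel/kv";

// Placeholder URL and token: use the REST URL and REST token from your KV database.
const cache = new VercelKVCache({
  client: createClient({
    url: "VERCEL_KV_REST_API_URL",
    token: "VERCEL_KV_REST_API_TOKEN",
  }),
  ttl: 3600, // optional time-to-live, in seconds
});

const model = new ChatOpenAI({ model: "gpt-4", cache });
```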
@@ -156,14 +164,39 @@ import CloudflareExample from "@examples/cache/chat_models/cloudflare_kv.ts";

NOTE: This section covers using MongoDB as a key-value LLM cache. For **semantic caching**, see [MongoDB Atlas Semantic Cache](/docs/integrations/semantic_caching/mongodb_atlas).

LangChain provides MongoDB-based cache support. This is especially useful if your application is already using MongoDB as a database and you don't want to add another data store integration.

To use this cache, you'll need to install `mongodb` as well as `@langchain/mongodb`:
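For example, with npm:

```bash npm2yarn
npm install mongodb @langchain/mongodb
```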
Hint: The key-value cache is stored in the collection using `prompt` as the key and `llm` as the value. You can speed up fetching cached entries by setting up an index (not a Vector Search index) on `prompt`.
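To make the wiring concrete, here is a minimal sketch of using the new `MongoDBCache` with a chat model. The constructor options are an assumption based on the PR description (the cache is backed by a MongoDB collection); check the package's exported types for the actual signature.

```typescript
import { MongoClient } from "mongodb";
import { MongoDBCache } from "@langchain/mongodb";
import { ChatOpenAI } from "@langchain/openai";

// Reuse your application's existing MongoDB connection.
const client = new MongoClient(process.env.MONGODB_URI ?? "");
const collection = client.db("langchain").collection("llm_cache");

// Assumed constructor shape: a config object holding the backing collection.
const cache = new MongoDBCache({ collection });

// Exact prompt repeats will now be served from MongoDB instead of the API.
const model = new ChatOpenAI({ model: "gpt-4", cache });
```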
**docs/core_docs/docs/how_to/llm_caching.mdx** (27 additions & 0 deletions)
@@ -4,6 +4,8 @@ sidebar_position: 2

# How to cache model responses

NOTE: This section covers older language models that take a string as input and return a string as output. You should almost always use the newer chat models instead, since most model providers have adopted a chat-like interface for interacting with language models. See [Chat Model Caching](/docs/how_to/chat_model_caching) for implementing caching, including semantic caching, with the newer chat models.

LangChain provides an optional caching layer for LLMs. This is useful for two reasons:

It can save you money by reducing the number of API calls you make to the LLM provider, if you're often requesting the same completion multiple times.
@@ -197,6 +199,31 @@ import CloudflareExample from "@examples/cache/cloudflare_kv.ts";

NOTE: This section covers using MongoDB as a key-value LLM cache. For **semantic caching**, see [MongoDB Atlas Semantic Cache](/docs/integrations/semantic_caching/mongodb_atlas).

LangChain provides MongoDB-based cache support. This is especially useful if your application is already using MongoDB as a database and you don't want to add another data store integration.

To use this cache, you'll need to install `mongodb` as well as `@langchain/mongodb`:

Hint: The key-value cache is stored in the collection using `prompt` as the key and `llm` as the value. You can speed up fetching cached entries by setting up an index (not a Vector Search index) on `prompt`.
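As a concrete illustration of the hint, a regular ascending index on `prompt` can be created with the MongoDB driver; the database and collection names below are placeholders:

```typescript
import { MongoClient } from "mongodb";

const client = new MongoClient(process.env.MONGODB_URI ?? "");

// A plain index (not an Atlas Vector Search index) on the cache's key field.
await client
  .db("langchain")
  .collection("llm_cache")
  .createIndex({ prompt: 1 });
```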
[Caching LLM calls](/docs/how_to/chat_model_caching) can be useful for testing, cost savings, and speed.

## Caching LLM Responses: Semantic vs. Key-Value Caching

Currently, there are two methods for caching LLM responses: **semantic caching** and **key-value caching**.

_Key-value LLM caching guide:_ see the [LLM Cache how-to guide](/docs/how_to/llm_caching).

### Key-Value Caching

Key-value caching stores responses based on exact query (LLM prompt) matches. When the same request is made again, the cached LLM response is retrieved. Key-value caching is fast, but its shortcoming is that even small changes in the prompt, such as punctuation differences or slight wording variations (e.g., _"yes"_ vs. _"yeah"_), can cause a cache miss, leading to a fresh LLM call.

### Semantic Caching

Semantic caching improves upon this by relying on the meaning of the prompt rather than exact matches. If a new LLM prompt is semantically similar to a previously cached one, the stored response is retrieved and reused, reducing LLM usage costs. A typical implementation of semantic caching involves storing prompts as embeddings and using similarity search to identify a cache hit.
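To make the mechanics concrete, here is a minimal, framework-free sketch of the idea (not the LangChain API): prompts are embedded, and a cosine-similarity search over previously stored embeddings decides whether a cached response can be reused.

```typescript
type CacheEntry = { embedding: number[]; response: string };

// Cosine similarity between two embedding vectors of equal length.
const cosine = (a: number[], b: number[]): number => {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
};

// Returns a cached response if any stored prompt is similar enough,
// otherwise undefined (meaning a fresh LLM call is needed).
function semanticLookup(
  queryEmbedding: number[],
  cache: CacheEntry[],
  threshold = 0.9
): string | undefined {
  let best: CacheEntry | undefined;
  let bestScore = -1;
  for (const entry of cache) {
    const score = cosine(queryEmbedding, entry.embedding);
    if (score > bestScore) {
      best = entry;
      bestScore = score;
    }
  }
  return bestScore >= threshold ? best?.response : undefined;
}
```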
This page documents the MongoDB Atlas integration for **semantic caching** of LLM generation outputs. See [MongoDB Atlas](/docs/integrations/vectorstores/mongodb_atlas) for additional setup and configuration information.

Semantic caching allows you to cache and retrieve generations based on vector similarity, so that similar prompts can share cached results.

## Install dependencies

You'll first need to install [`@langchain/mongodb`](https://www.npmjs.com/package/@langchain/mongodb) as well as [`mongodb`](https://www.npmjs.com/package/mongodb):
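For example, with npm:

```bash npm2yarn
npm install @langchain/mongodb mongodb
```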
You will need the `mongodb` driver package to manage your database, collection(s), and vector search indexes. The `@langchain/mongodb` package provides the integration for LangChain and expects a ready-to-use collection and vector search index.

You can set up a vector collection either through the MongoDB Atlas UI or with commands such as the following:
```typescript
import { MongoClient } from "mongodb";

// Connect and get a handle on the collection that will hold the cached
// entries and their embeddings (connection string and names are placeholders).
const client = new MongoClient(process.env.MONGODB_ATLAS_URI ?? "");
const collection = client.db("langchain").collection("semantic_cache");

// Create a search index. The dimensions must match your embedding dimensions.
await collection.createSearchIndex({
  name: "default",
  definition: {
    mappings: {
      dynamic: true,
      fields: {
        embedding: {
          dimensions: 1024,
          similarity: "cosine",
          type: "knnVector",
        },
      },
    },
  },
});
```
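Once the collection and index exist, the semantic cache can be passed to a chat model via the `cache` option. The constructor shape below is an assumption (modeled on other Atlas integrations that take an embeddings instance plus a collection and index name); check the exports of `@langchain/mongodb` for the actual signature.

```typescript
import { MongoClient } from "mongodb";
import { MongoDBAtlasSemanticCache } from "@langchain/mongodb";
import { ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai";

const client = new MongoClient(process.env.MONGODB_ATLAS_URI ?? "");
const collection = client.db("langchain").collection("semantic_cache");

// Assumed constructor: embeddings plus a config pointing at the prepared
// collection and vector search index. Verify against the package's types.
const cache = new MongoDBAtlasSemanticCache(new OpenAIEmbeddings(), {
  collection,
  indexName: "default",
});

const model = new ChatOpenAI({ model: "gpt-4", cache });

// Semantically similar prompts should hit the cache rather than the API.
await model.invoke("What is the capital of France?");
await model.invoke("Tell me the capital city of France.");
```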
Note that the initial creation of a vector search index takes some time (it may take more than 30 seconds). If you query the vector index while it is initializing, you may receive an error or an empty response. Also, each time a new document (vector embedding, etc.) is added, the index needs to update before it can return the new document as part of its response. You can query the vector index while it is being updated, but it will return data based on the old index.
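If you want to avoid querying too early, you can poll the index status before first use. Here is a sketch using the Node driver's `listSearchIndexes()`, reusing the `collection` handle from the snippet above; the `queryable` field follows Atlas's `$listSearchIndexes` output, so verify it for your driver version.

```typescript
// Poll until the Atlas search index reports that it is queryable.
async function waitForSearchIndex(name = "default"): Promise<void> {
  for (;;) {
    const [index] = await collection.listSearchIndexes(name).toArray();
    if (index?.queryable) return;
    // Not ready yet: wait a few seconds before checking again.
    await new Promise((resolve) => setTimeout(resolve, 5000));
  }
}

await waitForSearchIndex();
```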