
Commit bac2a64

feat(mongodb): add MongoDB LLM cache integrations
- Add `MongoDBCache` and `MongoDBAtlasSemanticCache` classes to `@langchain/mongodb` for key-value and semantic LLM caching.
- Implement integration tests for both cache types.
- Update documentation:
  - Add a guide for using MongoDB as a key-value cache.
  - Add a guide for using MongoDB Atlas as a semantic cache.
  - Add additional information and clarification on semantic vs. key-value caching.
- Fix test issues:
  - Consistently use the `langchain_test` collection for all MongoDB tests.
  - Use a namespace for DB naming in all MongoDB tests.
  - Improve the test error message when Docker is not available.
  - Always create a search index for the test vector store, since the test may otherwise fail under some setups.
1 parent e080f18 commit bac2a64

File tree

20 files changed: +590, -33 lines

docs/core_docs/docs/how_to/chat_model_caching.mdx

Lines changed: 37 additions & 4 deletions
@@ -17,13 +17,21 @@ LangChain provides an optional caching layer for chat models. This is useful for
It can save you money by reducing the number of API calls you make to the LLM provider, if you're often requesting the same completion multiple times.
It can speed up your application by reducing the number of API calls you make to the LLM provider.
+ There are two methods for caching LLM responses: **semantic caching** and **key-value caching**.
+
+ - **Key-value caching** stores responses based on exact query (LLM prompt) matches. When the same request is made again, the cached LLM response is retrieved. Key-value caching is fast, but its shortcoming is that even small changes in the prompt—such as punctuation differences or slight wording variations (e.g., _"yes"_ vs. _"yeah"_)—can cause a cache miss, leading to a fresh LLM call.
+ - **[Semantic caching](/docs/integrations/semantic_caching)** improves upon key-value caching by relying on the meaning of the prompt rather than exact matches. If a new LLM prompt is semantically similar to a previously cached one, the stored response is retrieved and reused, reducing LLM usage costs. A typical implementation of semantic caching involves storing prompts as embeddings and using similarity search to identify a cache hit.
+
+ This page goes over key-value caching. To use semantic caching, see [LLM Semantic Caching](/docs/integrations/semantic_caching).
+
+ NOTE: The caching integrations in LangChain do not expire old or unused values. Depending on your use case and application, decide whether you need an eviction process and, if so, implement it directly against the underlying storage.

import CodeBlock from "@theme/CodeBlock";

```typescript
import { ChatOpenAI } from "@langchain/openai";

- // To make the caching really obvious, lets use a slower model.
+ // To make the caching really obvious, let's use a slower model.
const model = new ChatOpenAI({
  model: "gpt-4",
  cache: true,
@@ -122,13 +130,13 @@ import AdvancedUpstashRedisCacheExample from "@examples/cache/chat_models/upstas
## Caching with Vercel KV

- LangChain provides an Vercel KV-based cache. Like the Redis-based cache, this cache is useful if you want to share the cache across multiple processes or servers. The Vercel KV client uses HTTP and supports edge environments. To use it, you'll need to install the `@vercel/kv` package:
+ LangChain provides a Vercel KV-based cache. Like the Redis-based cache, this cache is useful if you want to share the cache across multiple processes or servers. The Vercel KV client uses HTTP and supports edge environments. To use it, you'll need to install the `@vercel/kv` package:

```bash npm2yarn
npm install @vercel/kv
```

- You'll also need an Vercel account and a [KV database](https://vercel.com/docs/storage/vercel-kv/kv-reference) to connect to. Once you've done that, retrieve your REST URL and REST token.
+ You'll also need a Vercel account and a [KV database](https://vercel.com/docs/storage/vercel-kv/kv-reference) to connect to. Once you've done that, retrieve your REST URL and REST token.

Then, you can pass a `cache` option when you instantiate the LLM. For example:
@@ -156,14 +164,39 @@ import CloudflareExample from "@examples/cache/chat_models/cloudflare_kv.ts";
<CodeBlock language="typescript">{CloudflareExample}</CodeBlock>

+ ## Caching with MongoDB
+
+ NOTE: This section covers using MongoDB as a key-value LLM cache. For **semantic caching**, see [MongoDB Atlas Semantic Cache](/docs/integrations/semantic_caching/mongodb_atlas).
+
+ LangChain provides MongoDB-based cache support. This is especially useful if your application already uses MongoDB as a database and you don't want to add another data store integration.
+
+ To use this cache, you'll need to install `mongodb` as well as `@langchain/mongodb`:
+
+ ```bash npm2yarn
+ npm install mongodb @langchain/mongodb @langchain/core
+ ```
+
+ The MongoDB cache integration does not create a collection for your cache storage.
+ Assuming you have already set up your collection, you can use it for caching as follows:
+
+ import MongoDBCacheExample from "@examples/cache/chat_models/mongodb.ts";
+
+ <CodeBlock language="typescript">{MongoDBCacheExample}</CodeBlock>
+
+ Hint: The key-value cache is stored in the collection using `prompt` as the key and `llm` as the value. You can speed up fetching cached entries by creating an index (not a Vector Search index) on `prompt`:
+
+ ```typescript
+ await db.collection(LLM_CACHE).createIndex({ prompt: 1 });
+ ```
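As noted at the top of this page, the cache integrations do not expire entries on their own. With MongoDB, one option is a TTL index. The sketch below is illustrative only: it assumes a `createdAt` date field on the cached documents, which you may need to maintain yourself.

```typescript
// Hypothetical eviction setup: expire cache entries one day after their
// `createdAt` timestamp. The `createdAt` field is an assumption; populate
// it yourself if the cached documents do not already carry one.
await db
  .collection(LLM_CACHE)
  .createIndex({ createdAt: 1 }, { expireAfterSeconds: 60 * 60 * 24 });
```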
## Caching on the File System

:::warning
This cache is not recommended for production use. It is only intended for local development.
:::

LangChain provides a simple file system cache.
- By default the cache is stored a temporary directory, but you can specify a custom directory if you want.
+ By default, the cache is stored in a temporary directory, but you can specify a custom directory if you want.

```typescript
const cache = await LocalFileCache.create();

docs/core_docs/docs/how_to/llm_caching.mdx

Lines changed: 27 additions & 0 deletions
@@ -4,6 +4,8 @@ sidebar_position: 2
# How to cache model responses

+ NOTE: This page covers older language models that take a string as input and return a string as output. Most users should work almost exclusively with the newer chat models, since most model providers have adopted a chat-like interface. See [Chat Model Caching](/docs/how_to/chat_model_caching) for implementing caching, including semantic caching, with chat models.

LangChain provides an optional caching layer for LLMs. This is useful for two reasons:

It can save you money by reducing the number of API calls you make to the LLM provider, if you're often requesting the same completion multiple times.
@@ -197,6 +199,31 @@ import CloudflareExample from "@examples/cache/cloudflare_kv.ts";
<CodeBlock language="typescript">{CloudflareExample}</CodeBlock>

+ ## Caching with MongoDB
+
+ NOTE: This section covers using MongoDB as a key-value LLM cache. For **semantic caching**, see [MongoDB Atlas Semantic Cache](/docs/integrations/semantic_caching/mongodb_atlas).
+
+ LangChain provides MongoDB-based cache support. This is especially useful if your application already uses MongoDB as a database, and you don't want to add another data store integration.
+
+ To use this cache, you'll need to install `mongodb` as well as `@langchain/mongodb`:
+
+ ```bash npm2yarn
+ npm install mongodb @langchain/mongodb @langchain/core
+ ```
+
+ The MongoDB cache integration does not create a collection for your cache storage.
+ Assuming you have already set up your collection, you can use it for caching as follows:
+
+ import MongoDBCacheExample from "@examples/cache/chat_models/mongodb.ts";
+
+ <CodeBlock language="typescript">{MongoDBCacheExample}</CodeBlock>
+
+ Hint: The key-value cache is stored in the collection using `prompt` as the key and `llm` as the value. You can speed up fetching cached entries by creating an index (not a Vector Search index) on `prompt`:
+
+ ```typescript
+ await db.collection(LLM_CACHE).createIndex({ prompt: 1 });
+ ```
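If you ever need to reset this cache entirely (for example between test runs), you can clear the stored entries directly with the driver. A minimal sketch, reusing the `db` and `LLM_CACHE` names from the hint above:

```typescript
// Remove all cached prompt/llm documents; the collection itself is kept.
await db.collection(LLM_CACHE).deleteMany({});
```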
## Caching on the File System

:::warning

docs/core_docs/docs/integrations/llm_caching/index.mdx

Lines changed: 0 additions & 14 deletions
This file was deleted.

docs/core_docs/docs/integrations/platforms/microsoft.mdx

Lines changed: 1 addition & 1 deletion
@@ -144,7 +144,7 @@ import { AzureCosmosDBMongoDBVectorStore } from "@langchain/azure-cosmosdb";
npm install @langchain/azure-cosmosdb @langchain/core
```

- See a [usage example](/docs/integrations/llm_caching/azure_cosmosdb_nosql).
+ See a [usage example](/docs/integrations/semantic_caching/azure_cosmosdb_nosql).

```typescript
import { AzureCosmosDBNoSQLSemanticCache } from "@langchain/azure-cosmosdb";
Lines changed: 1 addition & 1 deletion
@@ -49,7 +49,7 @@ If you create the container beforehand, make sure to set the partition key to `/
## Usage example

- import Example from "@examples/caches/azure_cosmosdb_nosql/azure_cosmosdb_nosql.ts";
+ import Example from "@examples/cache/semantic_cache/azure_cosmosdb_nosql.ts";

<CodeBlock language="typescript">{Example}</CodeBlock>
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+ ---
+ sidebar_class_name: hidden
+ hide_table_of_contents: true
+ ---
+
+ # LLM Semantic Caching
+
+ [Caching LLM calls](/docs/how_to/chat_model_caching) can be useful for testing, cost savings, and speed.
+
+ ## Caching LLM Responses: Semantic vs. Key-Value Caching
+
+ Currently, there are two methods for caching LLM responses: **semantic caching** and **key-value caching**.
+
+ _Key-value LLM caching guide:_
+ See the [LLM Cache how-to guide](/docs/how_to/llm_caching).
+
+ ### Key-Value Caching
+
+ Key-value caching stores responses based on exact query (LLM prompt) matches. When the same request is made again, the cached LLM response is retrieved. Key-value caching is fast, but its shortcoming is that even small changes in the prompt—such as punctuation differences or slight wording variations (e.g., _"yes"_ vs. _"yeah"_)—can cause a cache miss, leading to a fresh LLM call.
+
+ ### Semantic Caching
+
+ Semantic caching improves upon this by relying on the meaning of the prompt rather than exact matches. If a new LLM prompt is semantically similar to a previously cached one, the stored response is retrieved and reused, reducing LLM usage costs. A typical implementation of semantic caching involves storing prompts as embeddings and using similarity search to identify a cache hit.
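To make the idea concrete, here is a minimal, library-agnostic sketch of the lookup path a semantic cache typically follows. The embedding function, in-memory store, and similarity threshold are illustrative assumptions rather than the API of any particular integration:

```typescript
// Illustrative only: embed the prompt, find the nearest cached prompt,
// and reuse its response when similarity clears a threshold.
type CacheEntry = { embedding: number[]; response: string };

const cosine = (a: number[], b: number[]) => {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
};

async function semanticLookup(
  prompt: string,
  embed: (text: string) => Promise<number[]>, // assumed embedding function
  store: CacheEntry[], // a real cache uses a vector index, not an array
  threshold = 0.9 // the cutoff is an application-specific choice
): Promise<string | undefined> {
  const query = await embed(prompt);
  let best: CacheEntry | undefined;
  let bestScore = -Infinity;
  for (const entry of store) {
    const score = cosine(query, entry.embedding);
    if (score > bestScore) {
      best = entry;
      bestScore = score;
    }
  }
  // On a miss (undefined), the caller falls back to calling the LLM.
  return bestScore >= threshold ? best?.response : undefined;
}
```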

+ _Semantic Caching Guides:_
+
+ import { IndexTable } from "@theme/FeatureTables";
+
+ <IndexTable />
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
+ # MongoDB Atlas Semantic Cache
+
+ This page documents the MongoDB Atlas integration for **semantic caching** of LLM generation outputs. See [MongoDB Atlas](/docs/integrations/vectorstores/mongodb_atlas) for additional setup and configuration information.
+
+ Semantic caching allows you to cache and retrieve generations based on vector similarity, so that similar prompts can share cached results.
+
+ ## Install dependencies
+
+ You'll first need to install the [`@langchain/mongodb`](https://www.npmjs.com/package/@langchain/mongodb) package as well as [`mongodb`](https://www.npmjs.com/package/mongodb):
+
+ import IntegrationInstallTooltip from "@mdx_components/integration_install_tooltip.mdx";
+
+ <IntegrationInstallTooltip></IntegrationInstallTooltip>
+
+ ```bash npm2yarn
+ npm install mongodb @langchain/mongodb @langchain/core
+ ```
+
+ ## Set up the cache collection
+
+ You will need the `mongodb` driver package to manage your database, collection(s), and vector search indexes. The `@langchain/mongodb` package provides the integration for LangChain and expects a ready-to-use collection and vector search index.
+
+ You can set up a vector collection either through the MongoDB Atlas UI or with commands such as the following:
+
+ ```typescript
+ import { MongoClient } from "mongodb";
+
+ const client = new MongoClient(process.env.MONGODB_ATLAS_URI);
+ await client.connect();
+ const db = client.db("db_name");
+ const collection = db.collection("collection_name");
+
+ // Create a search index. The dimensions must match your embedding dimensions.
+ await collection.createSearchIndex({
+   name: "default",
+   definition: {
+     mappings: {
+       dynamic: true,
+       fields: {
+         embedding: {
+           dimensions: 1024,
+           similarity: "cosine",
+           type: "knnVector",
+         },
+       },
+     },
+   },
+ });
+ ```
+
+ Note that the initial creation of a vector search index takes some time (it may take more than 30 seconds). If you query the vector index while it is initializing, you may receive an error or an empty response. Also, each time a new document (vector embedding, etc.) is added, the index needs to update before it can return the new document as part of its response. You can query the vector index while it is being updated, but it will return data based on the old index.
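If you want to block until the index is actually queryable before issuing cached LLM calls, you can poll the driver. A minimal sketch, assuming the driver's `listSearchIndexes` helper (available in recent `mongodb` releases) and the `collection` and index name from the snippet above:

```typescript
// Poll until the "default" search index reports that it is queryable.
// The polling interval is arbitrary; add a timeout for production use.
async function waitForSearchIndex(name = "default"): Promise<void> {
  // eslint-disable-next-line no-constant-condition
  while (true) {
    const indexes = await collection.listSearchIndexes(name).toArray();
    if (indexes.some((index) => index.queryable)) {
      return;
    }
    await new Promise((resolve) => setTimeout(resolve, 5000));
  }
}

await waitForSearchIndex();
```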
+ ## LLM call with semantic caching
+
+ import MongoDBAtlasSemanticCacheExample from "@examples/cache/semantic_cache/mongodb_atlas.ts";
+
+ import CodeBlock from "@theme/CodeBlock";
+
+ <CodeBlock language="typescript">{MongoDBAtlasSemanticCacheExample}</CodeBlock>
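The example file referenced above is the authoritative usage. As a rough sketch of the shape such code takes, the cache is built from an embeddings model plus the prepared collection, then passed to the model via the `cache` option. The constructor arguments below are assumptions that mirror the key-value example and the vector store configuration; check the example or the package documentation for the exact signature:

```typescript
import { MongoClient } from "mongodb";
import { MongoDBAtlasSemanticCache } from "@langchain/mongodb";
import { OpenAI, OpenAIEmbeddings } from "@langchain/openai";

const client = new MongoClient(process.env.MONGODB_ATLAS_URI ?? "");
await client.connect();
const collection = client.db("db_name").collection("collection_name");

// Assumed constructor shape: an embeddings instance plus the prepared Atlas
// collection/search index. Verify against the package docs or the example above.
const cache = new MongoDBAtlasSemanticCache(new OpenAIEmbeddings(), {
  collection,
  indexName: "default",
});

const model = new OpenAI({ cache });

// The second, semantically similar prompt should hit the cache instead of the LLM.
console.log(await model.invoke("Tell me a joke!"));
console.log(await model.invoke("Tell me a joke please!"));

await client.close();
```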

docs/core_docs/sidebars.js

Lines changed: 2 additions & 2 deletions
@@ -370,13 +370,13 @@ module.exports = {
      items: [
        {
          type: "autogenerated",
-         dirName: "integrations/llm_caching",
+         dirName: "integrations/semantic_caching",
          className: "hidden",
        },
      ],
      link: {
        type: "doc",
-       id: "integrations/llm_caching/index",
+       id: "integrations/semantic_caching/index",
      },
    },
    {
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+ import { MongoClient } from "mongodb";
+ import { MongoDBCache } from "@langchain/mongodb";
+ import { OpenAI } from "@langchain/openai";
+
+ // Connect to Atlas if a URI is provided, otherwise fall back to a local MongoDB.
+ let client;
+ if (process.env.MONGODB_ATLAS_URI) {
+   client = new MongoClient(process.env.MONGODB_ATLAS_URI);
+ } else {
+   client = new MongoClient("mongodb://localhost:27017");
+ }
+
+ await client.connect();
+ const collection = client.db("langchain").collection("llm_cache");
+
+ const cache = new MongoDBCache({ collection });
+
+ const model = new OpenAI({ cache });
+
+ // The first call hits the LLM and stores the result in the cache...
+ const response1 = await model.invoke("Tell me a joke!");
+ console.log(response1);
+
+ // ...and the identical second call is answered from the cache.
+ const response2 = await model.invoke("Tell me a joke!");
+ console.log(response2);
+
+ // Hint: You can speed up fetching cached entries by setting up an index on prompt:
+ // await collection.createIndex({ prompt: 1 });
File renamed without changes.
