-
Notifications
You must be signed in to change notification settings - Fork 2.2k
fix: Tackling Race/State condition issue by Changing the Code Design for Gemini Grounding Sources #7434
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Tackling Race/State condition issue by Changing the Code Design for Gemini Grounding Sources #7434
Conversation
- adding new type `ApiStreamGroundingChunk` to the stream type - collecting sources in the `Task.ts` instead -> decoupling
src/core/task/Task.ts
Outdated
if (pendingGroundingSources.length > 0) { | ||
// Remove any grounding source references that might have been integrated into the message | ||
cleanAssistantMessage = assistantMessage | ||
.replace(/\[\d+\]\s+[^:\n]+:\s+https?:\/\/[^\s\n]+/g, "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When cleaning the assistant message to remove integrated grounding source references, the regex patterns are nontrivial. Consider adding an inline comment with example inputs to clarify what patterns are being stripped to improve future maintainability.
.replace(/\[\d+\]\s+[^:\n]+:\s+https?:\/\/[^\s\n]+/g, "") | |
\t\t\t\t\t\t.replace(/\[\d+\]\s+[^:\n]+:\s+https?:\/\/[^\s\n]+/g, "") // e.g. "[1] Example Source: https://example.com" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this well-thought-out solution to the Gemini grounding sources race condition issue! The architectural change to decouple stream chunks from grounding metadata is a solid approach that properly addresses the state persistence problems.
The implementation looks clean and follows good separation of concerns. I've left a few suggestions inline to improve maintainability and robustness, with the most important being the addition of test coverage to prevent regression of this race condition fix.
src/core/task/Task.ts
Outdated
if (pendingGroundingSources.length > 0) { | ||
// Remove any grounding source references that might have been integrated into the message | ||
cleanAssistantMessage = assistantMessage | ||
.replace(/\[\d+\]\s+[^:\n]+:\s+https?:\/\/[^\s\n]+/g, "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the previous suggestion - these regex patterns would benefit from inline comments explaining what they match. Consider adding comments like:
.replace(/\[\d+\]\s+[^:\n]+:\s+https?:\/\/[^\s\n]+/g, "") | |
cleanAssistantMessage = assistantMessage | |
.replace(/\[\d+\]\s+[^:\n]+:\s+https?:\/\/[^\s\n]+/g, "") // e.g., "[1] Example Source: https://example.com" | |
.replace(/Sources?:\s*[\s\S]*?(?=\n\n|\n$|$)/g, "") // e.g., "Sources: [1](url1), [2](url2)" |
return { | ||
title, | ||
url: uri, | ||
} | ||
} | ||
return null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding debug logging when sources are filtered out to help with troubleshooting malformed grounding metadata during development. This would make it easier to identify issues with the grounding chunks.
@@ -27,3 +32,14 @@ export interface ApiStreamUsageChunk { | |||
reasoningTokens?: number | |||
totalCost?: number | |||
} | |||
|
|||
export interface ApiStreamGroundingChunk { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding JSDoc documentation for the new grounding types to clarify their purpose. For example:
/**
* Represents grounding metadata from search results or citations.
* Used to decouple source information from the main content stream.
*/
export interface ApiStreamGroundingChunk {
type: "grounding"
sources: GroundingSource[]
}
case "grounding": | ||
// Handle grounding sources separately from regular content | ||
// to prevent state persistence issues - store them separately | ||
if (chunk.sources && chunk.sources.length > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good separation of concerns here! The grounding sources are now properly handled independently from the assistant message content, which should prevent the race condition issues mentioned in #6372.
@@ -2086,9 +2094,30 @@ | |||
let didEndLoop = false | |||
|
|||
if (assistantMessage.length > 0) { | |||
// Display grounding sources to the user if they exist | |||
if (pendingGroundingSources.length > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice implementation detail - storing grounding sources separately and only displaying them to the user while keeping the API history clean. This prevents the state persistence issues that were causing the race condition.
- Fixed failing test to expect grounding chunk instead of text with citations - Added inline comments to regex patterns for source stripping in Task.ts - Added test coverage for grounding source handling to prevent regression
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @HahaBill!
Anytime :) |
src/core/task/Task.ts
Outdated
// Display grounding sources to the user if they exist | ||
if (pendingGroundingSources.length > 0) { | ||
const citationLinks = pendingGroundingSources.map((source, i) => `[${i + 1}](${source.url})`) | ||
const sourcesText = `Sources: ${citationLinks.join(", ")}` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably internationalize "Sources"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I will add it back.
src/core/task/Task.ts
Outdated
if (pendingGroundingSources.length > 0) { | ||
// Remove any grounding source references that might have been integrated into the message | ||
cleanAssistantMessage = assistantMessage | ||
.replace(/\[\d+\]\s+[^:\n]+:\s+https?:\/\/[^\s\n]+/g, "") // e.g., "[1] Example Source: https://example.com" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about this. Does it have the potential to remove some content from the message unrelated to grounding? Is this a necessary thing to do in general?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, those are valid concerns. I personally think that it is unlikely unless users have custom prompts saying that the model should include in the response the grounding-like format content in some parts of the response.
But this might be premature optimization/edge-case handling so we can remove this for now.
Overall, these are answers to your questions:
- There's a small non-zero risk.
- No, it's not necessary. This might be a premature optimization.
Alright, I made some additional changes: |
Related GitHub Issue
Closes: #6372
Description
Before we were collecting sources in from the stream chunks in
api/gemini.ts
. I tried to resolve the issue by finding a culprit responsible for the race condition and state inconsistencies. It turns out that there were leakages from previous message to new messages. I tried very hard to resolve this but then I told myself that maybe it's better to rethink the code design pattern.I came up with this with a help of Roo Code:
Basically, we are decoupling gemini stream chunks and grounding metadata.
Summary:
ApiStreamGroundingChunk
to the stream typeTask.ts
instead -> decouplingTest Procedure
I test this with
gemini-2.5-flash
. For some reason, there are so many errors withgemini-2.5-pro
.gemini-2.5-flash
Grounding with Google Search
Pre-Submission Checklist
Discord username: HaHaBill (belikebill.)
Important
Redesigns code to decouple gemini stream chunks and grounding metadata, preventing state persistence issues and improving clarity.
gemini.ts
.Task.ts
to prevent state persistence issues.ApiStreamGroundingChunk
type instream.ts
.extractGroundingSources()
ingemini.ts
now returnsGroundingSource[]
instead of a string.extractCitationsOnly()
ingemini.ts
usesextractGroundingSources()
.Task
inTask.ts
now handles grounding sources separately and strips them from messages before saving.ApiStreamGroundingChunk
andGroundingSource
tostream.ts
.This description was created by
for a250a7e. You can customize this summary. It will automatically update as commits are pushed.