[Condense Context] Track metrics around context condensing and show in UI #3736
Conversation
🦋 Changeset detected. Latest commit: 7df703f. The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
```ts
// Check if there's a recent summary in the messages we're keeping
const recentSummaryExists = keepMessages.some((message) => message.isSummary)
if (recentSummaryExists) {
	return response // We recently summarized these messages; it's too soon to summarize again.
}
```
Hey @canrobins13,
Just a question about the `truncateConversationIfNeeded` function in `sliding-window`:
When `autoCondenseContext` is true and the token limit is hit, it tries to summarize. But if `summarizeConversation` decides not to create a summary (maybe because there aren't many new messages, or there was a recent summary), it looks like `truncateConversationIfNeeded` still goes ahead and does the simple truncation.
I'm aware this behavior was present before your changes, but would it make more sense to reuse the previous summary, since it might still be fresh, rather than falling back to the simple truncation?
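The control flow being discussed could be sketched roughly as follows. This is an illustrative model, not the actual Roo Code implementation; all names (`trySummarize`, `truncate`, `condenseIfNeeded`) and the keep-count are assumptions:

```typescript
// Hypothetical sketch: when the token limit is hit, first attempt an LLM
// summary; only fall back to simple truncation when no summary is produced.

type Message = { text: string; isSummary?: boolean }

// Stand-in for summarizeConversation: declines when a recent summary exists.
function trySummarize(messages: Message[], keepCount: number): Message[] | null {
	const kept = messages.slice(-keepCount)
	if (kept.some((m) => m.isSummary)) {
		return null // too soon to summarize again
	}
	const summary: Message = { text: "[summary]", isSummary: true }
	return [summary, ...kept]
}

// Simple sliding-window fallback: drop the oldest half of the history.
function truncate(messages: Message[]): Message[] {
	return messages.slice(Math.floor(messages.length / 2))
}

function condenseIfNeeded(messages: Message[], tokenLimitHit: boolean): Message[] {
	if (!tokenLimitHit) return messages
	return trySummarize(messages, 3) ?? truncate(messages)
}
```

The reviewer's question is about the `??` branch: when `trySummarize` declines, the history is truncated even though a fresh summary may already be in the kept messages.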
When making an API request we only send messages starting from the latest summary if one exists, so the recent summary message would be "used" in the case you highlighted (when making API requests and when calculating the context window usage).
This case should be pretty hard to hit through the existing implementation, but will be more relevant because we're about to add a button for manually triggering context summarization through the UI.
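The "only send messages starting from the latest summary" behavior described above can be sketched like this; the function name and message shape are illustrative assumptions:

```typescript
// Minimal sketch: when building an API request, include only the most
// recent summary (if any) and the messages after it.

type Msg = { text: string; isSummary?: boolean }

function messagesSinceLastSummary(messages: Msg[]): Msg[] {
	// Scan backwards for the most recent summary.
	for (let i = messages.length - 1; i >= 0; i--) {
		if (messages[i].isSummary) {
			return messages.slice(i) // the summary plus everything after it
		}
	}
	return messages // no summary yet: send the full history
}
```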
Makes sense. If the condense feature becomes manual and the user clicks summarize multiple times, would this trigger the simple truncation? Or is it only triggered when the context is already bigger than the window?
It's not implemented yet, but we wouldn't trigger simple truncation or dropping of messages from the user button; that would only hit the new condense code.
```tsx
<div className="mt-2 ml-6 p-3 bg-vscode-editor-background rounded text-vscode-foreground text-sm">
	<h3 className="font-bold mb-2">{t("chat:contextCondense.conversationSummary")}</h3>
	{summary}
</div>
```
Hey again,
Would rendering the summary with `webview-ui/src/components/chat/Markdown.tsx` create too much visual clutter? It might make the summary easier to read, especially since the models love to output markdown.
Let me try that out!
I like it, updated the code :)
The only downside is that things like tags are omitted... it's a better UI but maybe slightly worse for analysis/debugging.
Ah, are you talking about the tags? Does the summary prompt encourage the model to use other XML tags? If that's the case, it would be interesting to see if we can improve the `Markdown.tsx` component to show the tags, of course in a different PR.
No, it doesn't encourage it to use any other tags.
Great work! Now we won't have so many people asking about when this feature is triggered and what it does. I'm honestly really excited to be able to trigger this behavior manually as well.
```ts
// If the previous API request's total token usage is close to the
// context window, truncate the conversation history to free up space
// for the new request.
if (previousApiReqIndex >= 0) {
```
Is there anything different about this `previousApiReqIndex` version and the new code (which seems simpler)? It seems like they are different, but it's hard for me to tell if the new version is also correct.
I had the same question, but I couldn't find any reason that it wasn't using `getApiMetrics` before... agree it feels weird.
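The simpler approach under discussion — deriving the context size from the most recent API request rather than tracking `previousApiReqIndex` separately — might look roughly like this. The message shape and field names (`say`, `tokensIn`, `tokensOut`) are assumptions for illustration:

```typescript
// Rough sketch: scan backwards for the latest api_req_started message and
// read its reported token usage to estimate current context window usage.

type ClineMessage = { say?: string; text?: string }

function lastRequestTokens(messages: ClineMessage[]): number {
	for (let i = messages.length - 1; i >= 0; i--) {
		const m = messages[i]
		if (m.say === "api_req_started" && m.text) {
			const info = JSON.parse(m.text)
			// Total context for that request: input plus output tokens.
			return (info.tokensIn ?? 0) + (info.tokensOut ?? 0)
		}
	}
	return 0 // no request yet
}
```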
Looks great to me; left a few nits (feel free to ignore) and one question.
Related GitHub Issue
Closes: #3673
Description
This PR has a few visible changes:
Point #2 above will be necessary for unlocking #3671. Without the changes in this PR, the context progress bar would not update after a manual condense operation until the next API request is made, which would be confusing to the user. After this PR, the context progress bar is correct after a condense operation even without an additional API request.
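The synchronization point described above — updating the progress bar immediately after a condense, without waiting for the next API request — could be sketched as follows. The message type, payload shape, and `notifyWebview` helper are illustrative assumptions, not the actual extension API:

```typescript
// Hedged sketch: after a condense operation completes, push the new token
// count to the webview so the context progress bar re-renders right away.

interface CondenseResult {
	newContextTokens: number
	summary: string
	cost: number
}

function notifyWebview(post: (msg: object) => void, result: CondenseResult): void {
	// The progress bar can update from this message alone,
	// without another API round-trip.
	post({
		type: "contextCondensed",
		contextTokens: result.newContextTokens,
		cost: result.cost,
	})
}
```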
Test Procedure
Type of Change
- Changes are limited to `src` or test files.

Pre-Submission Checklist
- Code passes linting (`npm run lint`).
- Debug code (e.g. `console.log`) has been removed.
- New and existing tests pass (`npm test`).
- Branch is up-to-date with the `main` branch.
- Ran `npm run changeset` if this PR includes user-facing changes or dependency updates.

Screenshots / Videos
Collapsed view
Expanded view
Important
This PR adds UI and backend support for tracking and displaying context condensing metrics, ensuring synchronization with context operations.
- Updates `ChatRow.tsx` to display context condensing metrics.
- Updates `summarizeConversation()` in `index.ts` to return a `SummarizeResponse` with context metrics.
- Updates `truncateConversationIfNeeded()` in `sliding-window/index.ts` to handle context condensing.
- Adds `contextCondenseSchema` in `schemas/index.ts` for context condensing data.
- Updates `index.test.ts` and `sliding-window.test.ts` to verify context condensing behavior.
- Updates `getApiMetrics()` in `getApiMetrics.ts` to include context condensing metrics.