-
Notifications
You must be signed in to change notification settings - Fork 1.3k
[cache] Fix descriptor duplication in getAvailableBlobs #6092
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[cache] Fix descriptor duplication in getAvailableBlobs #6092
Conversation
92d0f5a
to
34ade39
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
Please squash the commits.
30b33bc
to
0858b9e
Compare
@ktock done, thanks for the review 🙇 |
@ktock any idea why the tests would be failing? I'm don't really understand the errors + it was not failing before I sqashed 🤔 |
TestGetRemotes failure doesn't look like caused by this change (posted a possible fix #6099). I'm rerunning the test. https://github.com/moby/buildkit/actions/runs/16344069969/job/46175402958?pr=6092#step:8:4328
|
Another failure but it also looks unrelated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I look at the implementation of walkBlob
it is calling walkBlobVariantsOnly
with visited
map to avoid duplications. Isn't that one working correctly, and if it is not then probably should fix it there.
@tonistiigi it doesn't work as expected because:
I thought about fixing by moving the |
@tonistiigi do you mind giving me your updated opinion based on my message above? |
cache/remote.go
Outdated
@@ -112,9 +112,11 @@ func getAvailableBlobs(ctx context.Context, cs content.Store, chain *solver.Remo | |||
if err != nil { | |||
return nil, err | |||
} | |||
var descs []ocispecs.Descriptor | |||
var descs = map[digest.Digest]ocispecs.Descriptor{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current patch seems to make output order potentially undeterministic. Could this be a problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tonistiigi that's a good point. I tried to track all the usages of descs
in the calling functions but it branches out fast and I can't guarantee that nondeterministic ordering won't ever be an issue.
I've modified the code to revert to using a slice and do a basic for
loop on it to ensure deduplication (which is think is fine performance wise, see updated comment in 622757e)
What do you think?
332db0f
to
622757e
Compare
cache/remote.go
Outdated
// Looping over the list is preferable: | ||
// 1. to avoid using a map, which don't preserve the order of descriptors, | ||
// 2. descs will have a length the number of compression variants for a blob, which is usually very small | ||
for _, d := range descs { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be done with https://pkg.go.dev/slices#ContainsFunc in a slightly cleaner way in the new Go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed in 7526d59
594e9ce
to
7526d59
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please squash commits as well
The walkBlob function was called with a function having a side effect: appending a descriptor to a list. However, the function could be called multiple times for the same descriptor (once in `walkBlob` and once in `walkBlobVariantsOnly` in that case) leading to duplicate entries in the final list. Instead, we will check each time we want to add an element, that it's not already in the list using slice.ContainsFunc. This should be fine from a performance perspective as the list is small (at most 2 compressions in the vast majority of cases). Signed-off-by: Baptiste Girard-Carrabin <[email protected]>
7526d59
to
786d74c
Compare
commits squashed @tonistiigi |
tests seem to have irrelevant failures again 😓 can they be retried? |
The walkBlob function was called with a function having a side effect: appending a descriptor to a list. However, the function could be called multiple times for the same descriptor (once in
walkBlob
and once inwalkBlobVariantsOnly
in that case) leading to duplicate entries in the final list. Instead of the list, we can use a map to store the descriptors. With the digest as key, it will ensure we don't store duplicates.This fixes one of the problem encountered in #6088
cc @ktock