Skip to content

Conversation

rainsun
Copy link
Contributor

@rainsun rainsun commented Aug 13, 2025

feat(registry): implement transport caching and clear cache on registry updates
- Added transport caching to optimize HTTP transport reuse for registry endpoints.
- Implemented ClearTransportCache function to clear the transport cache when registry configurations change, ensuring fresh transports are used.
- Enhanced GetTransport method to utilize cached transports, improving performance and reducing overhead.
- fixes: #416

@codecov-commenter
Copy link

codecov-commenter commented Aug 13, 2025

Codecov Report

❌ Patch coverage is 95.83333% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.79%. Comparing base (a7ee15a) to head (847f803).

Files with missing lines Patch % Lines
registry-scanner/pkg/registry/endpoints.go 95.45% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1215      +/-   ##
==========================================
+ Coverage   70.58%   70.79%   +0.21%     
==========================================
  Files          45       45              
  Lines        5133     5178      +45     
==========================================
+ Hits         3623     3666      +43     
- Misses       1344     1346       +2     
  Partials      166      166              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@chengfang
Copy link
Collaborator

@rainsun thanks for the contribution! Have you considered using github.com/patrickmn/go-cache, which is the one currently used for managing in-memory caching? See https://github.com/argoproj-labs/argocd-image-updater/blob/master/registry-scanner/pkg/cache/memcache.go

How do you handle cache entry expiration and evication to avoid the cache growing indefinitely? What if the transport returned from the cache is already timed out and not usable?

@eidmantas
Copy link

eidmantas commented Sep 16, 2025

Thanks for the transport reuse/timeouts — big win under load.

Do - JWT/token requests (e.g. /jwt/auth) use the same cached http.Transport and timeouts? Our timeouts are mostly on the auth endpoint, not the registry.
If not, could token fetches use the same transport, and could we expose a couple knobs (ResponseHeaderTimeout, MaxConnsPerHost) for tuning?
Example under load (multiple registries/repos, burst of tag-list calls):

time="2025-09-16T10:00:58Z" level=error msg="Could not get tags from registry: Get \"http://<registry>/v2/<repo>/tags/list\": Get \"https://<gitlab-host>/jwt/auth?...\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"

and also - perhaps retry could be introduced to the transport?

@rainsun rainsun force-pushed the reuse-transport branch 2 times, most recently from d48b361 to 26f53a9 Compare September 22, 2025 15:15
@rainsun
Copy link
Contributor Author

rainsun commented Sep 22, 2025

@chengfang Thanks for the insightful review and great suggestions! You raised some very valid points about the initial implementation.

I've updated the code to address your feedback:

  1. Adopted go-cache: I've refactored the transport cache to use github.com/patrickmn/go-cache, aligning it with the project's existing caching strategy as you suggested.
  2. Cache Expiration and Eviction: The cache is now initialized with a 30-minute default expiration and a 10-minute cleanup interval. This prevents the cache from growing
    indefinitely and ensures stale entries are automatically purged.
  3. Stale Transport Handling: To address the concern about timed-out or unusable transports, I've added a validation function that checks if a cached transport is still valid
    before it's returned. If the check fails, the stale entry is evicted, and a new transport will be created for the request.

I also wanted to add a quick note on why I created a new, separate cache here instead of trying to reuse the one from pkg/cache/memcache.go.

I figured it was better to keep them separate since they are caching two very different things (http.Transport vs image tags). They also have different requirements, especially
around auto-expiration. Mixing them would have made the existing memcache.go file more complex. This way, they are nicely independent, and we can easily tweak the transport cache
in the future without worrying about breaking the tag cache.

Please let me know if this line of reasoning makes sense to you. Thanks again for your help

@chengfang
Copy link
Collaborator

Do - JWT/token requests (e.g. /jwt/auth) use the same cached http.Transport and timeouts? Our timeouts are mostly on the auth endpoint, not the registry. If not, could token fetches use the same transport, and could we expose a couple knobs (ResponseHeaderTimeout, MaxConnsPerHost) for tuning? Example under load (multiple registries/repos, burst of tag-list calls):

From https://github.com/argoproj-labs/argocd-image-updater/blob/master/registry-scanner/pkg/registry/client.go#L113-L117 it is using the same endpoint.GetTransport() func. With the PR fix, it should also help in your case.

Exposing certain options would require more thinking and I think would be beyond the scope of this pr.

and also - perhaps retry could be introduced to the transport?

http.Transport already handles retries, with certain restriction (from docs in http.Transport):
// Transport only retries a request upon encountering a network error
// if the connection has been already been used successfully and if the
// request is idempotent and either has no body or has its [Request.GetBody]
// defined. HTTP requests are considered idempotent if they have HTTP methods
// GET, HEAD, OPTIONS, or TRACE; or if their [Header] map contains an
// "Idempotency-Key" or "X-Idempotency-Key" entry. If the idempotency key
// value is a zero-length slice, the request is treated as idempotent but the
// header is not sent on the wire.

@chengfang
Copy link
Collaborator

Thanks @rainsun for updating the PR. It looks good overall and I added a comment about reduced timeout values. Can you please take a look? Also can you also add unit tests to cover the new functionalities?

@rainsun
Copy link
Contributor Author

rainsun commented Sep 24, 2025

Thanks @chengfang for the review! I've reverted the timeout values to their defaults as you suggested.

also added unit tests to cover the new transport caching functionality:

  • TestTransportCache: Verifies that the transport is correctly cached on miss, retrieved on hit, and that the cache is cleared properly.
  • TestIsTransportValid: Ensures that the transport validation logic correctly identifies both valid and invalid transports.
  • I also updated TestGetTransport to ensure tests don't interfere with each other by using unique registry APIs and clearing the cache between runs.

All tests are passing. Happy to hear any other thoughts you might have.

…ry updates

- Added transport caching to optimize HTTP transport reuse for registry endpoints.
- Implemented ClearTransportCache function to clear the transport cache when registry configurations change, ensuring fresh transports are used.
- Enhanced GetTransport method to utilize cached transports, improving performance and reducing overhead.

Signed-off-by: rainsun <[email protected]>
@chengfang chengfang merged commit 8c9172e into argoproj-labs:master Sep 24, 2025
11 checks passed
@rainsun rainsun deleted the reuse-transport branch September 25, 2025 14:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error with "cannot assign requested address" After Running for Some Time
4 participants