Skip to content

Conversation

swiatekm
Copy link
Contributor

@swiatekm swiatekm commented Jul 25, 2025

What does this PR do?

It makes it possible to avoid re-downloading binaries on each mage package invocation. It does this by:

  • Introducing a KEEP_ARCHIVE environment variable, which, when set, prevents deletion of the binary archive in the drop path: 0e63536
  • Making the binary download code send a If-Modified-Since header based on the modification time of the existing file. If this returns a Not Modified code, we skip the download: a7ec517

The environment variable is introduced to avoid potentially breaking the CI and unified builds. We need to more carefully verify that it won't before enabling it by default.

Why is it important?

Redownloading binaries wastes time and makes agent difficult to work with on slow internet connections.

Checklist

  • I have read and understood the pull request guidelines of this project.
  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • [ ] I have added an entry in ./changelog/fragments using the changelog tool
  • [ ] I have added an integration test or an E2E test

How to test this PR locally

Run EXTERNAL=true SNAPSHOT=true DEV=true KEEP_ARCHIVE=true mage package twice. The second time should be faster.

Related issues

@swiatekm swiatekm added skip-changelog chore Tasks that just need to be done, they are neither bug, nor enhancements backport-active-all Automated backport with mergify to all the active branches labels Jul 25, 2025
@swiatekm swiatekm marked this pull request as ready for review July 26, 2025 00:00
@swiatekm swiatekm requested a review from a team as a code owner July 26, 2025 00:00
@pierrehilbert pierrehilbert added the Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team label Jul 27, 2025
@elasticmachine
Copy link
Collaborator

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@swiatekm swiatekm force-pushed the chore/cache-external-binaries branch from be4df5e to fecc75c Compare July 28, 2025 12:56
Copy link
Contributor

@pkoutsovasilis pkoutsovasilis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did look at the code and the endgoal makes total sense to me and it is welcome improvement. There is only this part that we rely on the server telling us if something has changed or not (by using "If-Modified-Since" http header) that I am not sure how I feel about it. By leveraging the latter we kinda "expose" our packaging to any bugs of the artifacts server (has caused us issues in the past). Maybe a more controlled by us solution would be to download first the sha512 file compare it with what we have locally and if different download. wdyt?

@swiatekm
Copy link
Contributor Author

I did look at the code and the endgoal makes total sense to me and it is welcome improvement. There is only this part that we rely on the server telling us if something has changed or not (by using "If-Modified-Since" http header) that I am not sure how I feel about it. By leveraging the latter we kinda "expose" our packaging to any bugs of the artifacts server (has caused us issues in the past). Maybe a more controlled by us solution would be to download first the sha512 file compare it with what we have locally and if different download. wdyt?

For what it's worth, I think our artifacts API just passes that header to the underlying object storage, which should have a correct implementation. I originally thought we always check the downloaded file against the hash, but it seems like we might not? If we did, that would solve the problem, because even if the artifacts api was wrong, we'd get a hash mismatch and at least see the problem.

For the record, I think actual packaging steps in the CI should always redownload the artifacts, ideally by calling mage clean.

@pkoutsovasilis
Copy link
Contributor

For what it's worth, I think our artifacts API just passes that header to the underlying object storage, which should have a correct implementation. I originally thought we always check the downloaded file against the hash, but it seems like we might not?

and for this exact reason of having to speculate when we rely on an external server, I would prefer utilising the sha512 files. (PS: if what you say is true and the underlying storage relies on the modified time of the file, is not the most reliable thing out there). Also, as you do mention above, you would solve another problem we currently have of not validating the components we download by hash, win win scenario 🙂

@swiatekm
Copy link
Contributor Author

swiatekm commented Jul 31, 2025

For what it's worth, I think our artifacts API just passes that header to the underlying object storage, which should have a correct implementation. I originally thought we always check the downloaded file against the hash, but it seems like we might not?

and for this exact reason of having to speculate when we rely on an external server, I would prefer utilising the sha512 files. (PS: if what you say is true and the underlying storage relies on the modified time of the file, is not the most reliable thing out there). Also, as you do mention above, you would solve another problem we currently have of not validating the components we download by hash, win win scenario 🙂

I would consider If-Changed-Since headers on say, S3 or GCS, to be reliable. Surely there's tons of code out there that relies on this working correctly. But the integrity check is useful regardless, because the mod time on the client isn't necessarily reliable, as many processes can easily mess with it, even by accident.

In the meantime, how about I submit a PR with just the download refactor? It has benefits on its own, like not passing the content of downloaded files as strings. 😅

Copy link

@michalpristas
Copy link
Contributor

i suspect this is primarily for local usage.
also i would prefer to have other option as well meaning cleanup before build in case i had last build with KEEP and i changed my mind

@swiatekm
Copy link
Contributor Author

i suspect this is primarily for local usage. also i would prefer to have other option as well meaning cleanup before build in case i had last build with KEEP and i changed my mind

Isn't mage clean good enough for that? I'd prefer not to add even more environment variables. We already have a ton.

@swiatekm
Copy link
Contributor Author

swiatekm commented Sep 9, 2025

I've rebased this change on #9218, making it much simpler. With #9226, we always verify checksums, so we'll immediately know if the artifacts API served us a stale download, or if the file is corrupted for whatever reason. So this change should be quite safe now. It's only really intended for local use, although we can later look into keeping the downloaded binaries on CI runners, if we want.

@pkoutsovasilis @michalpristas can you review again? 🙏

@swiatekm swiatekm force-pushed the chore/cache-external-binaries branch from 4680a90 to 9a9e956 Compare September 10, 2025 11:19
@pkoutsovasilis
Copy link
Contributor

pkoutsovasilis commented Sep 10, 2025

@swiatekm please merge the latest main in here; the failure you see is because I merged this one #9832 and I managed to hit your CI run just right in the middle; so the clean up step run with the latest code in main but the pipeline was loaded before the merge so any changes are lost.

PS: we are working on a solution to get away from such failures

@swiatekm swiatekm force-pushed the chore/cache-external-binaries branch from 9a9e956 to cb04629 Compare September 10, 2025 13:00
Copy link

Quality Gate passed Quality Gate passed

Issues
0 New issues
0 Fixed issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarQube

@elasticmachine
Copy link
Collaborator

💛 Build succeeded, but was flaky

Failed CI Steps

History

cc @swiatekm

@swiatekm swiatekm merged commit 6aed8b1 into main Sep 12, 2025
23 checks passed
@swiatekm swiatekm deleted the chore/cache-external-binaries branch September 12, 2025 10:59
Copy link
Contributor

@Mergifyio backport 8.18 8.19 9.0 9.1

Copy link
Contributor

mergify bot commented Sep 12, 2025

backport 8.18 8.19 9.0 9.1

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Sep 12, 2025
* Add an env variable to keep the download archive

* Only download artifacts which aren't present

* Document the KEEP_ARCHIVE env variable

* Update README.md

Co-authored-by: Panos Koutsovasilis <[email protected]>

* fixup! Only download artifacts which aren't present

---------

Co-authored-by: Panos Koutsovasilis <[email protected]>
(cherry picked from commit 6aed8b1)
mergify bot pushed a commit that referenced this pull request Sep 12, 2025
* Add an env variable to keep the download archive

* Only download artifacts which aren't present

* Document the KEEP_ARCHIVE env variable

* Update README.md

Co-authored-by: Panos Koutsovasilis <[email protected]>

* fixup! Only download artifacts which aren't present

---------

Co-authored-by: Panos Koutsovasilis <[email protected]>
(cherry picked from commit 6aed8b1)
mergify bot pushed a commit that referenced this pull request Sep 12, 2025
* Add an env variable to keep the download archive

* Only download artifacts which aren't present

* Document the KEEP_ARCHIVE env variable

* Update README.md

Co-authored-by: Panos Koutsovasilis <[email protected]>

* fixup! Only download artifacts which aren't present

---------

Co-authored-by: Panos Koutsovasilis <[email protected]>
(cherry picked from commit 6aed8b1)
mergify bot pushed a commit that referenced this pull request Sep 12, 2025
* Add an env variable to keep the download archive

* Only download artifacts which aren't present

* Document the KEEP_ARCHIVE env variable

* Update README.md

Co-authored-by: Panos Koutsovasilis <[email protected]>

* fixup! Only download artifacts which aren't present

---------

Co-authored-by: Panos Koutsovasilis <[email protected]>
(cherry picked from commit 6aed8b1)
swiatekm added a commit that referenced this pull request Sep 12, 2025
* Add an env variable to keep the download archive

* Only download artifacts which aren't present

* Document the KEEP_ARCHIVE env variable

* Update README.md



* fixup! Only download artifacts which aren't present

---------


(cherry picked from commit 6aed8b1)

Co-authored-by: Mikołaj Świątek <[email protected]>
Co-authored-by: Panos Koutsovasilis <[email protected]>
swiatekm added a commit that referenced this pull request Sep 12, 2025
* Add an env variable to keep the download archive

* Only download artifacts which aren't present

* Document the KEEP_ARCHIVE env variable

* Update README.md



* fixup! Only download artifacts which aren't present

---------


(cherry picked from commit 6aed8b1)

Co-authored-by: Mikołaj Świątek <[email protected]>
Co-authored-by: Panos Koutsovasilis <[email protected]>
swiatekm added a commit that referenced this pull request Sep 12, 2025
* Add an env variable to keep the download archive

* Only download artifacts which aren't present

* Document the KEEP_ARCHIVE env variable

* Update README.md



* fixup! Only download artifacts which aren't present

---------


(cherry picked from commit 6aed8b1)

Co-authored-by: Mikołaj Świątek <[email protected]>
Co-authored-by: Panos Koutsovasilis <[email protected]>
swiatekm added a commit that referenced this pull request Sep 12, 2025
* Add an env variable to keep the download archive

* Only download artifacts which aren't present

* Document the KEEP_ARCHIVE env variable

* Update README.md



* fixup! Only download artifacts which aren't present

---------


(cherry picked from commit 6aed8b1)

Co-authored-by: Mikołaj Świątek <[email protected]>
Co-authored-by: Panos Koutsovasilis <[email protected]>
v1v added a commit that referenced this pull request Sep 16, 2025
* upstream: (26 commits)
  fix: ensure EDOT subprocess shuts down gracefully on agent termination (#9886)
  [main][Automation] Update versions (#9976)
  Add Collector reference docs and automation (#9953)
  [beatreceivers] Integrate beatsauthextension (#9257)
  [main][Automation] Update versions (#9941)
  Update OTel components to v0.132.0/v1.38.0 (#9954)
  Enhancement/5235 wrap errors when marking upgrade (#9366)
  Mount Go build cache into crossbuild container (#9094)
  Liveness agent state (#9673)
  [main][Automation] Bump VM Image version to 1757725254 (#9942)
  Enhancement/5235 correctly wrap errors from copyActionDir and copyRunDirectory (#9349)
  [main][Automation] Update elastic/beats to afc53c0479ac (#9874)
  Add -coverpkg option when running unit test to calculate coverage across packages (#9913)
  Cache binaries downloaded for packaging locally (#9133)
  [main][Automation] Update versions (#9897)
  Disable flaky test TestBeatsReceiverLogs (#9891)
  Allow overriding AGENT_PACKAGE_VERSION and MANIFEST_URL when USE_PACKAGE_VERSION=true (#9864)
  add ingest-docs team as CODEOWNERS for release notes and docset.yml (#9865)
  fix: correct spelling of 'output' in various templates and monitoring code (#9827)
  k8s: Add comment around hostUsers for Universal Profiling deployments (#9847)
  ...
intxgo pushed a commit to intxgo/elastic-agent that referenced this pull request Sep 24, 2025
* Add an env variable to keep the download archive

* Only download artifacts which aren't present

* Document the KEEP_ARCHIVE env variable

* Update README.md

Co-authored-by: Panos Koutsovasilis <[email protected]>

* fixup! Only download artifacts which aren't present

---------

Co-authored-by: Panos Koutsovasilis <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-active-all Automated backport with mergify to all the active branches chore Tasks that just need to be done, they are neither bug, nor enhancements skip-changelog Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants