khushijain21
Contributor

@khushijain21 khushijain21 commented Aug 6, 2025

What does this PR do?

This PR adds beatsauthextension to the Elasticsearch config translation layer and also propagates the local logger to the translation logic.

Why is it important?

It replaces the es-exporter's RoundTripper with the Beats implementation of it. This is important for 1:1 behavioral compatibility between beats and beatreceivers.

Checklist

  • I have read and understood the pull request guidelines of this project.
  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

Disruptive User Impact

None

How to test this PR locally

Check E2E test

Related issues

Contributor

mergify bot commented Aug 6, 2025

This pull request does not have a backport label. Could you fix it @khushijain21? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-./d./d is the label that automatically backports to the 8./d branch. /d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

@khushijain21 khushijain21 marked this pull request as ready for review August 12, 2025 12:47
@khushijain21 khushijain21 requested a review from a team as a code owner August 12, 2025 12:47
@khushijain21 khushijain21 marked this pull request as draft August 12, 2025 12:47
@khushijain21 khushijain21 marked this pull request as ready for review August 14, 2025 13:02
@khushijain21 khushijain21 marked this pull request as draft August 14, 2025 13:09
@khushijain21 khushijain21 added backport-8.19 Automated backport to the 8.19 branch backport-9.1 Automated backport to the 9.1 branch labels Aug 18, 2025
@khushijain21 khushijain21 marked this pull request as ready for review August 18, 2025 04:37
@cmacknz
Member

cmacknz commented Sep 12, 2025

A couple of observations testing this manually:

  1. The extension is present in the status output unconditionally in the collector format. This should be fixed in the yet-to-be-created issue for tying auth extension status to output unit status. We can address this separately as a sub-issue of "[beats receivers] Surface which output configuration caused the collector to exit" (#9771).
❯ sudo elastic-development-agent status
┌─ fleet
│  └─ status: (STOPPED) Not enrolled into Fleet
└─ elastic-agent
   ├─ status: (HEALTHY) Running
   └─ extensions
      ├─ status: StatusOK
      └─ extension:beatsauth/_agent-component/monitoring
         └─ status: StatusOK
  2. If I use the configuration below and then look in diagnostics, the auth extension configuration is missing the ssl blocks, which surprised me.
outputs:
  default:
    type: elasticsearch
    hosts: [127.0.0.1:9200]
    api_key: "example-key"
    preset: balanced
    ssl.key: /tmp/key.pem
    ssl.certificate: /tmp/cert.pem

inputs:
  - type: system/metrics
    id: unique-system-metrics-input
    data_stream.namespace: default
    use_output: default
    streams:
      - metricsets:
        - cpu
        data_stream.dataset: system.cpu

agent.monitoring:
  enabled: true
  logs: true
  metrics: true
  _runtime_experimental: otel

In the otel-merged.yml generated from that configuration, I see the following, with the ssl parameters missing:

exporters:
    elasticsearch/_agent-component/monitoring:
        api_key: <REDACTED>
        auth:
            authenticator: <REDACTED>
        batcher:
            enabled: true
            max_size: 1600
            min_size: 0
        compression: gzip
        compression_params:
            level: 1
        endpoints:
            - http://127.0.0.1:9200
        idle_conn_timeout: 3s
        logs_dynamic_id:
            enabled: true
        mapping:
            mode: bodymap
        retry:
            enabled: true
            initial_interval: 1s
            max_interval: 1m0s
            max_retries: 3
        timeout: 1m30s
extensions:
    beatsauth/_agent-component/monitoring:
        idle_connection_timeout: 3s
        proxy_disable: false
        timeout: 1m30s

The full agent status from this config shows that the ssl parameters are respected by the standard beat system/metrics input but appear to be ignored by the monitoring exporter since it's trying to connect to ES instead of failing to read the non-existent .pem files I told it to look for:

sudo elastic-development-agent status
┌─ fleet
│  └─ status: (STOPPED) Not enrolled into Fleet
└─ elastic-agent
   ├─ status: (FAILED) OTel manager failed: failed to generate otel config: error translating config for output: monitoring, unit: http/metrics-monitoring, error: failed unpacking config. open /tmp/cert.pem: no such file or directory /tmp/cert.pem accessing config
   ├─ beat/metrics-monitoring
   │  ├─ status: (DEGRADED) DEGRADED
   │  └─ beat/metrics-monitoring
   │     └─ status: (DEGRADED) Elasticsearch request failed: dial tcp 127.0.0.1:9200: connect: connection refused
   ├─ filestream-monitoring
   │  ├─ status: (DEGRADED) DEGRADED
   │  └─ filestream-monitoring
   │     └─ status: (DEGRADED) Elasticsearch request failed: dial tcp 127.0.0.1:9200: connect: connection refused
   ├─ http/metrics-monitoring
   │  ├─ status: (DEGRADED) DEGRADED
   │  └─ http/metrics-monitoring
   │     └─ status: (DEGRADED) Elasticsearch request failed: dial tcp 127.0.0.1:9200: connect: connection refused
   ├─ system/metrics-default
   │  ├─ status: (HEALTHY) Healthy: communicating with pid '64594'
   │  ├─ system/metrics-default
   │  │  └─ status: (FAILED) could not start output: failed to reload output: open /tmp/cert.pem: no such file or directory /tmp/cert.pem accessing 'elasticsearch'
   │  └─ system/metrics-default-unique-system-metrics-input
   │     └─ status: (STARTING) Starting
   └─ extensions
      ├─ status: StatusOK
      └─ extension:beatsauth/_agent-component/monitoring
         └─ status: StatusOK

This one needs to be investigated and fixed before we can merge this.

  3. Mikolaj's point about extension conflicts is a good one and can also be tackled as a separate PR and issue. I think this is something we need more testing for in general with hybrid agent.
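
The otel-merged.yml excerpt above pairs the exporter `elasticsearch/_agent-component/monitoring` with the extension `beatsauth/_agent-component/monitoring`. A minimal sketch of how such per-output IDs could be derived and checked for collisions follows. The helper names (`exporterID`, `authExtensionID`, `pairOutputs`) are hypothetical, and the naming rule is inferred from that one excerpt rather than taken from the translation code.

```go
package main

import "fmt"

// componentPrefix matches the "_agent-component" segment seen in the IDs above.
const componentPrefix = "_agent-component"

// exporterID builds an exporter ID for an output name (inferred pattern).
func exporterID(outputName string) string {
	return fmt.Sprintf("elasticsearch/%s/%s", componentPrefix, outputName)
}

// authExtensionID builds the matching auth extension ID (inferred pattern).
func authExtensionID(outputName string) string {
	return fmt.Sprintf("beatsauth/%s/%s", componentPrefix, outputName)
}

// pairOutputs maps each exporter ID to its own extension ID and returns an
// error if two outputs would end up sharing one extension.
func pairOutputs(outputNames []string) (map[string]string, error) {
	pairs := make(map[string]string, len(outputNames))
	seen := make(map[string]bool, len(outputNames))
	for _, name := range outputNames {
		ext := authExtensionID(name)
		if seen[ext] {
			return nil, fmt.Errorf("duplicate auth extension %q", ext)
		}
		seen[ext] = true
		pairs[exporterID(name)] = ext
	}
	return pairs, nil
}

func main() {
	pairs, err := pairOutputs([]string{"monitoring", "default"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for exporter, extension := range pairs {
		fmt.Printf("%s -> %s\n", exporter, extension)
	}
}
```

Deriving the extension ID from the output name keeps one extension per output, so two outputs with different TLS or proxy settings would not share a transport.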

@cmacknz
Member

cmacknz commented Sep 12, 2025

For testing we already have elastic/beats#45491 for tracking that separately beyond what is already done in here.

Contributor

mergify bot commented Sep 14, 2025

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b beatsauth upstream/beatsauth
git merge upstream/main
git push upstream beatsauth

@khushijain21
Contributor Author

khushijain21 commented Sep 15, 2025

A couple of observations testing this manually:

For your second point above.

The full agent status from this config shows that the ssl parameters are respected by the standard beat system/metrics input but appear to be ignored by the monitoring exporter since it's trying to connect to ES instead of failing to read the non-existent .pem files I told it to look for:

I believe you started elastic-agent without the ssl block first and then introduced

    ssl.key: /tmp/key.pem
    ssl.certificate: /tmp/cert.pem

So what happens is that elastic-agent tries to create a new otel config from this and fails because it cannot read the non-existent pem file, and otel-manager continues with the previous elastic-agent.yml config without the ssl block. This is why it continues to show connection refused.

Although, it does report the error here. "Failed to translate the config"

└─ elastic-agent
   ├─ status: (FAILED) OTel manager failed: failed to generate otel config: error translating config for output: monitoring, unit: http/metrics-monitoring, error: failed unpacking config. open /tmp/cert.pem: no such file or directory /tmp/cert.pem accessing config

In the otel-merged.yml generated from that configuration, I see the following, with the ssl parameters missing:

This is also why the ssl config is missing from the diagnostics. The new config with the incorrect ssl file path was never built in the first place.

The standard beat system/metrics throws an explicit error that it failed to read the pem file because we haven't set _runtime_experimental: otel for it, so it starts as a standard beat.

@cmacknz
Member

cmacknz commented Sep 15, 2025

Although, it does report the error here. "Failed to translate the config"

Thanks, I totally missed that agent was reporting the failure at the top level. We can iterate on this separately as part of #9771. Your explanation for why the ssl blocks are missing from the otel-merged.yml makes sense. I would prefer we show what we are trying and failing to run there, but I don't think it needs to block this PR.

Contributor

mergify bot commented Sep 15, 2025

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b beatsauth upstream/beatsauth
git merge upstream/main
git push upstream beatsauth

@khushijain21
Contributor Author

@cmacknz can you approve this PR so we can merge it?

@cmacknz cmacknz enabled auto-merge (squash) September 15, 2025 16:55
@cmacknz cmacknz merged commit 779fafd into elastic:main Sep 15, 2025
23 checks passed
@elasticmachine
Collaborator

💚 Build Succeeded

History

cc @khushijain21

mergify bot pushed a commit that referenced this pull request Sep 15, 2025
* [beatreceivers] Integrate beatsauthextension

* add test cases

* final tests and this works

* update

* remove agent.port

* mage otel:readme

* address review comments

* check is ssl.TLS is non-nil

* address review comments

* remove port

* make notice for beatsauth

* update beatsauthextension

* beatsauth final test

* final fix

* add beatsauthextension test

* Address comments

* fix test

* fix test

* change comment

* update beatsauthextension

* Update internal/pkg/otel/translate/otelconfig.go

Co-authored-by: Craig MacKenzie <[email protected]>

* fix ci

* add a test that a unique extension is created per output

* fix ci

* add comment

* add better test

* fix test

---------

Co-authored-by: Craig MacKenzie <[email protected]>
(cherry picked from commit 779fafd)

# Conflicts:
#	internal/pkg/otel/manager/manager_test.go
#	internal/pkg/otel/translate/otelconfig.go
#	internal/pkg/otel/translate/otelconfig_test.go
mergify bot pushed a commit that referenced this pull request Sep 15, 2025
* [beatreceivers] Integrate beatsauthextension

(cherry picked from commit 779fafd)

# Conflicts:
#	internal/pkg/otel/README.md
#	internal/pkg/otel/manager/manager.go
#	internal/pkg/otel/manager/manager_test.go
#	internal/pkg/otel/translate/otelconfig.go
#	internal/pkg/otel/translate/otelconfig_test.go
v1v added a commit that referenced this pull request Sep 16, 2025
* upstream: (26 commits)
  fix: ensure EDOT subprocess shuts down gracefully on agent termination (#9886)
  [main][Automation] Update versions (#9976)
  Add Collector reference docs and automation (#9953)
  [beatreceivers] Integrate beatsauthextension (#9257)
  [main][Automation] Update versions (#9941)
  Update OTel components to v0.132.0/v1.38.0 (#9954)
  Enhancement/5235 wrap errors when marking upgrade (#9366)
  Mount Go build cache into crossbuild container (#9094)
  Liveness agent state (#9673)
  [main][Automation] Bump VM Image version to 1757725254 (#9942)
  Enhancement/5235 correctly wrap errors from copyActionDir and copyRunDirectory (#9349)
  [main][Automation] Update elastic/beats to afc53c0479ac (#9874)
  Add -coverpkg option when running unit test to calculate coverage across packages (#9913)
  Cache binaries downloaded for packaging locally (#9133)
  [main][Automation] Update versions (#9897)
  Disable flaky test TestBeatsReceiverLogs (#9891)
  Allow overriding AGENT_PACKAGE_VERSION and MANIFEST_URL when USE_PACKAGE_VERSION=true (#9864)
  add ingest-docs team as CODEOWNERS for release notes and docset.yml (#9865)
  fix: correct spelling of 'output' in various templates and monitoring code (#9827)
  k8s: Add comment around hostUsers for Universal Profiling deployments (#9847)
  ...
intxgo pushed a commit to intxgo/elastic-agent that referenced this pull request Sep 24, 2025
* [beatreceivers] Integrate beatsauthextension

Labels
backport-8.19 Automated backport to the 8.19 branch backport-9.1 Automated backport to the 9.1 branch skip-changelog Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team
7 participants