Skip to content

Conversation

rogercoll
Copy link
Contributor

The server should just return the flag if the agent is unidentified (service.name not known).

Fixes #708

The server should just return the flag if the agent is unidentified
(service.name not known).
@rogercoll rogercoll requested a review from a team as a code owner August 14, 2025 08:27
@rogercoll rogercoll requested review from david-luna, stevejgordon and a team August 14, 2025 08:27
@rogercoll
Copy link
Contributor Author

cc @trentm @LikeTheSalad (please, let me know if there's an APM wide Github group I can mention)

Copy link

@LikeTheSalad LikeTheSalad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cheers!

@LikeTheSalad
Copy link

cc @trentm @LikeTheSalad (please, let me know if there's an APM wide Github group I can mention)

I think @elastic/apm-agent-approvers should do for this PR.

rc.logger.Error(fmt.Sprintf("error retrieving remote configuration: %s", err), agentUidField)
// remote config client could not identify the agent
if errors.Is(err, apmconfig.UnidentifiedAgent) {
serverToAgent.Flags = uint64(protobufs.ServerToAgentFlags_ServerToAgentFlags_ReportFullState)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding of the OpAMP spec is that it is not appropriate to be using the ReportFullState flag for this case -- where the AgentToServer.agent_description.identifying_attributes happens to not have some key that this particular OpAMP server wants to see (in this case service.name). The OpAMP spec suggests that agents "SHOULD" include service.name, but there aren't any required identifying_attributes.

The ReportFullState flag is entirely meant to allow Agent Status Compression, so that the typical AgentToServer message can be small (skipping details that the server likely already has for the agent).

For example, if the server receives an AgentToServer message and its local info for that agent (using the instanceUid) does not include an AgentDescription at all, then the server should respond including the ReportFullState flag.

(This OpAMP server impl should also be using the sequenceNum to decide whether to potentially set the ReportFullState flag as well. I found the example server in opamp-go.git illustrative here. See https://github.com/open-telemetry/opamp-go/blob/4e0574d43367de4a6ec7ac5e9df770d57faa8ca2/internal/examples/server/opampsrv/opampsrv.go#L109-L110 and https://github.com/open-telemetry/opamp-go/blob/4e0574d43367de4a6ec7ac5e9df770d57faa8ca2/internal/examples/server/data/agent.go#L233-L287)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see that the apmconfig.UnidentifiedAgent error is currently doing double-duty:

  1. It is handling the case where loadedAgent has no value at all for identifyingAttributes, which would happen when the OpAMP server is restarted and agents are sending periodic AgentToServer messages without the agent_description property. In this case the server should respond with ReportFullState.
  2. It is also handling the case where the loadedAgent does have identifyingAttributes, but those attributes do not include service.name. In this case the server should not respond with ReportFullState.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The opamp_callbacks.go file is a generic implementation of an OpAMP server for remote configurations, but it is up to the RemoteConfig client implementation to decide wheter to return an apmconfig.UnidentifiedAgent error or not.

In the case of the (only) elasticsearch fetcher implementation, it assumes that the Elastic SDKs will always include the service.name. Thus, the Elasticserach fetcher implementation of the RemoteConfig client does not comply to the following OpAMP spec suggestion:

The OpAMP spec suggests that agents "SHOULD" include service.name, but there aren't any required identifying_attributes.

If Elastic SDKs are able to omit the service.name resource attribute in the identifyingAttributes field, perhaps it would be best to open a separate issue where we can create and discuss the fix (and the correct usage of ReportFullState).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: #720

@rogercoll rogercoll requested a review from trentm August 25, 2025 09:17
@rogercoll rogercoll merged commit 892a5df into elastic:main Aug 26, 2025
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[extension/apmconfig] Remove error message when sending ServerToAgent flags
3 participants