majanjua-amzn
Contributor

Problems

Propagators

Customers in different environments receive different default propagator orderings, e.g. tracecontext,baggage,xray vs baggage,xray,tracecontext, which results in differing behaviour between environments. The ordering matters because the AwsXrayPropagator currently has a bug: trace state is not propagated between services if the propagator comes after the regular tracecontext propagator in the list of propagators. This bug exists in Java as well as in other languages.

Inaccurate anomaly statistics

Anomalies are currently always counted as long as either the rule propagated from an upstream service or the otherwise matched sampling rule has boost enabled. However, this assumes that the matched rule is relevant to the current call chain. Instead, we want to determine whether the call came from a service that is already instrumented by checking whether the provided parentContext is valid.

Changes

Propagators

  • Introduced a patched version of the aws-xray-propagator from contrib that also injects the trace state value of the sampling rule (hashed) into the baggage, so that downstream services can still propagate the information if one of those two channels fails to propagate
  • Reintroduced patching logic for opentelemetry-java-instrumentation (previously removed in Instrumentation Patch Removal and SPI AWS SDK Test Addition #1120). This is needed again because the patch changes the version of the aws-xray-propagator extension that the upstream instrumentation consumes, which is built into the upstream and cannot be changed through SPIs
    • Reintroduced any logic in workflows that ensured the patching was working correctly
  • Updated shouldSample to fall back to baggage if the trace state does not contain the sampling rule, allowing null if the baggage does not have the value either (see Inaccurate anomaly statistics section)
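
The fallback order described in the last bullet can be sketched in plain Java. This is only an illustration under stated assumptions, not the code from this PR: trace state and baggage are modelled as simple maps, and the class name SamplingRuleLookup, the method resolveUpstreamRule, and the key "sampling-rule-hash" are all hypothetical.

```java
import java.util.Map;

// Hypothetical sketch: trace state and baggage are modelled as plain maps.
final class SamplingRuleLookup {
    // Hypothetical key; the real entry name is not given in the PR description.
    static final String RULE_KEY = "sampling-rule-hash";

    // Prefer the trace state entry, fall back to baggage, otherwise return null.
    static String resolveUpstreamRule(Map<String, String> traceState,
                                      Map<String, String> baggage) {
        String fromTraceState = traceState.get(RULE_KEY);
        if (fromTraceState != null) {
            return fromTraceState;
        }
        // May be null: neither channel carried the rule.
        return baggage.get(RULE_KEY);
    }

    public static void main(String[] args) {
        // Happy case: the trace state survived the hop.
        System.out.println(resolveUpstreamRule(Map.of(RULE_KEY, "a1b2"), Map.of()));
        // Trace state was dropped in between, baggage still carries the value.
        System.out.println(resolveUpstreamRule(Map.of(), Map.of(RULE_KEY, "a1b2")));
        // Neither channel has the value.
        System.out.println(resolveUpstreamRule(Map.of(), Map.of()));
    }
}
```

Returning null rather than a default lets the caller distinguish "no upstream rule was received" from "an upstream rule was received", which is what the nullable sampling rule input relies on.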

Inaccurate anomaly statistics

  • Added conditions in shouldSample that check whether the upstream context is valid. A valid upstream context means the current service is not the root service, in which case we do not want to propagate the matched sampling rule
  • Allowed the sampling rule input for AwsSamplingResult to be nullable to support the above change
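
One possible reading of the two bullets above, sketched in plain Java. Everything here is an assumption made for illustration: ParentContext is a simplified stand-in for the real parent context (the OpenTelemetry Java API exposes a similar check as SpanContext.isValid()), and ruleToPropagate is a hypothetical helper, not a method from this PR.

```java
// Hypothetical sketch of the root-service decision; not the PR's implementation.
final class RootServiceCheck {
    // Simplified stand-in for a parent span context.
    static final class ParentContext {
        final String traceId;
        final String spanId;

        ParentContext(String traceId, String spanId) {
            this.traceId = traceId;
            this.spanId = spanId;
        }

        // Simplified validity rule: both IDs are present and not all zeros.
        boolean isValid() {
            return isNonZero(traceId) && isNonZero(spanId);
        }
    }

    private static boolean isNonZero(String id) {
        return id != null && !id.isEmpty() && !id.chars().allMatch(c -> c == '0');
    }

    // Returns the rule to attach to the sampling result, or null if none applies.
    static String ruleToPropagate(ParentContext parent, String upstreamRule, String matchedRule) {
        if (upstreamRule != null) {
            return upstreamRule; // keep reporting against the rule chosen upstream
        }
        if (parent.isValid()) {
            // An instrumented caller exists, so this service is not the root and
            // the locally matched rule may be unrelated to the call chain.
            return null;
        }
        return matchedRule; // no valid parent: this service is the root
    }

    public static void main(String[] args) {
        ParentContext none = new ParentContext("00000000000000000000000000000000", "0000000000000000");
        ParentContext valid = new ParentContext("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7");
        System.out.println(ruleToPropagate(none, null, "local-rule"));  // local-rule
        System.out.println(ruleToPropagate(valid, null, "local-rule")); // null
        System.out.println(ruleToPropagate(valid, "up-rule", "local-rule")); // up-rule
    }
}
```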

General refactors and improvements

  • Extracted anomaly identification logic to AnomalyDetectionResult isAnomaly(ReadableSpan span, SpanData spanData), which returns an object, e.g. result, that provides result.shouldBoostSampling() and result.shouldCaptureAnomalySpan()
  • Extracted logic that updated the traceUsageCache to a separate function: void updateTraceUsageCache(String traceId, boolean isSpanCaptured, boolean isCountedAsAnomalyForBoost)
  • Allowed customers to more easily turn off all adaptive sampling in a given service by specifying a config with anomalyConditions: [] instead of needing anomalyConditions: [{}], which is more intuitive
    • Did this by changing (anomalyConditions != null && !anomalyConditions.isEmpty()) to (anomalyConditions != null)
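
The effect of the guard change in the last sub-bullet can be seen by evaluating both expressions on the three configurations mentioned. This sketch only compares the two boolean expressions quoted above; what the guard gates inside the sampler is not shown in this description, and the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch comparing the old and new guard expressions.
final class AnomalyConditionsGuard {
    // Old guard: an empty list was treated the same as an absent config.
    static boolean oldGuard(List<?> anomalyConditions) {
        return anomalyConditions != null && !anomalyConditions.isEmpty();
    }

    // New guard: an explicitly configured empty list now passes.
    static boolean newGuard(List<?> anomalyConditions) {
        return anomalyConditions != null;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> absent = null;                        // key not set
        List<Map<String, Object>> empty = List.of();                    // anomalyConditions: []
        List<Map<String, Object>> oneBlank = List.of(Map.<String, Object>of()); // anomalyConditions: [{}]

        System.out.println(oldGuard(absent) + " -> " + newGuard(absent));     // false -> false
        System.out.println(oldGuard(empty) + " -> " + newGuard(empty));       // false -> true
        System.out.println(oldGuard(oneBlank) + " -> " + newGuard(oneBlank)); // true -> true
    }
}
```

Only the empty-list case changes, which is why anomalyConditions: [] now behaves like an explicit configuration rather than a missing one.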

Testing

Deployed to the demo environment and tested the following scenarios. Each scenario is a chain of Java services called in series, each instrumented with the listed SDK version:

  • New SDK -> 2.11.3 with tracecontext,baggage,xray -> New SDK: Validated that the third service still provided boost statistics for the first service despite a broken xray propagator in between, relying on baggage instead of trace state
  • New SDK -> 2.11.3 with baggage,xray,tracecontext -> New SDK: Validated that the third service provided boost statistics for the first service in the happy case where trace state is propagated correctly
  • New SDK -> New SDK -> New SDK: Validated that both the second and third services sent appropriate statistics to the first service's sampling rule

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@majanjua-amzn majanjua-amzn self-assigned this Sep 12, 2025
@majanjua-amzn majanjua-amzn requested a review from a team as a code owner September 12, 2025 23:37
@majanjua-amzn majanjua-amzn added the enhancement, X-Ray, traces, java, and github_actions labels Sep 12, 2025
wangzlei previously approved these changes Sep 12, 2025
@codecov-commenter

⚠️ Please install the Codecov app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (release/v2.11.x@22a807b). Learn more about missing BASE report.
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@                Coverage Diff                 @@
##             release/v2.11.x    #1186   +/-   ##
==================================================
  Coverage                   ?   66.92%           
  Complexity                 ?      519           
==================================================
  Files                      ?       54           
  Lines                      ?     2676           
  Branches                   ?      372           
==================================================
  Hits                       ?     1791           
  Misses                     ?      749           
  Partials                   ?      136           

@majanjua-amzn majanjua-amzn merged commit 899f40e into release/v2.11.x Sep 15, 2025
4 checks passed
@majanjua-amzn majanjua-amzn deleted the adaptive-sampling-2.11 branch September 15, 2025 16:01