
Conversation

@dinmukhamedm (Collaborator) commented on Jul 3, 2025

#3066

  • I have added tests that cover my changes.
  • If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
  • PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
  • (If applicable) I have updated the documentation accordingly.

Important

This PR adds exception recording to span events in OpenAI instrumentation wrappers and updates tests to verify this behavior.

  • Behavior:
    • Add span.record_exception(e) to the exception handling in chat_wrapper(), completion_wrapper(), embeddings_wrapper(), and runs_create_wrapper() to log exceptions as span events (a minimal sketch follows this list).
    • Update EventHandleWrapper.on_exception() to record exceptions in spans.
  • Tests:
    • Add tests for exception handling in test_chat.py, test_chat_parse.py, test_completions.py, and test_embeddings.py to verify exceptions are recorded as span events (a test sketch also follows this list).
    • Tests cover both synchronous and asynchronous scenarios.
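
The wrapper change follows the standard OpenTelemetry exception-recording pattern. A minimal sketch of the idea, with simplified names (the real wrappers also capture request and response attributes before ending the span):

```python
from opentelemetry import trace
from opentelemetry.trace.status import Status, StatusCode

tracer = trace.get_tracer(__name__)

def chat_wrapper(wrapped, args, kwargs):
    # Simplified sketch of an instrumentation wrapper.
    span = tracer.start_span("openai.chat")
    try:
        return wrapped(*args, **kwargs)
    except Exception as e:
        # The change in this PR: record the exception as a span event,
        # in addition to marking the span as errored.
        span.record_exception(e)
        span.set_status(Status(StatusCode.ERROR, str(e)))
        raise
    finally:
        span.end()
```

A test can then assert on the recorded event. A hypothetical test shape, assuming a pytest fixture that exposes an in-memory span exporter (the fixture name and error type are illustrative; the real suite replays recorded responses via VCR):

```python
import openai
import pytest

def test_chat_exception_recorded(span_exporter):
    client = openai.OpenAI(api_key="invalid")
    with pytest.raises(openai.AuthenticationError):
        client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "hi"}],
        )

    span = span_exporter.get_finished_spans()[0]
    event = span.events[0]
    assert event.name == "exception"
    # exception.type may or may not be module-qualified depending on the
    # SDK version, so match only the suffix.
    assert event.attributes["exception.type"].endswith("AuthenticationError")
```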

This description was created by Ellipsis for 6e396bc.

@ellipsis-dev (bot, Contributor) left a comment

Important

Looks good to me! 👍

Reviewed everything up to 6e396bc in 2 minutes and 20 seconds.
  • Reviewed 524 lines of code in 10 files
  • Skipped 0 files when reviewing.
  • Skipped posting 6 draft comments. View those below.
1. packages/opentelemetry-instrumentation-openai/tests/traces/test_embeddings.py:39
  • Draft comment:
    Hard-coded token usage (8) may be brittle if encoding changes. Consider dynamically computing or loosening this assertion (see the sketch after this list).
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. packages/opentelemetry-instrumentation-openai/tests/traces/test_embeddings.py:87
  • Draft comment:
    Using hard-coded expected log event content (for 'gen_ai.choice') might become fragile if response formatting changes. Consider verifying key fields or using regex to allow some flexibility.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
3. packages/opentelemetry-instrumentation-openai/tests/traces/test_embeddings.py:277
  • Draft comment:
    Repeated assertions on hard-coded token values (e.g., prompt tokens = 8) and fixed API base URLs can be brittle. Consider centralizing expected constants or adding comments to clarify these expectations.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
4. packages/opentelemetry-instrumentation-openai/tests/traces/test_embeddings.py:400
  • Draft comment:
    Assertions comparing fixed response IDs (e.g. 'cmpl-8wq43c8U5ZZCQBX5lrSpsANwcd3OF') may be brittle with VCR responses. Consider matching against a pattern or documenting why these values are stable (see the sketch after this list).
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
5. packages/opentelemetry-instrumentation-openai/tests/traces/test_chat_parse.py:540
  • Draft comment:
    Typo alert: The model parameter is set to "gpt-4o". Please verify if this is intentional or if it should be "gpt-4".
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50%. The model name "gpt-4o" appears to be intentionally used for testing purposes. It's used consistently across multiple test cases, including tests that expect authentication errors. The tests are passing with this model name. The comment assumes this is a typo but there's no evidence to support that - in fact, the evidence suggests it's intentional. Could this be a real typo that was accidentally copied across all test cases? Could using an invalid model name affect the test coverage? The tests specifically check for authentication errors, not model validation errors. Using an invalid model name is actually good for testing as it ensures the error handling works correctly without making real API calls. The comment should be deleted. The model name appears to be intentionally set for testing purposes, and changing it could actually make the tests less effective.
6. packages/opentelemetry-instrumentation-openai/tests/traces/test_chat_parse.py:574
  • Draft comment:
    Typo alert: The model parameter is set to "gpt-4o". Please verify if this is intentional or if it should be "gpt-4".
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%. The consistent use of "gpt-4o" across all test cases strongly suggests this is intentional. These are tests for error handling and API behavior, and using an invalid model name could be part of the test design. In fact, looking at the test cases where this appears, they're testing error handling scenarios with invalid API keys, which makes an invalid model name even more likely to be intentional. Could this be a genuine typo that was copy-pasted throughout the test file? The model name "gpt-4o" does look unusual. While "gpt-4o" is unusual, the fact that these are tests specifically designed to handle errors and invalid inputs, combined with the consistent usage across all test cases, strongly suggests this is intentional rather than a copy-pasted typo. Delete the comment. The unusual model name appears to be intentionally used for testing error scenarios.
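
Two of the draft suggestions have a natural concrete shape. A hedged sketch of what draft comments 1 and 4 are pointing at (the tokenizer choice and the helper names are assumptions, not taken from the diff):

```python
import re

import tiktoken

COMPLETION_ID = re.compile(r"cmpl-[A-Za-z0-9]+")

def expected_prompt_tokens(text: str, model: str = "text-embedding-ada-002") -> int:
    # Draft comment 1: derive the expected token count from the encoder
    # instead of hard-coding 8, so the assertion survives encoding changes.
    return len(tiktoken.encoding_for_model(model).encode(text))

def assert_response_id_shape(response_id: str) -> None:
    # Draft comment 4: match the ID's shape rather than comparing against
    # a fixed VCR-recorded value.
    assert COMPLETION_ID.fullmatch(response_id), response_id
```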


@dinmukhamedm requested a review from nirga on July 4, 2025 at 13:33
@nirga (Member) left a comment

Thanks @dinmukhamedm! :)

@nirga merged commit a54e67f into traceloop:main on Jul 5, 2025
9 checks passed
@nirga deleted the openai-span-exception branch on July 5, 2025 at 14:06
amitalokbera pushed a commit to amitalokbera/openllmetry that referenced this pull request on Jul 15, 2025