
Conversation

@dinmukhamedm (Collaborator) commented on Jul 3, 2025

#3066

  • I have added tests that cover my changes.
  • If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
  • PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
  • (If applicable) I have updated the documentation accordingly.

Important

This PR adds exception recording to span events in OpenAI instrumentation wrappers and updates tests to verify this behavior.

  • Behavior:
    • Add span.record_exception(e) to the exception handling in chat_wrapper(), completion_wrapper(), embeddings_wrapper(), and runs_create_wrapper() to log exceptions as span events (a minimal sketch follows this list).
    • Update EventHandleWrapper.on_exception() to record exceptions in spans.
  • Tests:
    • Add tests for exception handling in test_chat.py, test_chat_parse.py, test_completions.py, and test_embeddings.py to verify exceptions are recorded as span events (a test sketch also follows this list).
    • Tests cover both synchronous and asynchronous scenarios.
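
The wrapper change follows the standard OpenTelemetry exception-recording pattern. A minimal sketch of the idea, with simplified names (the real wrappers also capture request and response attributes before ending the span):

```python
from opentelemetry import trace
from opentelemetry.trace.status import Status, StatusCode

tracer = trace.get_tracer(__name__)

def chat_wrapper(wrapped, args, kwargs):
    # Simplified sketch of an instrumentation wrapper.
    span = tracer.start_span("openai.chat")
    try:
        return wrapped(*args, **kwargs)
    except Exception as e:
        # The change in this PR: record the exception as a span event,
        # in addition to marking the span as errored.
        span.record_exception(e)
        span.set_status(Status(StatusCode.ERROR, str(e)))
        raise
    finally:
        span.end()
```

A test can then assert on the recorded event. A hypothetical test shape, assuming a pytest fixture that exposes an in-memory span exporter (the fixture name and error type are illustrative; the real suite replays recorded responses via VCR):

```python
import openai
import pytest

def test_chat_exception_recorded(span_exporter):
    client = openai.OpenAI(api_key="invalid")
    with pytest.raises(openai.AuthenticationError):
        client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "hi"}],
        )

    span = span_exporter.get_finished_spans()[0]
    event = span.events[0]
    assert event.name == "exception"
    # exception.type may or may not be module-qualified depending on the
    # SDK version, so match only the suffix.
    assert event.attributes["exception.type"].endswith("AuthenticationError")
```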

This description was created by Ellipsis for 6e396bc.

@ellipsis-dev (bot, Contributor) left a comment

Important

Looks good to me! 👍

Reviewed everything up to 6e396bc in 2 minutes and 20 seconds.
  • Reviewed 524 lines of code in 10 files
  • Skipped 0 files when reviewing.
  • Skipped posting 6 draft comments. View those below.
1. packages/opentelemetry-instrumentation-openai/tests/traces/test_embeddings.py:39
  • Draft comment:
    Hard-coded token usage (8) may be brittle if encoding changes. Consider dynamically computing or loosening this assertion (see the sketch after this list).
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. packages/opentelemetry-instrumentation-openai/tests/traces/test_embeddings.py:87
  • Draft comment:
    Using hard-coded expected log event content (for 'gen_ai.choice') might become fragile if response formatting changes. Consider verifying key fields or using regex to allow some flexibility.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
3. packages/opentelemetry-instrumentation-openai/tests/traces/test_embeddings.py:277
  • Draft comment:
    Repeated assertions on hard-coded token values (e.g., prompt tokens = 8) and fixed API base URLs can be brittle. Consider centralizing expected constants or adding comments to clarify these expectations.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
4. packages/opentelemetry-instrumentation-openai/tests/traces/test_embeddings.py:400
  • Draft comment:
    Assertions comparing fixed response IDs (e.g. 'cmpl-8wq43c8U5ZZCQBX5lrSpsANwcd3OF') may be brittle with VCR responses. Consider matching against a pattern or documenting why these values are stable (see the sketch after this list).
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
5. packages/opentelemetry-instrumentation-openai/tests/traces/test_chat_parse.py:540
  • Draft comment:
    Typo alert: The model parameter is set to "gpt-4o". Please verify if this is intentional or if it should be "gpt-4".
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50%. The model name "gpt-4o" appears to be intentionally used for testing purposes. It's used consistently across multiple test cases, including tests that expect authentication errors. The tests are passing with this model name. The comment assumes this is a typo but there's no evidence to support that - in fact, the evidence suggests it's intentional. Could this be a real typo that was accidentally copied across all test cases? Could using an invalid model name affect the test coverage? The tests specifically check for authentication errors, not model validation errors. Using an invalid model name is actually good for testing as it ensures the error handling works correctly without making real API calls. The comment should be deleted. The model name appears to be intentionally set for testing purposes, and changing it could actually make the tests less effective.
6. packages/opentelemetry-instrumentation-openai/tests/traces/test_chat_parse.py:574
  • Draft comment:
    Typo alert: The model parameter is set to "gpt-4o". Please verify if this is intentional or if it should be "gpt-4".
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%. The consistent use of "gpt-4o" across all test cases strongly suggests this is intentional. These are tests for error handling and API behavior, and using an invalid model name could be part of the test design. In fact, looking at the test cases where this appears, they're testing error handling scenarios with invalid API keys, which makes an invalid model name even more likely to be intentional. Could this be a genuine typo that was copy-pasted throughout the test file? The model name "gpt-4o" does look unusual. While "gpt-4o" is unusual, the fact that these are tests specifically designed to handle errors and invalid inputs, combined with the consistent usage across all test cases, strongly suggests this is intentional rather than a copy-pasted typo. Delete the comment. The unusual model name appears to be intentionally used for testing error scenarios.
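
Two of the draft suggestions have a natural concrete shape. A hedged sketch of what draft comments 1 and 4 are pointing at (the tokenizer choice and the helper names are assumptions, not taken from the diff):

```python
import re

import tiktoken

COMPLETION_ID = re.compile(r"cmpl-[A-Za-z0-9]+")

def expected_prompt_tokens(text: str, model: str = "text-embedding-ada-002") -> int:
    # Draft comment 1: derive the expected token count from the encoder
    # instead of hard-coding 8, so the assertion survives encoding changes.
    return len(tiktoken.encoding_for_model(model).encode(text))

def assert_response_id_shape(response_id: str) -> None:
    # Draft comment 4: match the ID's shape rather than comparing against
    # a fixed VCR-recorded value.
    assert COMPLETION_ID.fullmatch(response_id), response_id
```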


@dinmukhamedm requested a review from nirga on July 4, 2025 at 13:33
@nirga (Member) left a comment

Thanks @dinmukhamedm! :)

@nirga merged commit a54e67f into traceloop:main on Jul 5, 2025
9 checks passed
@nirga deleted the openai-span-exception branch on July 5, 2025 at 14:06
amitalokbera pushed a commit to amitalokbera/openllmetry that referenced this pull request on Jul 15, 2025