fix(openai): record exception as span events as well #3067
Conversation
Important
Looks good to me! 👍
Reviewed everything up to 6e396bc in 2 minutes and 20 seconds.
- Reviewed 524 lines of code in 10 files
- Skipped 0 files when reviewing.
- Skipped posting 6 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/tests/traces/test_embeddings.py:39
   - Draft comment: Hard-coded token usage (8) may be brittle if encoding changes. Consider dynamically computing or loosening this assertion.
   - Reason this comment was not posted: Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. packages/opentelemetry-instrumentation-openai/tests/traces/test_embeddings.py:87
   - Draft comment: Using hard-coded expected log event content (for 'gen_ai.choice') might become fragile if response formatting changes. Consider verifying key fields or using regex to allow some flexibility (a sketch of this looser style of assertion follows this list).
   - Reason this comment was not posted: Comment was not on a location in the diff, so it can't be submitted as a review comment.
3. packages/opentelemetry-instrumentation-openai/tests/traces/test_embeddings.py:277
   - Draft comment: Repeated assertions on hard-coded token values (e.g., prompt tokens = 8) and fixed API base URLs can be brittle. Consider centralizing expected constants or adding comments to clarify these expectations.
   - Reason this comment was not posted: Comment was not on a location in the diff, so it can't be submitted as a review comment.
4. packages/opentelemetry-instrumentation-openai/tests/traces/test_embeddings.py:400
   - Draft comment: Assertions comparing fixed response IDs (e.g. 'cmpl-8wq43c8U5ZZCQBX5lrSpsANwcd3OF') may be brittle with VCR responses. Consider matching against a pattern or documenting why these values are stable.
   - Reason this comment was not posted: Comment was not on a location in the diff, so it can't be submitted as a review comment.
5. packages/opentelemetry-instrumentation-openai/tests/traces/test_chat_parse.py:540
   - Draft comment: Typo alert: the model parameter is set to "gpt-4o". Please verify whether this is intentional or whether it should be "gpt-4".
   - Reason this comment was not posted: Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 0% vs. threshold = 50%). The model name "gpt-4o" appears to be used intentionally for testing purposes: it is used consistently across multiple test cases, including tests that expect authentication errors, and the tests pass with it. The comment assumes this is a typo, but there is no evidence for that; the evidence suggests it is intentional. Could this be a real typo that was accidentally copied across all test cases? Could using an invalid model name affect the test coverage? The tests specifically check for authentication errors, not model validation errors, and using an invalid model name is actually good for testing because it ensures the error handling works correctly without making real API calls. The comment should be deleted: the model name appears to be intentionally set for testing purposes, and changing it could make the tests less effective.
6. packages/opentelemetry-instrumentation-openai/tests/traces/test_chat_parse.py:574
   - Draft comment: Typo alert: the model parameter is set to "gpt-4o". Please verify whether this is intentional or whether it should be "gpt-4".
   - Reason this comment was not posted: Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 10% vs. threshold = 50%). The consistent use of "gpt-4o" across all test cases strongly suggests it is intentional. These are tests for error handling and API behavior, and using an invalid model name could be part of the test design; in fact, the test cases where this appears exercise error handling with invalid API keys, which makes an invalid model name even more likely to be intentional. Could this be a genuine typo that was copy-pasted throughout the test file? The model name "gpt-4o" does look unusual, but the fact that these tests are specifically designed to handle errors and invalid inputs, combined with the consistent usage across all test cases, strongly suggests it is intentional rather than a copy-pasted typo. Delete the comment: the unusual model name appears to be used intentionally for testing error scenarios.
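For the brittleness concerns in draft comments 1–4, the looser style of assertion the bot suggests could look roughly like the sketch below. This is illustrative only: the helper name `assert_choice_event`, the dict-shaped event body, and the field names (`index`, `finish_reason`, `id`) are assumptions for the example, not taken from test_embeddings.py.

```python
import re


def assert_choice_event(body: dict) -> None:
    # Illustrative sketch only: the field names below are assumptions,
    # not copied from test_embeddings.py.
    # Check only the fields the test actually cares about...
    assert body["index"] == 0
    assert body["finish_reason"] == "stop"
    # ...and match identifiers against a pattern instead of a hard-coded
    # value such as "cmpl-8wq43c8U5ZZCQBX5lrSpsANwcd3OF".
    assert re.fullmatch(r"cmpl-[A-Za-z0-9]+", body["id"])
```

The idea is simply to pin down the fields that carry meaning for the test while tolerating drift in recorded VCR responses and formatting.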
Workflow ID: wflow_omj7pCLGmyK7tvtb
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Thanks @dinmukhamedm! :)
#3066
feat(instrumentation): ...
or fix(instrumentation): ...

Important
This PR adds exception recording to span events in OpenAI instrumentation wrappers and updates tests to verify this behavior.
- Adds span.record_exception(e) to exception handling in chat_wrapper(), completion_wrapper(), embeddings_wrapper(), and runs_create_wrapper() to log exceptions as span events (see the sketch below).
- Updates EventHandleWrapper.on_exception() to record exceptions in spans.
- Updates test_chat.py, test_chat_parse.py, test_completions.py, and test_embeddings.py to verify exceptions are recorded as span events.

This description was created by Ellipsis for 6e396bc. You can customize this summary. It will automatically update as commits are pushed.
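The wrapper-side change is small; a minimal sketch of the pattern is below. This is a simplified illustration rather than the exact code from this PR: the wrapt-style signature, the module-level tracer, and the span name "openai.chat" are assumptions, and attribute/response handling is omitted.

```python
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

tracer = trace.get_tracer(__name__)


def chat_wrapper(wrapped, instance, args, kwargs):
    # Simplified sketch of the pattern this PR adds: on failure, the exception
    # is recorded as a span event in addition to marking the span as errored.
    span = tracer.start_span("openai.chat")
    try:
        response = wrapped(*args, **kwargs)
    except Exception as e:
        span.set_status(Status(StatusCode.ERROR, str(e)))
        span.record_exception(e)  # emits an "exception" event on the span
        span.end()
        raise
    # ...response/attribute handling omitted...
    span.end()
    return response
```

On the test side, the updated tests can check this through the exported spans, e.g. asserting that a span captured by an InMemorySpanExporter carries an event named "exception" whose attributes include exception.type and exception.message, the attribute names record_exception uses per the OpenTelemetry semantic conventions.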