Skip to content

Conversation

NoahStapp
Copy link
Contributor

@NoahStapp NoahStapp commented Sep 16, 2025

This PR removes the full document from the error message and stores it in the document property on the error instead.

@NoahStapp NoahStapp requested a review from a team as a code owner September 16, 2025 18:39
bson/errors.py Outdated

@property
def document(self) -> Any:
"""The invalid document that caused the error."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a versionadded and changelog reference.

bson/errors.py Outdated
return self._document

@document.setter
def document(self, value: Any) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be public?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see this being a useful behavior for users, so making this public seems reasonable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was suggesting we make it private, not public.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reasoning for making it private? Security concerns if users misuse it? I find having the full document very useful for debugging and I imagine users do too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the getter should be public but when would we expect a user to set the document? We try to be conservative with public apis and only add ones that have a strong justification.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh whoops sorry I misread which method this was about. Totally agree, fixing.

goto cleanup;
}
PyObject *new_msg = PyUnicode_FromFormat("Invalid document %s | %s", dict_str_utf8, msg_utf8);
PyObject *new_msg = PyUnicode_FromFormat("Invalid document: %s", msg_utf8);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For an invalid field in a nested doc, does this yield errors like "Invalid document: Invalid document: Invalid document: Invalid document: Invalid document: ...."?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it works as expected:

encode({"t": {"f": Wrapper(1)}})

# Produces
Invalid document: cannot encode object: 1, of type: <class 'test.test_bson.TestBSON.test_doc_in_invalid_document_error_as_property.<locals>.Wrapper'>

elements.append(_element_to_bson(key, value, check_keys, opts))
except InvalidDocument as err:
raise InvalidDocument(f"Invalid document {doc} | {err}") from err
raise InvalidDocument(f"Invalid document: {err}", doc) from err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this InvalidDocument except clause include the encoding of _id too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be a pre-existing bug, no? Happy to fix if we identify why the code currently looks like this. Maybe we assume that _id will be valid if it reaches this far?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct it would be a pre-existing bug. Could you open a ticket for it?

>>>> encode({'_id': set()})
Traceback (most recent call last):
  File "<python-input-6>", line 1, in <module>
    encode({'_id': set()})
  File "/Users/shane/git/mongo-python-driver/bson/__init__.py", line 1053, in encode
    return _dict_to_bson(document, check_keys, codec_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/shane/git/mongo-python-driver/bson/__init__.py", line 1006, in _dict_to_bson
    elements.append(_name_value_to_bson(b"_id\x00", doc["_id"], check_keys, opts))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/shane/git/mongo-python-driver/bson/__init__.py", line 980, in _name_value_to_bson
    raise InvalidDocument(f"cannot encode object: {value!r}, of type: {type(value)!r}")
bson.errors.InvalidDocument: cannot encode object: set(), of type: <class 'set'>

vs the expected behavior:

>>>> encode({'_id': 1, 'foo': set()})
Traceback (most recent call last):
  File "/Users/shane/git/mongo-python-driver/bson/__init__.py", line 1010, in _dict_to_bson
    elements.append(_element_to_bson(key, value, check_keys, opts))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/shane/git/mongo-python-driver/bson/__init__.py", line 994, in _element_to_bson
    return _name_value_to_bson(name, value, check_keys, opts)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/shane/git/mongo-python-driver/bson/__init__.py", line 980, in _name_value_to_bson
    raise InvalidDocument(f"cannot encode object: {value!r}, of type: {type(value)!r}")
bson.errors.InvalidDocument: cannot encode object: set(), of type: <class 'set'>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<python-input-7>", line 1, in <module>
    encode({'_id': 1, 'foo': set()})
  File "/Users/shane/git/mongo-python-driver/bson/__init__.py", line 1053, in encode
    return _dict_to_bson(document, check_keys, codec_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/shane/git/mongo-python-driver/bson/__init__.py", line 1012, in _dict_to_bson
    raise InvalidDocument(f"Invalid document {doc} | {err}") from err
bson.errors.InvalidDocument: Invalid document {'_id': 1, 'foo': set()} | cannot encode object: set(), of type: <class 'set'>

Copy link
Contributor Author

@NoahStapp NoahStapp Sep 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened PYTHON-5560 to track.

goto cleanup;
}
// Add doc to the error class as a property.
PyObject_SetAttrString(InvalidDocument, "document", dict);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be "_document" now?

Copy link
Member

@ShaneHarvey ShaneHarvey Sep 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also highlights that this PR needs to add a test to ensure .document is the expected dict.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly the tests fail if this is _document.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the two modified tests in this PR not sufficient to ensure that .document behaves as expected?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is setting a class attribute on the InvalidDocument class so the logic is incorrect, eg:

InvalidDocument.document = dict

Copy link
Contributor Author

@NoahStapp NoahStapp Sep 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry I misunderstood what you meant. You're right that _error returns a class, not an instance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Had to do some C API delving to figure this out 😅

sleepyStick
sleepyStick previously approved these changes Sep 17, 2025
}
PyObject *new_msg = PyUnicode_FromFormat("Invalid document: %s", msg_utf8);
// Add doc to the error instance as a property.
PyObject *new_evalue = PyObject_CallFunctionObjArgs(InvalidDocument, new_msg, dict, NULL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to check the new_msg == NULL case before we call PyObject_CallFunctionObjArgs here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants