jorisvandenbossche
Member

The backwards compatibility part of #61916

@jorisvandenbossche jorisvandenbossche modified the milestones: 3.0, 2.3.3 Sep 12, 2025
@jorisvandenbossche jorisvandenbossche added the Strings String extension data type and string data label Sep 12, 2025
Member

@simonjayhawkins simonjayhawkins left a comment

Thanks @jorisvandenbossche

I'm not sure I see it in the accompanying issue, but what if a user just wants object columns when str columns are also present in 3.0?

I see that you plan to add a warning:

When a user does select_dtypes(include=[object]) in pandas 3.0, and we see that there are str columns, raise a warning mentioning to the user they likely want to do include=[str] instead.

else:
    e = df[["a", "b"]]
    # if using_infer_string:
    # TODO warn
Member

In the issue you said: "And in any case, we should probably still add a warning to pandas 2.3 about this when the string mode is enabled (for if we do a 2.3.2 release)".

So should this TODO be part of this PR?

Member Author

This PR just restores the old behaviour, and in that case we don't need to warn, I think. We should add a warning that "object" will stop selecting string columns at some point, but in any case I want to only do that in a later PR because that is a lot more complicated.

The warning I mentioned in the issue was for the case where we decided to keep the current-main behaviour of object not selecting string columns. In that case we definitely should have added a warning to tell users that they are not getting the result they expect.

@jorisvandenbossche
Member Author

"I'm not sure I see it in the accompanying issue, but what if a user just wants object columns when str columns are also present in 3.0?"

Yeah, it's noted at the bottom of the top comment in the issue. That's the annoying part: now that we distinguish object and str columns, one might want to select only the object columns, and that is not really possible yet. With this PR, include=[object] also returns the string columns, and once we add a deprecation warning, it would emit a warning you have to ignore.
I am not sure there is any way around that.

@simonjayhawkins
Member

Could select_dtypes(include=[object], exclude=[str]) work to suppress the warning planned for 3.0?

@jorisvandenbossche
Member Author

Ah, yes, we should ensure that works fine without warning, then that would be a good workaround!
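
A minimal sketch of the include/exclude shape suggested above. It rests on two assumptions that are not from this thread: the string column uses the nullable "string" dtype, and the exclusion uses the "string" alias rather than str, because older pandas versions reject str in select_dtypes. Whether the planned warning is actually suppressed is not shown here, since that warning does not exist yet.

```python
import pandas as pd

# Illustrative frame (column names are hypothetical):
df = pd.DataFrame(
    {
        "obj": pd.Series([1, "x"], dtype=object),      # genuine object column
        "text": pd.array(["a", "b"], dtype="string"),  # nullable string column
        "num": [1, 2],                                 # integer column
    }
)

# Ask for object columns while explicitly excluding string columns.
# Assumption: the "string" alias stands in for `str` from the comment above.
only_object = df.select_dtypes(include=[object], exclude=["string"])
print(list(only_object.columns))
```

Because the string column is excluded explicitly, the result does not depend on whether include=[object] would have selected it.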

@jorisvandenbossche
Member Author

Going to merge this so I can backport it for 2.3.3. Will keep the issue open targeted for 3.0 to add a warning.

@jorisvandenbossche jorisvandenbossche merged commit 2b25842 into pandas-dev:main Sep 21, 2025
41 checks passed
@jorisvandenbossche jorisvandenbossche deleted the string-dtype-select-dtypes branch September 21, 2025 13:38
meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Sep 21, 2025
@jorisvandenbossche
Member Author

While reviewing the backport, I realized that we might want to limit this "backwards compatibility" change to the default NaN-based str dtype only, and not apply it to the NA-based (nullable) string dtype.
Existing users of the nullable string dtype should already be used to the fact that object does not select string columns (and they can already use include=["string"] to select them). If we now start selecting those columns with object, that could be seen as a regression for the nullable string dtype.

That of course creates an inconsistency between the default str dtype and the nullable string dtype, so we can certainly go either way.

Opened #62402 with the change, so can discuss further in that PR.
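
For readers unfamiliar with the existing nullable-dtype behaviour referred to above, here is a small sketch of selecting with include=["string"]. The column names are hypothetical. The include=[object] result is printed rather than asserted, since whether it includes the string column is exactly what this discussion is about and may differ between pandas versions.

```python
import pandas as pd

# Illustrative frame with one nullable string column and one integer column.
df = pd.DataFrame(
    {
        "text": pd.array(["a", "b"], dtype="string"),  # nullable (NA) string dtype
        "num": [1, 2],
    }
)

# Selecting by the "string" alias picks up the nullable string column.
string_cols = list(df.select_dtypes(include=["string"]).columns)
print(string_cols)

# Behaviour under discussion: may or may not contain "text" depending on
# the pandas version, so no expected output is stated here.
object_cols = list(df.select_dtypes(include=[object]).columns)
print(object_cols)
```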
