Skip to content

Conversation

cpplearner
Copy link
Contributor

@cpplearner cpplearner commented Apr 17, 2023

This implements [format.string.escaped], which is part of WG21-P2286R8 "Formatting Ranges" and modified by WG21-P2713R1 "Escaping Improvements In std::format". Works towards #2919.

To implement this feature, two arrays, __printable_ranges and _Grapheme_Extend_ranges, are added to __msvc_format_ucd_tables.hpp.

  • __printable_ranges represents code points whose General_Category is in the groups L, M, N, P, S (that is, code points that are not from categories Z or C), plus the ASCII space character.
    • Characters outside of these ranges are always escaped, usually using the \u{hex-digit-sequence} format. ([format.string.escaped]/(2.2.1.2.1))
    • It might make sense to store the unmodified General_Category, instead of this invented property. This requires more storage and a new data structure, though.
  • _Grapheme_Extend_ranges represents code points with the Unicode property Grapheme_Extend=Yes.
    • Characters in these ranges are escaped unless they immediately follow an unescaped character. ([format.string.escaped]/(2.2.1.2.2))
    • It would be more space efficient to reuse the existing data for Grapheme_Cluster_Break: Grapheme_Extend=Yes is Grapheme_Cluster_Break=Extend minus Emoji_Modifier=Yes, and Emoji_Modifier=Yes is just 1F3FB..1F3FF. I chose to define a new array for simplicity.

When the literal encoding is not UTF-8, UTF-16, or UTF-32, the set of "separator or non-printable characters" is implementation-defined. In this implementation, the set consists of all characters that correspond to non-printable Unicode code points (that is, code points outside of __printable_ranges, see above). If a character is non-printable, it is translated into \u{XXXX}, where XXXX is the hex value of the Unicode code point (not the original value).

If a code unit sequence cannot be converted to a Unicode scalar value, the \x{XX} escape sequence is used.

@cpplearner cpplearner requested a review from a team as a code owner April 17, 2023 08:22
@cpplearner cpplearner force-pushed the p2286r8-formatting-ranges branch from 4448022 to cc9aabc Compare April 17, 2023 08:53
@StephanTLavavej StephanTLavavej added format C++20/23 format cxx23 C++23 feature labels Apr 17, 2023
@StephanTLavavej StephanTLavavej self-assigned this Apr 26, 2023
StephanTLavavej

This comment was marked as resolved.

Comment on lines +1681 to +1685
#if _HAS_CXX23
case '?':
_Cat = _Presentation_type_category::_Escape;
break;
#endif // _HAS_CXX23
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential future improvement:

We could recognize this in c++20 and earlier modes and emit a helpful diagnostic.

@StephanTLavavej StephanTLavavej removed their assignment May 12, 2023
@StephanTLavavej

This comment was marked as resolved.

@StephanTLavavej
Copy link
Member

FYI @barcharcraz, changes have been pushed since your approval but they were minor so I don't think you need to take another look.

@StephanTLavavej StephanTLavavej self-assigned this May 17, 2023
@StephanTLavavej
Copy link
Member

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

@CaseyCarter CaseyCarter merged commit 9fdc896 into microsoft:main May 18, 2023
@CaseyCarter
Copy link
Contributor

Thanks for implementing this sub-feature, which is not questionable in any way!

@cpplearner cpplearner deleted the p2286r8-formatting-ranges branch May 18, 2023 23:50
JMazurkiewicz added a commit to JMazurkiewicz/STL that referenced this pull request Jul 30, 2023
* This PR implements debug-enabled standard `formatter` specializations,
* Drive-by: rename tests added in microsoft#3656 - use `text_formatting_` prefix instead of `formatting_ranges_` (for consistency),
* Towards microsoft#2919.
JMazurkiewicz added a commit to JMazurkiewicz/STL that referenced this pull request Jul 30, 2023
* This PR implements debug-enabled standard `formatter` specializations,
* Drive-by: rename tests added in microsoft#3656 - use `text_formatting_` prefix instead of `formatting_ranges_` (for consistency),
* Towards microsoft#2919.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cxx23 C++23 feature format C++20/23 format
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants