Skip to content

Conversation

charlesnicholson
Copy link
Owner

@charlesnicholson charlesnicholson commented May 26, 2025

This PR is a port / migration of #329 by @stefano-zanotti-88 with changes to the paland + unit tests etc.

@charlesnicholson charlesnicholson changed the title port pointer precision changes over, fix up / comment out unit tests tighter %p default precision May 26, 2025
@charlesnicholson
Copy link
Owner Author

@stefano-zanotti-88 I'm on the fence about the current implementation- one of the things that keeps npf (relatively) simpler to reason about is that parse_format_spec returns processed + compatible flags. Check out the unit tests I had to comment out to get the build farther along: https://github.com/charlesnicholson/nanoprintf/pull/340/files#diff-194066f3e4aa5818e4ed73fcaed61d0a3a2b1961032b1bad5a5da2bfb2bef230

Is it possible to rework this such that parse_format_spec only returns legal combinations of flags?

@stefano-zanotti-88
Copy link
Contributor

stefano-zanotti-88 commented May 26, 2025

About npf_parse_format_spec().
I think it should just parse the format syntactically, and not do any semantic analysis, ie not rejecting any string that is malformed due to incompatible options, and not suppressing any incompatible flag.
This is easier, and also much clearer to the user of the function; they can do their semantic analysis later.
Think about all the intricate cases, which are best left to the user of npf_parse_format_spec(): '0' with dynamic precision (cannot be determined by the function); '0' with literal precision (it can be); '#' with %i, c, s, ...; '+' and ' ' with unsigned specifiers; '0' with float specifiers; anything with %%; anything with %p; anything other than length modifiers with %n; ...

Also: is this function meant to be user-accessible? If so, then maybe you might consider integrating into it the parsing of spans of literal characters. I mean, instead of returning 0 when a non-% or an error is found, it could return the length of the span, with npf_format_spec_t.conv_spec set to NONE.
This is similar to what I did in #287 -- but I did this in the loop calling npf_parse_format_spec() instead of directly in the function. This would also make it easier for the caller to decide what to do with wrong specifiers, ie print all the chars, all except the leading %, or nothing. It would also be faster, because we can print multiple chars without multiple function calls (both because we can use puts instead of putc, and because we call npf_parse_format_spec just once instead of once per char).

A simple parsing without applying any semantic constraint is also useful for #298; otherwise, those safety checks don't have all the information needed for proper error detection (note that I'll have to rework #298 after the recent and future changes to NPF; some checks were indeed missing because of this reason).

Also see #341

This was referenced May 28, 2025
@charlesnicholson
Copy link
Owner Author

Might as well just delete the empty unit tests at this point :) Leaving the test cases around with all of the code commented out can be confusing to readers; if the claim is that the format spec can not be validated until star-args are read, and deleting the validation code is a core part of this PR, then would you mind also just fully deleting the tests that are half-commented-out please?

@stefano-zanotti-88
Copy link
Contributor

stefano-zanotti-88 commented May 29, 2025

This PR is authored by you, I don't think I can edit it.
Or, I could resurrect #329.

I would not delete the validation code. It just needs tweaks to check for all the options found in the format string, without any semantically-driven adjustment.
So, for instance:

REQUIRE(npf_parse_format_spec("%.4n", &spec) == 4);
REQUIRE(spec.prec_opt == NPF_FMT_SPEC_OPT_NONE);

would become:

REQUIRE(npf_parse_format_spec("%.4n", &spec) == 4);
REQUIRE(spec.conv_spec == NPF_FMT_SPEC_CONV_WRITEBACK);
REQUIRE(spec.prec_opt == NPF_FMT_SPEC_OPT_LITERAL);
REQUIRE(spec.prec == 4);

And this:

REQUIRE(npf_parse_format_spec("%0-u", &spec) == 4);
REQUIRE(spec.left_justified);
REQUIRE(!spec.leading_zero_pad);

would become:

REQUIRE(npf_parse_format_spec("%0-u", &spec) == 4);
REQUIRE(spec.left_justified);
REQUIRE(spec.leading_zero_pad);

Of course, the subcase names would need some tweaking too.

@charlesnicholson
Copy link
Owner Author

Ugh, I'm sorry, I forgot that we were discussing my PR 😬

Thanks for the feedback, these are good points and assumptions. I'll update the tests and then maybe you could take a final look at it and give me any thoughts. I do like this change!

@stefano-zanotti-88
Copy link
Contributor

Sounds great.

@charlesnicholson
Copy link
Owner Author

@stefano-zanotti-88 Looks like this PR regresses MSVC 64-bit. The failed run is here, could I ask you to take a look at it please? Everything else looks good at this point though!

https://github.com/charlesnicholson/nanoprintf/actions/runs/15614590541/job/43983976675?pr=340

===============================================================================
D:\a\nanoprintf\nanoprintf\tests\mpaland-conformance\paland.cc(763):
TEST CASE:  pointer

D:\a\nanoprintf\nanoprintf\tests\mpaland-conformance\paland.cc(99): MESSAGE: ABCDEF0123456789

D:\a\nanoprintf\nanoprintf\tests\mpaland-conformance\paland.cc(100): FATAL ERROR: REQUIRE( npf_result == std::string{expected} ) is NOT correct!
  values: REQUIRE( 23456789 == abcdef0123456789 )

@stefano-zanotti-88
Copy link
Contributor

It might be related to #342, especially if the failed test is with a compilation done without NANOPRINTF_USE_LARGE_FORMAT_SPECIFIERS.
You can try and add:

static_assert(sizeof(void*) <= sizeof(npf_uint_t));
static_assert(sizeof(uintptr_t*) <= sizeof(npf_uint_t));

=====
Otherwise:

I'm unable to replicate it (on MinGW).

Could you please add these tests in paland.cc:764?
I mean, in the "pointer" test case, outside of the if.

#if NANOPRINTF_USE_PRECISION_FORMAT_SPECIFIERS == 1
    require_conform("00", "%.2p", nullptr);
    require_conform("1234", "%.2p", (void *)0x1234u);
    require_conform("1234-5678", "%p-%p", (void *)0x1234u, (void *)0x5678u);
#endif

This is meant to check (on other platforms too) whether the problem is with NPF itself, mishandling the precision, or whether it is a problem with the argument itself, getting truncated.
Is it possible that MSVC is doing weird stuff with the size of void* and uintptr_t?
They should both be 64 bits.
These tests are good to add even if the problem is not here.

You could also try and add this somewhere:

static_assert(sizeof(void*) <= sizeof(uintptr_t));

Compilation should fail, if the problem is indeed with MSVC's definition of builtin types.

@charlesnicholson
Copy link
Owner Author

I added your new tests and they're failing on all platforms- check out the run here:

https://github.com/charlesnicholson/nanoprintf/actions/runs/15669412831?pr=340

===============================================================================
/__w/nanoprintf/nanoprintf/tests/mpaland-conformance/paland.cc:771:
TEST CASE:  pointer

/__w/nanoprintf/nanoprintf/tests/mpaland-conformance/paland.cc:775: FATAL ERROR: REQUIRE( std::string{buf} == std::string{"1234-5678"} ) is NOT correct!
  values: REQUIRE( 0000000000001234-0000000000005678 == 1234-5678 )

@stefano-zanotti-88
Copy link
Contributor

Sorry, the third test I gave you is wrong, it was meant to specify a precision too:

require_equal("1234-5678", "%.1p-%.3p", (void *)0x1234u, (void *)0x5678u);

Since the first two new tests succeed (on other platforms at least), I'm quite sure the problem is that of #342.
Please try and replace the following (in the PR):

nanoprintf/nanoprintf.h

Lines 295 to 301 in bd3f1f7

#if NANOPRINTF_USE_LARGE_FORMAT_SPECIFIERS == 0
typedef long npf_int_t;
typedef unsigned long npf_uint_t;
#else
typedef intmax_t npf_int_t;
typedef uintmax_t npf_uint_t;
#endif

with:

//#if NANOPRINTF_USE_LARGE_FORMAT_SPECIFIERS == 0
//  typedef long npf_int_t;
//  typedef unsigned long npf_uint_t;
//#else
  typedef intmax_t npf_int_t;
  typedef uintmax_t npf_uint_t;
//#endif

I think this will make the tests succeed; nothing else should break, since I don't think we have any test specifically targeting the condition long == npf_int.
Then, we can think about how to address #342 in a less drastic way.

@charlesnicholson
Copy link
Owner Author

I wasn't crazy about fully committing to uintmax and intmax because they can have huge performance penalties on smaller architectures, so I sat on this for a bit. I fixed the windows tests and AVR2/5 compilation tests with an alternate approach, please take a look- i'd love to hear your thoughts. Everything's green though, I'd like to land this!

@stefano-zanotti-88
Copy link
Contributor

I wouldn't want to always use (u)intmax_t either. That was just a temporary change to pinpoint the problem, and indeed it seems that npf_int was the root cause. Adapting to the smallest size necessary is what I'd like too.

Your latest commit causes no test failures, but I'm not sure it's correct.
Please see my comment on #342:

It might seem like it could be done more easily with intptr_t/uintptr_t instead of long, but on 8/16-bit architectures that would be wrong too: intptr_t could be 16 bits, and long 32 bits, so npf_int_t would be large enough for %p but not for %li

That is, if long specifiers are disabled in NPF, we have no long long, size_t, etc., so the widest type we need to deal with is either intptr or long. Both must be supported, since %p and %li are never disabled in NPF.
Usually, intptr is larger or equal, but on 8/16-bit architectures, intptr is usually 16 bits, and long is 32 bits (this is required by the standard: long width >= 4*CHAR_BIT -- and I'm assuming CHAR_BIT == 8)
I'm not sure why no test about long is failing in your AVR tests. I think some should be failing, but maybe we have a gap in the tests.
This should detect it:

#if ULONG_MAX >= 18446744073709551616ul
    require_conform("18446744073709551616", "%lu", 18446744073709551616ul);
#else // the standard ensures that width of long >= 4*CHAR_BIT >= 4*8
    require_conform("4294967296", "%lu", 4294967296ul);
#endif

Could you add it to the tests? I think this will fail on AVR, and the definition of npf_(u)int should be revised as a consequence. Again, see the proposed solution in #342
If you don't like my ifdef's, you can certainly come up with different ones, but they necessarily involve looking at the size of both intptr and long.

@charlesnicholson
Copy link
Owner Author

charlesnicholson commented Jul 27, 2025

I don't actually have an AVR platform to execute on, and haven't set up anything like qemu, so the extent of the AVR test is just "do things compile". Re: the sizes, that makes sense, we want max(sizeof(void*), sizeof(unsigned long))

(edit: deleted the sizeof stuff; you're right)

@charlesnicholson
Copy link
Owner Author

When it's at all possible, I prefer to probe the types using limits.h comparisons- I think this achieves what you're aiming at with the _WIN32 and __SIZEOF_POINTER__ etc approach; if you care to take another look I'd appreciate feedback. If not I'm happy merging this one as-is.

@stefano-zanotti-88
Copy link
Contributor

sizeof cannot be used in the preprocessor, unless by some compiler-specific extension, which is better avoided in a library like NPF.
sizeof is compile-time constant, but unknown to the preprocessor, which has no knowkedge of types (nor their sizes).

The macros in #342 came from some code that clang uses for a similar check, which I can't find anymore.
They do not use UINTPTR_WIDTH and similar, which require C23.
Also, they do not rely on explicit value checks like UINTPTR_MAX == 0xFFFFFFFFu, which rely on CHAR_BIT == 8 (not always true, though probably true for all the architectures NPF cares about) -- this explicit reliance is also there in your latest commit.

Actually, we don't need the explicit values of the size of long and intptr, so the following should suffice:

#if NANOPRINTF_USE_LARGE_FORMAT_SPECIFIERS == 0
  #if ULONG_MAX >= UINTPTR_MAX
    typedef long npf_int_t;
    typedef unsigned long npf_uint_t;
  #else
    typedef intptr_t npf_int_t;
    typedef uintptr_t npf_uint_t;
  #endif
#else
  typedef intmax_t npf_int_t;
  typedef uintmax_t npf_uint_t;
#endif

@charlesnicholson
Copy link
Owner Author

That tidies up to this:

#if NANOPRINTF_USE_LARGE_FORMAT_SPECIFIERS == 1
  typedef intmax_t npf_int_t;
  typedef uintmax_t npf_uint_t;
#elif ULONG_MAX > UINTPTR_MAX
  typedef long npf_int_t;
  typedef unsigned long npf_uint_t;
#else
  typedef intptr_t npf_int_t;
  typedef uintptr_t npf_uint_t;
#endif

which looks pretty nice to me. Thanks for collaborating on this one, I'm (finally!) going to land it.

@charlesnicholson charlesnicholson merged commit 9c25f79 into main Jul 27, 2025
28 checks passed
@charlesnicholson charlesnicholson deleted the pointer-precision-default branch July 27, 2025 16:08
@stefano-zanotti-88
Copy link
Contributor

Yes, this looks good.
You're welcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants