
emilyyyylime

Closes #44.

At long last I got around to implementing this. The code changes are a little all over the place, but all the tests pass and I've even added a new one to check the behaviour wrt loose name matching.

I've opted to implement this by following the example procedure given in UAX #44:

An implementation of this loose matching rule can obtain the correct results when comparing two strings by doing the following three operations, in order:

  1. remove all medial hyphens (except the medial hyphen in the name for U+1180)
  2. remove all whitespace and underscore characters
  3. apply toLowercase() to both strings
After applying these three operations, if the two strings compare binary equal, then they are considered to match.

and generating the name->codepoint PHF (perfect hash function) with the mapped ("normalised") names as keys.
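
For reviewers unfamiliar with the rule, here is a rough sketch of the idea, not the code in this PR. The names `normalise`, `build_index` and `lookup` are hypothetical, a plain `HashMap` stands in for the generated PHF, and "medial hyphen" is assumed to mean a `-` sitting directly between two ASCII alphanumerics in the input.

```rust
use std::collections::HashMap;

/// Normalised name of U+1180, the one name whose medial hyphen is kept.
const HANGUL_JUNGSEONG_O_E: &str = "hanguljungseongo-e";

/// Hypothetical helper: applies the three loose-matching operations.
fn normalise(name: &str) -> String {
    let chars: Vec<char> = name.chars().collect();

    // Exception: if the input is the name of U+1180 (ignoring case,
    // whitespace and underscores), return it with its hyphen intact.
    let hyphens_kept: String = chars
        .iter()
        .filter(|c| !c.is_whitespace() && **c != '_')
        .map(|c| c.to_ascii_lowercase())
        .collect();
    if hyphens_kept == HANGUL_JUNGSEONG_O_E {
        return hyphens_kept;
    }

    let mut out = String::with_capacity(chars.len());
    for (i, &c) in chars.iter().enumerate() {
        // Step 1: a hyphen is treated as medial when it sits directly
        // between two ASCII alphanumerics (an assumption of this sketch).
        let medial = c == '-'
            && i > 0
            && chars[i - 1].is_ascii_alphanumeric()
            && chars.get(i + 1).map_or(false, |n| n.is_ascii_alphanumeric());
        // Step 2: whitespace and underscores are dropped as well.
        if medial || c.is_whitespace() || c == '_' {
            continue;
        }
        // Step 3: lowercase whatever remains.
        out.push(c.to_ascii_lowercase());
    }
    out
}

/// Stand-in for the generated PHF: a map keyed by normalised names.
fn build_index(names: &[(&str, char)]) -> HashMap<String, char> {
    names.iter().map(|&(name, ch)| (normalise(name), ch)).collect()
}

/// Normalises the query the same way as the keys, then looks it up.
fn lookup(index: &HashMap<String, char>, query: &str) -> Option<char> {
    index.get(&normalise(query)).copied()
}
```

Because the keys and the query go through the same function, a lookup for `latin_small_letter_a` or `Latin Small Letter A` would hit the same entry as the exact name, which is why previously valid names keep mapping to the same character.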

This isn't a breaking change; in particular, any name that mapped to a character before will continue mapping to that same character.
However, it still might be worth it to mention this behaviour in the documentation for the crate or the character function.

@progval
Owner

progval commented Jun 7, 2025

Thanks.

I'm not familiar with any of this, so this will take me a while to review. Other reviews are welcome.

@emilyyyylime
Author

I don't really know what to do about the build failing on the non-nightly tests. It appears one of the build.rs dependencies is using an unstable feature? We might need to specify that crate's version more precisely.

@emilyyyylime
Author

Apparently getopts depends on unicode-width? #42 should fix this in that case.

@emilyyyylime
Author

wait nevermind this version doesn't even work

@emilyyyylime
Author

Okay, I believe this is now ready for approval. Merging #42 should make CI pass too.

@emilyyyylime
Author

I've also realised we don't really need MAX_NAME_LENGTH (before normalisation) anymore, so I changed some code related to it.

@emilyyyylime
Author

Have you had time to look at the new version? Don't wanna pressure you, just making sure you didn't forget

@progval
Owner

progval commented Jun 11, 2025

Sorry, I did forget.

One last thing: could you do this?

However, it still might be worth it to mention this behaviour in the documentation for the crate or the character function.

@emilyyyylime
Author

I went a little extra and polished up some other docs around the crate ^^

@progval merged commit d6b2923 into progval:master on Jun 15, 2025
10 checks passed
@progval
Owner

progval commented Jun 19, 2025

Published in v2.0.0

I ended up ruling this a breaking change, because the code generated by unicode_names2_generator after this change wouldn't build with existing versions of unicode_names2, and vice versa.


Successfully merging this pull request may close these issues.

Conform to UAX#44