WIP: make lxml (and singledispatch) optional #161

anthrotype · 2018-07-13T17:58:54Z

The reason I haven't tagged ufoLib 3.0 yet is that I was working on allowing the built-in xml.etree module to be used when lxml is not installed.
As much as I love to add third-party dependencies (:grin:) and avoid re-inventing the wheel, I believe it makes sense to keep ufoLib working in a simple, pure-Python setup, like it was before.
This matters especially since we are also planning to eventually merge ufoLib into fonttools, where third-party dependencies are even more difficult to add.

This PR adds a ufoLib.etree module which exports the same API as the built-in xml.etree.ElementTree or lxml.etree. It uses lxml.etree whenever it is available, and otherwise falls back on the built-in module.
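A minimal sketch of the import-fallback idea described above, not the actual contents of ufoLib.etree. The `HAVE_LXML` flag and the sample XML are made up for illustration; only the try/except import pattern is the point.

```python
# Prefer lxml when it is installed; otherwise use the standard library.
try:
    from lxml import etree
    HAVE_LXML = True  # hypothetical flag, for illustration only
except ImportError:
    import xml.etree.ElementTree as etree
    HAVE_LXML = False

# The calls below are part of the API that both backends share.
root = etree.fromstring("<glyph name='a'><advance width='500'/></glyph>")
advance = root.find("advance")
width = advance.get("width")
```

Callers import `etree` from the shim and never need to know which backend was picked, as long as they stay within the API that both libraries share.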

In order to produce the same output (with control over the order of the attributes, etc.), I needed to subclass Element and ElementTree objects and basically adapt the serialization part of the built-in ElementTree library.

The current tests pass, which is good. Of course, it is a bit slower than lxml (I haven't run benchmarks yet), but the idea is to have it as a reasonable default or fallback, with lxml being the opt-in extra for those who want faster output.

I would like to add some more tests before merging this.


It is conventional among Python packages to have a 'testing' extras_require so developers can bootstrap a testing environment easily with something like: pip install -e .[testing]. pytest-cov is a plugin that integrates coverage.py with pytest. pytest-randomly shuffles the tests into a random order to check for inter-test dependencies.
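As a rough sketch, such extras might be declared in setup.py like this. The package lists are assumptions based on the plugins named above and are not copied from the repository.

```python
# Hypothetical excerpt; the real setup.py may list different packages or pins.
extras_require = {
    # optional faster XML backend: pip install ufoLib[lxml]
    "lxml": ["lxml"],
    # developer tooling: pip install -e .[testing]
    "testing": ["pytest", "pytest-cov", "pytest-randomly"],
}

# In setup.py this dict would be passed as setup(..., extras_require=extras_require)
```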


codecov-io · 2018-07-14T12:48:47Z

Codecov Report

❗ No coverage uploaded for pull request base (master@9735cdc).
The diff coverage is 66.56%.

@@            Coverage Diff            @@
##             master     #161   +/-   ##
=========================================
  Coverage          ?   88.73%           
=========================================
  Files             ?       20           
  Lines             ?    10157           
  Branches          ?     1106           
=========================================
  Hits              ?     9013           
  Misses            ?      773           
  Partials          ?      371

Impacted Files	Coverage Δ
Lib/ufoLib/glifLib.py	`74.51% <100%> (ø)`
Lib/ufoLib/plistlib.py	`98.03% <100%> (ø)`
Lib/ufoLib/test/test_plistlib.py	`100% <100%> (ø)`
Lib/ufoLib/etree.py	`60.22% <60.22%> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 9735cdc...d4f10f1.


…thon.exe https://nedbatchelder.com/blog/201509/appveyor.html The old TOXPYTHON trick seems to be discouraged now with tox 3.1, which emits a warning when there is a conflict in the basepython settings for environments containing default factors (e.g. py27, etc.) tox-dev/tox#841


miguelsousa · 2018-07-15T17:31:33Z

@anthrotype strings are still saved as data in layercontents.plist files. I think it's this line https://github.com/anthrotype/ufoLib/blob/optional-lxml/Lib/ufoLib/__init__.py#L1118

All of these string vs. data issues make me think that the tests have some glaring holes. What's your opinion?

anthrotype · 2018-07-15T17:43:40Z

> tests have some glaring holes

Yeah, that’s why I’m adding coverage support in this PR, among other things.
I’m about to take a flight now; I’ll take a look at that layercontents issue you pointed to. Thanks, Miguel, for checking.

Looks like defcon's LayerSet.newLayer does not ensure that the 'name' argument is a unicode string, so serializing the layercontents.plist with the new ufoLib.plistlib may lead to the layer name being encoded as a <data> element instead of a <string>. Thanks Miguel for noticing this.
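The string-versus-data behaviour can be reproduced with the standard library's plistlib on Python 3, which is used here as a stand-in for ufoLib.plistlib. `ensure_text` is a hypothetical helper and the UTF-8 default is an assumption; this is a sketch of the kind of guard needed, not the actual fix.

```python
import plistlib


def ensure_text(value, encoding="utf-8"):
    """Return value as a text string, decoding it if it is bytes (hypothetical helper)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# layercontents.plist is a list of [layer name, directory name] pairs;
# here the layer name was unwittingly passed as bytes.
layer_contents = [[b"foreground", "glyphs"]]

# Bytes are written as a <data> element (base64), not as a <string>.
raw = plistlib.dumps(layer_contents)

# Normalizing to text first yields <string> elements for both items.
fixed = plistlib.dumps(
    [[ensure_text(name), ensure_text(directory)] for name, directory in layer_contents]
)
```

Decoding at the point where user input enters the library keeps the serializer simple: it can keep mapping bytes to `<data>` and text to `<string>` without guessing what the caller meant.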

miguelsousa · 2018-07-15T21:47:38Z

42a3428 fixes layerName (the UI string) but not self.layerContents[layerName] (the folder name).

anthrotype · 2018-07-15T23:22:01Z

Right.. will fix tomorrow

anthrotype · 2018-07-16T10:52:04Z

> 42a3428 fixes layerName (the UI string) but not self.layerContents[layerName] (the folder name).

Hm, I don't think it's possible for the self.layerContents values (the directory names) not to be unicode, because the directory name is generated by the ufoLib.filenames.userNameToFileName function, which always returns unicode: it uses unicode_literals, asserts that the input is unicode, etc.
Also, ufoLib.plistlib.load will always return unicode strings for plist <string> elements, never bytes. So I don't think it's possible to introduce a bytes string in that dictionary. It was different for the layerName, the key in that dictionary, because the writeLayerContents function accepts an optional layerOrder list of strings, and the user may unwittingly pass bytes instead of unicode strings.

@miguelsousa did you actually manage to reproduce the directory value in layerContents being a bytes string instead of unicode string?

anthrotype · 2018-07-16T11:04:24Z

btw @miguelsousa sorry I didn't make it in time for the pyup.io bot weekly update-all-the-repos.. Let's not rush this, and make sure we don't introduce any further regressions.


anthrotype · 2018-07-16T12:59:21Z

ok i'm gonna merge and release 2.3.0. I don't think this deserves 3.0.0. We are not breaking the API, the lxml dependency is optional, though recommended for better speeds.

typemytype · 2018-07-16T13:01:20Z

super, thanks!!

benkiel · 2018-07-16T14:26:30Z

Thanks! Does this then close #140?

miguelsousa · 2018-07-16T18:49:57Z

> did you actually manage to reproduce the directory value in layerContents being a bytes string instead of unicode string?

yes. I patched mutatorMath locally, updated ufoLib to 42a3428, and ran makeinstancesufo_test.py from the afdko repo.

> Let's not rush this, and make sure we don't introduce any further regressions.

agree. No rush on our end.

anthrotype added 20 commits July 13, 2018 18:44

ufoLib.etree: add shim module that exports ElementTree API

d87c4ed

Works with both lxml and xml.etree backends. Adds some missing things from the built-in etree, such as the ability to use an OrderedDict for attributes, support for a pretty_print argument to add indentation, etc.

plistlib: make singledispatch optional; use ufoLib.etree

a99a6d6

glifLib: use ufoLib.etree

a9a3781

test_plistlib: use ufoLib.etree

59ac1aa

move lxml and singledispatch to separate extra_requirements.txt

92671c2

setup.py: add ufoLib[lxml] extras

ed2f42a

tox.ini: test with and without lxml

4d678ec

setup.cfg: adjust pytest config

c497295

Add .coveragerc file for coverage.py

f6b8849

Add htmlcov/ directory to .gitignore

5e1de2d

.coveragerc: don't show missing lines in coverage report in console

fc67456

too verbose

test_plistlib.py: mute harmless ResourceWarning on py27

d1fa34e

setup.cfg: don't run pytest in --verbose by default

8d52b1a

one can always do `pytest -v`, or through `tox -- -v`

tox.ini: test with/without lxml, add py37, coverage and more

9a94c29

tox.ini: fix building wheel from sdist

b2934f7

travis: test with/without lxml, add python 3.7; upload coverage to co…

a9bfbdf

…decov.io and remove 3.5, we don't need to test that one any more now that there's 3.7

appveyor: test w/o lxml, on py37 (x64 only); upload to codecov

8eb7245

and skip_branch_with_pr, to avoid running CI twice both on the PR branch and on merged master.

etree: fix valid XML regex for narrow UCS-2 only pythons

eea1766

fixes https://ci.appveyor.com/project/adrientetar/ufolib/build/1.0.489/job/epryi911juu5lqdl#L72

appveyor: no lxml wheels for py37 on windows; skip for now

d4f10f1

https://ci.appveyor.com/project/adrientetar/ufolib/build/1.0.489/job/edm1m88yl4q5l36f

anthrotype added 5 commits July 14, 2018 14:07

etree: simplify regex for invalid xml chars

5f61bcf

mute codecov on PRs, too noisy

fc6a7e4

appveyor: add fast_finish: true

37e4b32

https://www.appveyor.com/docs/build-configuration/#failing-strategy

etree: in invalid xml chars allow surrogates for 'narrow' pythons

1a94fbc

anthrotype force-pushed the optional-lxml branch from c96462f to 38a9b5f Compare July 14, 2018 14:18

tox: print python version and bitness for debugging

3ca6320

anthrotype force-pushed the optional-lxml branch from 38a9b5f to 3ca6320 Compare July 14, 2018 14:19

etree: only write self-closing <tag/> when element.text is None

83e8da2

like lxml does

ensure all modules do from __future__ import absolute_import, unicode…

0951e20

…_literals

anthrotype force-pushed the optional-lxml branch from 7aeffbb to 0951e20 Compare July 14, 2018 18:25

anthrotype added 6 commits July 14, 2018 19:42

validators: fixup doctests after adding unicode_literals

b44e3d5

aarg! fix writeDataFileAtomically not removing empty file...

abe74f1

this had been broken for years, since commit 66e5ae0

etree: restrict py2 bytes to ASCII only

6c6b94a

etree: pretty_print is False by default (like in lxml)

0c0d46e

test_etree: start adding some unit tests to etree module

bc7494b

test_etree: add test for pretty_print=True with ordered dict

e15d56a

README.md: mention alternative installation method with [lxml] extras…

eb03bfe

… notation [ci skip]

anthrotype merged commit 25c1060 into unified-font-object:master Jul 16, 2018

anthrotype deleted the optional-lxml branch July 16, 2018 12:59

This was referenced Jul 16, 2018

This isn’t practical in the real world #102

Closed

Catch invalid .plist files #141

Closed
