Skip to content

Conversation

MichaReiser
Copy link
Member

@MichaReiser MichaReiser commented Jun 28, 2025

Summary

The motivation of ScopedExpressionId was that we have an expression identifier that's local to a scope and, therefore, unlikely to change if a user makes changes in another scope. A local identifier like this has the advantage that query results may remain unchanged even if other parts of the file change, which in turn allows Salsa to short-circuit dependent queries.

However, I noticed that we aren't using ScopedExpressionId in a place where it's important that the identifier is local. It's main use is inside infer which we always run for the entire file. The one exception to this is Unpack but unpack runs as part of infer.

Edit: The above isn't entirely correct. We used ScopedExpressionId in TypeInference which is a query result. Now using ExpressionNodeKey does mean that a change to the AST invalidates most if not all TypeInference results of a single file. Salsa then has to run all dependent queries to see if they're affected by this change even if the change was local to another scope.

If this locality proves to be important I suggest that we create two queries on top of TypeInference: one that returns the expression map which is mainly used in the linter and type inference and a second that returns all remaining fields. This should give us a similar optimization at a much lower cost

I also considered remove ScopedUseId but I believe that one is still useful because using ExpressionNodeKey for it instead would mean that all UseDefMap change when a single AST node changes. Whether this is important is something difficult to assess. I'm simply not familiar enough with the UseDefMap. If the locality doesn't matter for the UseDefMap, then a similar change could be made and bindings_by_use could be changed to an FxHashMap<UseId, Bindings> where UseId is a thin wrapper around NodeKey.

Closes astral-sh/ty#721

@MichaReiser MichaReiser added internal An internal refactor or improvement ty Multi-file analysis & type inference labels Jun 28, 2025
Copy link
Contributor

mypy_primer results

Changes were detected when running on open source projects
bidict (https://github.com/jab/bidict)
-     memo fields = ~17MB
+     memo fields = ~15MB

janus (https://github.com/aio-libs/janus)
- TOTAL MEMORY USAGE: ~28MB
+ TOTAL MEMORY USAGE: ~25MB

com2ann (https://github.com/ilevkivskyi/com2ann)
- TOTAL MEMORY USAGE: ~41MB
+ TOTAL MEMORY USAGE: ~37MB

pyp (https://github.com/hauntsaninja/pyp)
-     memo fields = ~41MB
+     memo fields = ~37MB

zipp (https://github.com/jaraco/zipp)
- TOTAL MEMORY USAGE: ~30MB
+ TOTAL MEMORY USAGE: ~28MB

pegen (https://github.com/we-like-parsers/pegen)
- TOTAL MEMORY USAGE: ~45MB
+ TOTAL MEMORY USAGE: ~41MB

beartype (https://github.com/beartype/beartype)
- TOTAL MEMORY USAGE: ~97MB
+ TOTAL MEMORY USAGE: ~88MB

attrs (https://github.com/python-attrs/attrs)
- TOTAL MEMORY USAGE: ~80MB
+ TOTAL MEMORY USAGE: ~72MB

aiortc (https://github.com/aiortc/aiortc)
- TOTAL MEMORY USAGE: ~97MB
+ TOTAL MEMORY USAGE: ~88MB

dedupe (https://github.com/dedupeio/dedupe)
-     memo fields = ~72MB
+     memo fields = ~66MB

graphql-core (https://github.com/graphql-python/graphql-core)
-     memo fields = ~129MB
+     memo fields = ~117MB

svcs (https://github.com/hynek/svcs)
- TOTAL MEMORY USAGE: ~80MB
+ TOTAL MEMORY USAGE: ~72MB

black (https://github.com/psf/black)
- TOTAL MEMORY USAGE: ~142MB
+ TOTAL MEMORY USAGE: ~129MB

alerta (https://github.com/alerta/alerta)
-     memo fields = ~97MB
+     memo fields = ~88MB

downforeveryone (https://github.com/rpdelaney/downforeveryone)
-     memo fields = ~34MB
+     memo fields = ~30MB

rich (https://github.com/Textualize/rich)
- TOTAL MEMORY USAGE: ~142MB
+ TOTAL MEMORY USAGE: ~129MB

PyGithub (https://github.com/PyGithub/PyGithub)
-     memo metadata = ~17MB
+     memo metadata = ~15MB

pydantic (https://github.com/pydantic/pydantic)
- TOTAL MEMORY USAGE: ~156MB
+ TOTAL MEMORY USAGE: ~142MB

mkosi (https://github.com/systemd/mkosi)
- TOTAL MEMORY USAGE: ~129MB
+ TOTAL MEMORY USAGE: ~117MB

imagehash (https://github.com/JohannesBuchner/imagehash)
- TOTAL MEMORY USAGE: ~49MB
+ TOTAL MEMORY USAGE: ~45MB

flake8 (https://github.com/pycqa/flake8)
-     memo fields = ~66MB
+     memo fields = ~60MB

pylox (https://github.com/sco1/pylox)
-     memo fields = ~54MB
+     memo fields = ~49MB

apprise (https://github.com/caronc/apprise)
-     memo fields = ~251MB
+     memo fields = ~228MB

typeshed-stats (https://github.com/AlexWaygood/typeshed-stats)
- TOTAL MEMORY USAGE: ~117MB
+ TOTAL MEMORY USAGE: ~106MB

alectryon (https://github.com/cpitclaudel/alectryon)
-     memo fields = ~13MB
+     memo fields = ~11MB

poetry (https://github.com/python-poetry/poetry)
-     memo metadata = ~23MB
+     memo metadata = ~21MB

bandersnatch (https://github.com/pypa/bandersnatch)
- TOTAL MEMORY USAGE: ~88MB
+ TOTAL MEMORY USAGE: ~80MB

python-sop (https://gitlab.com/dkg/python-sop)
-     memo fields = ~25MB
+     memo fields = ~23MB

boostedblob (https://github.com/hauntsaninja/boostedblob)
- TOTAL MEMORY USAGE: ~106MB
+ TOTAL MEMORY USAGE: ~97MB

freqtrade (https://github.com/freqtrade/freqtrade)
-     memo fields = ~304MB
+     memo fields = ~276MB

AutoSplit (https://github.com/Toufool/AutoSplit)
- TOTAL MEMORY USAGE: ~228MB
+ TOTAL MEMORY USAGE: ~207MB

pywin32 (https://github.com/mhammond/pywin32)
-     memo fields = ~368MB
+     memo fields = ~334MB

altair (https://github.com/vega/altair)
-     memo fields = ~251MB
+     memo fields = ~228MB

mitmproxy (https://github.com/mitmproxy/mitmproxy)
- TOTAL MEMORY USAGE: ~304MB
+ TOTAL MEMORY USAGE: ~276MB

pycryptodome (https://github.com/Legrandin/pycryptodome)
- TOTAL MEMORY USAGE: ~142MB
+ TOTAL MEMORY USAGE: ~129MB
-     memo metadata = ~17MB
+     memo metadata = ~15MB

openlibrary (https://github.com/internetarchive/openlibrary)
-     memo fields = ~189MB
+     memo fields = ~171MB

manticore (https://github.com/trailofbits/manticore)
-     memo fields = ~539MB
+     memo fields = ~490MB

meson (https://github.com/mesonbuild/meson)
-     memo metadata = ~54MB
+     memo metadata = ~49MB
-     memo fields = ~334MB
+     memo fields = ~304MB

zulip (https://github.com/zulip/zulip)
-     memo fields = ~652MB
+     memo fields = ~593MB

scipy (https://github.com/scipy/scipy)
- TOTAL MEMORY USAGE: ~1271MB
+ TOTAL MEMORY USAGE: ~1156MB

sympy (https://github.com/sympy/sympy)
-     memo fields = ~1538MB
+     memo fields = ~1399MB

@charliermarsh
Copy link
Member

Woah those report diffs are amazing.

@MichaReiser
Copy link
Member Author

Woah those report diffs are amazing.

That's @ibraheemdev's work. He deserves a shoutout in tomorrow's meeting but Notion is having a bad day :(

Copy link

codspeed-hq bot commented Jun 28, 2025

CodSpeed Instrumentation Performance Report

Merging #19019 will improve performances by 4.59%

Comparing micha/ast-ids (d078951) with main (9218bf7)

Summary

⚡ 4 improvements
✅ 35 untouched benchmarks

Benchmarks breakdown

Benchmark BASE HEAD Change
ty_check_file[cold] 124.1 ms 119 ms +4.29%
ty_micro[complex_constrained_attributes_2] 651.1 ms 624.9 ms +4.2%
anyio 899.3 ms 864.6 ms +4%
attrs 375.7 ms 359.2 ms +4.59%

Copy link

codspeed-hq bot commented Jun 28, 2025

CodSpeed WallTime Performance Report

Merging #19019 will improve performances by 7.79%

Comparing micha/ast-ids (d078951) with main (9218bf7)

Summary

⚡ 7 improvements
✅ 1 untouched benchmarks

Benchmarks breakdown

Benchmark BASE HEAD Change
large[sympy] 53.5 s 50.9 s +5.2%
medium[colour-science] 8.7 s 8.3 s +5.21%
medium[pandas] 34.1 s 32.5 s +4.89%
multithreaded[pydantic] 8.6 s 8 s +7.79%
small[altair] 6.1 s 5.8 s +5.23%
small[freqtrade] 9.2 s 8.8 s +4.43%
small[tanjun] 4 s 3.9 s +4.12%

@MichaReiser MichaReiser changed the title [ty] Delete AstIds [ty] Remove HasScopedExpressionId Jun 28, 2025
@MichaReiser MichaReiser changed the title [ty] Remove HasScopedExpressionId [ty] Remove ScopedExpressionId Jun 28, 2025
@MichaReiser MichaReiser marked this pull request as ready for review June 28, 2025 20:09
@MichaReiser MichaReiser added performance Potential performance improvement and removed internal An internal refactor or improvement labels Jun 28, 2025
@MichaReiser
Copy link
Member Author

This PR makes me wonder if we should try to split our semantic index incrementally (scope by scope). This won't help much for first party modules where we need all scopes but it can reduce the amount of data that we collect for third party modules AND it could maybe even replaces exported_names because other files would then only depend on the index of the global scope

Copy link
Contributor

@sharkdp sharkdp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great — less code and a clear performance win!

@sharkdp sharkdp merged commit 5f426b9 into main Jul 2, 2025
39 checks passed
@sharkdp sharkdp deleted the micha/ast-ids branch July 2, 2025 15:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Potential performance improvement ty Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consider deleting AstIds
3 participants