[ty] Remove `ScopedExpressionId` #19019

MichaReiser · 2025-06-28T19:37:43Z

Summary

The motivation of ScopedExpressionId was that we have an expression identifier that's local to a scope and, therefore, unlikely to change if a user makes changes in another scope. A local identifier like this has the advantage that query results may remain unchanged even if other parts of the file change, which in turn allows Salsa to short-circuit dependent queries.

However, I noticed that we aren't using ScopedExpressionId in a place where it's important that the identifier is local. It's main use is inside infer which we always run for the entire file. The one exception to this is Unpack but unpack runs as part of infer.

Edit: The above isn't entirely correct. We used ScopedExpressionId in TypeInference which is a query result. Now using ExpressionNodeKey does mean that a change to the AST invalidates most if not all TypeInference results of a single file. Salsa then has to run all dependent queries to see if they're affected by this change even if the change was local to another scope.

If this locality proves to be important I suggest that we create two queries on top of TypeInference: one that returns the expression map which is mainly used in the linter and type inference and a second that returns all remaining fields. This should give us a similar optimization at a much lower cost

I also considered remove ScopedUseId but I believe that one is still useful because using ExpressionNodeKey for it instead would mean that all UseDefMap change when a single AST node changes. Whether this is important is something difficult to assess. I'm simply not familiar enough with the UseDefMap. If the locality doesn't matter for the UseDefMap, then a similar change could be made and bindings_by_use could be changed to an FxHashMap<UseId, Bindings> where UseId is a thin wrapper around NodeKey.

Closes astral-sh/ty#721

github-actions · 2025-06-28T19:41:14Z

`mypy_primer` results

Changes were detected when running on open source projects

bidict (https://github.com/jab/bidict)
-     memo fields = ~17MB
+     memo fields = ~15MB

janus (https://github.com/aio-libs/janus)
- TOTAL MEMORY USAGE: ~28MB
+ TOTAL MEMORY USAGE: ~25MB

com2ann (https://github.com/ilevkivskyi/com2ann)
- TOTAL MEMORY USAGE: ~41MB
+ TOTAL MEMORY USAGE: ~37MB

pyp (https://github.com/hauntsaninja/pyp)
-     memo fields = ~41MB
+     memo fields = ~37MB

zipp (https://github.com/jaraco/zipp)
- TOTAL MEMORY USAGE: ~30MB
+ TOTAL MEMORY USAGE: ~28MB

pegen (https://github.com/we-like-parsers/pegen)
- TOTAL MEMORY USAGE: ~45MB
+ TOTAL MEMORY USAGE: ~41MB

beartype (https://github.com/beartype/beartype)
- TOTAL MEMORY USAGE: ~97MB
+ TOTAL MEMORY USAGE: ~88MB

attrs (https://github.com/python-attrs/attrs)
- TOTAL MEMORY USAGE: ~80MB
+ TOTAL MEMORY USAGE: ~72MB

aiortc (https://github.com/aiortc/aiortc)
- TOTAL MEMORY USAGE: ~97MB
+ TOTAL MEMORY USAGE: ~88MB

dedupe (https://github.com/dedupeio/dedupe)
-     memo fields = ~72MB
+     memo fields = ~66MB

graphql-core (https://github.com/graphql-python/graphql-core)
-     memo fields = ~129MB
+     memo fields = ~117MB

svcs (https://github.com/hynek/svcs)
- TOTAL MEMORY USAGE: ~80MB
+ TOTAL MEMORY USAGE: ~72MB

black (https://github.com/psf/black)
- TOTAL MEMORY USAGE: ~142MB
+ TOTAL MEMORY USAGE: ~129MB

alerta (https://github.com/alerta/alerta)
-     memo fields = ~97MB
+     memo fields = ~88MB

downforeveryone (https://github.com/rpdelaney/downforeveryone)
-     memo fields = ~34MB
+     memo fields = ~30MB

rich (https://github.com/Textualize/rich)
- TOTAL MEMORY USAGE: ~142MB
+ TOTAL MEMORY USAGE: ~129MB

PyGithub (https://github.com/PyGithub/PyGithub)
-     memo metadata = ~17MB
+     memo metadata = ~15MB

pydantic (https://github.com/pydantic/pydantic)
- TOTAL MEMORY USAGE: ~156MB
+ TOTAL MEMORY USAGE: ~142MB

mkosi (https://github.com/systemd/mkosi)
- TOTAL MEMORY USAGE: ~129MB
+ TOTAL MEMORY USAGE: ~117MB

imagehash (https://github.com/JohannesBuchner/imagehash)
- TOTAL MEMORY USAGE: ~49MB
+ TOTAL MEMORY USAGE: ~45MB

flake8 (https://github.com/pycqa/flake8)
-     memo fields = ~66MB
+     memo fields = ~60MB

pylox (https://github.com/sco1/pylox)
-     memo fields = ~54MB
+     memo fields = ~49MB

apprise (https://github.com/caronc/apprise)
-     memo fields = ~251MB
+     memo fields = ~228MB

typeshed-stats (https://github.com/AlexWaygood/typeshed-stats)
- TOTAL MEMORY USAGE: ~117MB
+ TOTAL MEMORY USAGE: ~106MB

alectryon (https://github.com/cpitclaudel/alectryon)
-     memo fields = ~13MB
+     memo fields = ~11MB

poetry (https://github.com/python-poetry/poetry)
-     memo metadata = ~23MB
+     memo metadata = ~21MB

bandersnatch (https://github.com/pypa/bandersnatch)
- TOTAL MEMORY USAGE: ~88MB
+ TOTAL MEMORY USAGE: ~80MB

python-sop (https://gitlab.com/dkg/python-sop)
-     memo fields = ~25MB
+     memo fields = ~23MB

boostedblob (https://github.com/hauntsaninja/boostedblob)
- TOTAL MEMORY USAGE: ~106MB
+ TOTAL MEMORY USAGE: ~97MB

freqtrade (https://github.com/freqtrade/freqtrade)
-     memo fields = ~304MB
+     memo fields = ~276MB

AutoSplit (https://github.com/Toufool/AutoSplit)
- TOTAL MEMORY USAGE: ~228MB
+ TOTAL MEMORY USAGE: ~207MB

pywin32 (https://github.com/mhammond/pywin32)
-     memo fields = ~368MB
+     memo fields = ~334MB

altair (https://github.com/vega/altair)
-     memo fields = ~251MB
+     memo fields = ~228MB

mitmproxy (https://github.com/mitmproxy/mitmproxy)
- TOTAL MEMORY USAGE: ~304MB
+ TOTAL MEMORY USAGE: ~276MB

pycryptodome (https://github.com/Legrandin/pycryptodome)
- TOTAL MEMORY USAGE: ~142MB
+ TOTAL MEMORY USAGE: ~129MB
-     memo metadata = ~17MB
+     memo metadata = ~15MB

openlibrary (https://github.com/internetarchive/openlibrary)
-     memo fields = ~189MB
+     memo fields = ~171MB

manticore (https://github.com/trailofbits/manticore)
-     memo fields = ~539MB
+     memo fields = ~490MB

meson (https://github.com/mesonbuild/meson)
-     memo metadata = ~54MB
+     memo metadata = ~49MB
-     memo fields = ~334MB
+     memo fields = ~304MB

zulip (https://github.com/zulip/zulip)
-     memo fields = ~652MB
+     memo fields = ~593MB

scipy (https://github.com/scipy/scipy)
- TOTAL MEMORY USAGE: ~1271MB
+ TOTAL MEMORY USAGE: ~1156MB

sympy (https://github.com/sympy/sympy)
-     memo fields = ~1538MB
+     memo fields = ~1399MB

charliermarsh · 2025-06-28T19:42:08Z

Woah those report diffs are amazing.

MichaReiser · 2025-06-28T19:44:39Z

Woah those report diffs are amazing.

That's @ibraheemdev's work. He deserves a shoutout in tomorrow's meeting but Notion is having a bad day :(

codspeed-hq · 2025-06-28T19:47:27Z

CodSpeed Instrumentation Performance Report

Merging #19019 will improve performances by 4.59%

_{Comparing micha/ast-ids (d078951) with main (9218bf7)}

Summary

⚡ 4 improvements
✅ 35 untouched benchmarks

Benchmarks breakdown

	Benchmark	`BASE`	`HEAD`	Change
⚡	`ty_check_file[cold]`	124.1 ms	119 ms	+4.29%
⚡	`ty_micro[complex_constrained_attributes_2]`	651.1 ms	624.9 ms	+4.2%
⚡	`anyio`	899.3 ms	864.6 ms	+4%
⚡	`attrs`	375.7 ms	359.2 ms	+4.59%

codspeed-hq · 2025-06-28T19:49:13Z

CodSpeed WallTime Performance Report

Merging #19019 will improve performances by 7.79%

_{Comparing micha/ast-ids (d078951) with main (9218bf7)}

Summary

⚡ 7 improvements
✅ 1 untouched benchmarks

Benchmarks breakdown

	Benchmark	`BASE`	`HEAD`	Change
⚡	`large[sympy]`	53.5 s	50.9 s	+5.2%
⚡	`medium[colour-science]`	8.7 s	8.3 s	+5.21%
⚡	`medium[pandas]`	34.1 s	32.5 s	+4.89%
⚡	`multithreaded[pydantic]`	8.6 s	8 s	+7.79%
⚡	`small[altair]`	6.1 s	5.8 s	+5.23%
⚡	`small[freqtrade]`	9.2 s	8.8 s	+4.43%
⚡	`small[tanjun]`	4 s	3.9 s	+4.12%

MichaReiser · 2025-06-28T20:33:23Z

This PR makes me wonder if we should try to split our semantic index incrementally (scope by scope). This won't help much for first party modules where we need all scopes but it can reduce the amount of data that we collect for third party modules AND it could maybe even replaces exported_names because other files would then only depend on the index of the global scope

sharkdp

This is great — less code and a clear performance win!

Replace ScopedExpressionId with ExpressionNodeKey

d078951

MichaReiser added internal An internal refactor or improvement ty Multi-file analysis & type inference labels Jun 28, 2025

MichaReiser changed the title ~~[ty] Delete AstIds~~ [ty] Remove HasScopedExpressionId Jun 28, 2025

MichaReiser changed the title ~~[ty] Remove HasScopedExpressionId~~ [ty] Remove ScopedExpressionId Jun 28, 2025

MichaReiser marked this pull request as ready for review June 28, 2025 20:09

MichaReiser requested review from carljm, AlexWaygood, sharkdp and dcreager as code owners June 28, 2025 20:09

MichaReiser added performance Potential performance improvement and removed internal An internal refactor or improvement labels Jun 28, 2025

MichaReiser mentioned this pull request Jun 28, 2025

More lazy semantic index building astral-sh/ty#722

Open

AlexWaygood removed their request for review June 29, 2025 17:13

sharkdp approved these changes Jul 2, 2025

View reviewed changes

sharkdp merged commit 5f426b9 into main Jul 2, 2025
39 checks passed

sharkdp deleted the micha/ast-ids branch July 2, 2025 15:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[ty] Remove `ScopedExpressionId` #19019

[ty] Remove `ScopedExpressionId` #19019

Uh oh!

MichaReiser commented Jun 28, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Jun 28, 2025

Uh oh!

charliermarsh commented Jun 28, 2025

Uh oh!

MichaReiser commented Jun 28, 2025

Uh oh!

codspeed-hq bot commented Jun 28, 2025

Uh oh!

codspeed-hq bot commented Jun 28, 2025

Uh oh!

MichaReiser commented Jun 28, 2025

Uh oh!

sharkdp left a comment

Uh oh!

Uh oh!

Uh oh!

[ty] Remove ScopedExpressionId #19019

[ty] Remove ScopedExpressionId #19019

Uh oh!

Conversation

MichaReiser commented Jun 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Uh oh!

github-actions bot commented Jun 28, 2025

mypy_primer results

Uh oh!

charliermarsh commented Jun 28, 2025

Uh oh!

MichaReiser commented Jun 28, 2025

Uh oh!

codspeed-hq bot commented Jun 28, 2025

CodSpeed Instrumentation Performance Report

Merging #19019 will improve performances by 4.59%

Summary

Benchmarks breakdown

Uh oh!

codspeed-hq bot commented Jun 28, 2025

CodSpeed WallTime Performance Report

Merging #19019 will improve performances by 7.79%

Summary

Benchmarks breakdown

Uh oh!

MichaReiser commented Jun 28, 2025

Uh oh!

sharkdp left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

[ty] Remove `ScopedExpressionId` #19019

[ty] Remove `ScopedExpressionId` #19019

MichaReiser commented Jun 28, 2025 •

edited

Loading

`mypy_primer` results