-
Notifications
You must be signed in to change notification settings - Fork 1.5k
[ty] Speedup project file discovery #19913
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Diagnostic diff on typing conformance testsChanges were detected when running ty on typing conformance tests--- old-output.txt 2025-08-14 18:13:06.319161224 +0000
+++ new-output.txt 2025-08-14 18:13:06.387161856 +0000
@@ -1,5 +1,5 @@
WARN ty is pre-release software and not ready for production use. Expect to encounter bugs, missing features, and fatal errors.
-fatal[panic] Panicked at /home/runner/.cargo/git/checkouts/salsa-e6f3bb7c2a062968/918d35d/src/function/execute.rs:215:25 when checking `/home/runner/work/ruff/ruff/typing/conformance/tests/aliases_typealiastype.py`: `infer_definition_types(Id(409cd)): execute: too many cycle iterations`
+fatal[panic] Panicked at /home/runner/.cargo/git/checkouts/salsa-e6f3bb7c2a062968/918d35d/src/function/execute.rs:215:25 when checking `/home/runner/work/ruff/ruff/typing/conformance/tests/aliases_typealiastype.py`: `infer_definition_types(Id(ac97)): execute: too many cycle iterations`
_directives_deprecated_library.py:15:31: error[invalid-return-type] Function always implicitly returns `None`, which is not assignable to return type `int`
_directives_deprecated_library.py:30:26: error[invalid-return-type] Function always implicitly returns `None`, which is not assignable to return type `str`
_directives_deprecated_library.py:36:41: error[invalid-return-type] Function always implicitly returns `None`, which is not assignable to return type `Self@__add__` |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! This makes sense to me.
crates/ruff_db/src/files.rs
Outdated
/// | ||
/// This function differs from [`system_path_to_file`] in that it doesn't add a dependency | ||
/// on `File.status` and always returns a `File`, even | ||
/// if `path` doesn't point to a file (e.g. if `path` is a directory). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How come? If metadata
lets you cheaply query the file type, can you still return a Result
here?
I mention this because it seems like we are otherwise careful not to allow a File
to be backed by a directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah we could. I had a versio where I did just that.
I considered it to be fine because Files::try_system
also doesn't return a Result
but the file directly (even if the path maps to a directory). This where I consider the distinction between system_path_to_file
which makes sure it adds a query dependency on status (so that your query re-runs if the status changes) and protects you from getting a File
that points to a directory or doesn't exist.
Either way, I'm going to revert all of this thanks to @ibraheemdev's suggestion
crates/ty_project/src/walk.rs
Outdated
paths.push(entry.into_path()); | ||
// If this returns `Err`, then the file was deleted between now and when the walk callback was called. | ||
// We can ignore this. | ||
if let Ok(metadata) = system.path_metadata(entry.path()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering why you can't use ignore::DirEntry::file_type
here, and I think it's because you also need the file permissions? Because directory entries (on Unix) will include the FileType
free of charge. But not permissions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I thought about using the walk dir entry directly, but we, unfortunately, also need the permissions and, more importantly, the last modified time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also can't add a fork method to ty_project::Db that returns a cloned Self because the Db trait must be dyn compatible.
The usual workaround for this is to have a dyn_clone
method that returns a Box<dyn Db>
, but the solution here also seems perfectly fine.
Oh nice, that's exactly what I was looking for. |
* main: [ty] Add diagnostics for invalid `await` expressions (#19711) [ty] Synthesize read-only properties for all declared members on `NamedTuple` classes (#19899) [ty] Remove use of `ClassBase::try_from_type` from `super()` machinery (#19902) [ty] Speedup project file discovery (#19913) [`pyflakes`] Add secondary annotation showing previous definition (`F811`) (#19900) Bump 0.12.9 (#19917) [ty] support `kw_only=True` for `dataclass()` and `field()` (#19677)
Summary
The
ProjectFilesWalker
discovers all the files to check in a ty project by spawning multiple threads that walk the directory tree.The walker's output is a
Vec
orSet
of all theFile
s. The way this works today is that we first collect all file paths (multi threaded)into a single
Vec
, and the main thread then interns the paths to theirFile
's.The problem with this approach is that
system_path_to_file
is fairly expensive when called for many files because it does astat
call internally.The ideal solution would move the
system_path_to_file
call to the walker threads. That would also eliminate the sharedVec
that collects all paths. Unfortunately, this isn't possible because&dyn Db
isn'tSend
nor does&dyn Db
implementClone
(Salsa does have afork_db
feature but this isn't publicly exposed as of today). We also can't add afork
method toty_project::Db
that returns a clonedSelf
because theDb
trait must be dyn compatible.The solution I've taken here is to move the stat files into the walker threads, which fixes most of the waterfall effect and is much simpler than exposingfork_db
properly in salsa.I implemented the ideal solution thanks to @ibraheemdev's suggestion to add a
dyn_clone
!Fixes astral-sh/ty#970
Test Plan
Before

After