-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Introduce a "failed" value type #13930
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
48d268b
to
6155a84
Compare
f635170
to
51cee6d
Compare
Things done:
|
I'm not so keen on adding retry-able thunks to this PR, since parallel eval was written under the assumption that thunks can only move to an evaluated state, never back to being a thunk. (See https://github.com/DeterminateSystems/nix-src/blob/3ccbe8465909bc37e222c8a2c2e047142252a083/src/libexpr/include/nix/expr/value.hh#L440-L457.) Making thunks retry-able would require a rethink not just with respect to how values are safely updated, but also the desired semantics (e.g. if a bunch of threads are waiting for a thunk that fails with a retry-able error, should those threads immediately retry?). |
The change away from the current retryable behavior is a real blocker for me. I've tested it and it sends NixOps4 into an infinite loop, as unfortunately expected. Is that assumption really required for correct operation? i.e. what part of the synchronization breaks? It might be ok for applications that need retrying to disable multithreading, although ideally of course we shouldn't have to work around it like that.
Depends on the granularity of blocking. If it's fine grained, we don't have to worry about work duplication as much, and the remaining failing bit of work is cheap.
Another thing that can help with performance is to save the function bodies as thunks into the Anyway, either choice seems workable for now. |
51cee6d
to
1ba3ad7
Compare
In the multithreaded evaluator, it's possible for multiple threads to wait on the same thunk. If evaluation of the thunk results in an exception, the waiting threads shouldn't try to re-force the thunk. Instead, they should rethrow the same exception, without duplicating any work. Therefore, there is now a new value type `tFailed` that stores an std::exception_ptr. If evaluation of a thunk/app results in an exception, `forceValue()` overwrites the value with a `tFailed`. If `forceValue()` encounters a `tFailed`, it rethrows the exception. So you normally never need to check for failed values (since forcing them causes a rethrow).
This uses `gc_cleanup` to call the exception_ptr destructor when the `Failed` is collected. I don't know exactly how bad it is to deallocate `std::exception_ptr` without destruction, but I guess it ranges from small leak to UB and crash.
This provides an explicit API for call-fail-retry-succeed evaluation flows, such as currently used in NixOps4. An alternative design would simply reset the `Value` to the original thunk instead of `tFailed` under the condition of catching a `RecoverableEvalError`. That is somewhat simpler, but I believe the presence of `tFailed` is beneficial for possible use in the repl; being able to show the error sooner, without re-evaluation. The hasPos method and isEmpty function are required in order to avoid an include loop.
1ba3ad7
to
110a9ef
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have a few suggestions to improve the output and/or keep it more stable.
I shouldn't approve my own code (the test and the recoverability that's required for NixOps4), so I'd appreciate review by @edolstra and/or others.
case nThunk: | ||
return WA("a", "thunk"); | ||
case nFailed: | ||
return WA("a", "failure"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the term failed is useful and distinct internally, but normally, for users, we used to call them errors.
return WA("a", "failure"); | |
return WA("an", "error"); |
break; | ||
|
||
case nFailed: | ||
doc.writeEmptyElement("failed"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not break this code. The XML stuff is rather legacy anyway.
doc.writeEmptyElement("failed"); | |
// Historically, a tried and then ignored value (e.g. through tryEval) was | |
// reverted to the original thunk. | |
doc.writeEmptyElement("unevaluated"); |
} | ||
break; | ||
case nFailed: | ||
str << "«failed»"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
str << "«failed»"; | |
// Historically, a tried and then ignored value (e.g. through tryEval) was | |
// reverted to the original thunk. | |
str << "<CODE>"; |
|
||
void printFailed(Value & v) | ||
{ | ||
output << "«failed»"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure where this ends up.
tBool, | ||
tNull, | ||
tFloat, | ||
tFailed, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xokdvium does this look ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought: this could be a thunk instead, pointing to a pseudo-Expr
that throws the error.
I don't think that change is required, but it would free up this scarce InternalType
entry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This messes up the traces when there's reuse.
Reuse can occur when
- evaluating the same binding in
nix repl
- referencing the same thing after
builtins.tryEval
- evaluating multiple attributes in the same application, with recovery, like
nix flake show
, Hydra
Example, nix repl --show-trace
:
nix-repl> builtins.toJSON { foo = nixosConfigurations.feb.config.services.postgresql.dataDir; }
error:
… while calling the 'toJSON' builtin
at «string»:1:1:
1| builtins.toJSON { foo = nixosConfigurations.feb.config.services.postgresql.dataDir; }
| ^
[...]
error: The option `nixos.configurations.feb.services.postgresql.dataDir' was accessed but has no value defined. Try setting the option.
nix-repl> nixosConfigurations.feb.config.services.postgresql.dataDir
error:
… while calling the 'toJSON' builtin
at «string»:1:1:
1| builtins.toJSON { foo = nixosConfigurations.feb.config.services.postgresql.dataDir; }
| ^
Note that the second nix repl
prompt did not have the toJSON
call. Instead the old trace was stored and mutated to add context that does not apply anymore.
We'll need the exception system to stop mutating traces. A singly linked immutable list of trace items seems most suitable.
Motivation
In the multithreaded evaluator, it's possible for multiple threads to wait on the same thunk. If evaluation of the thunk results in an exception, the waiting threads shouldn't try to re-force the thunk. Instead, they should rethrow the same exception, without duplicating any work.
Therefore, there is now a new value type
tFailed
that stores an std::exception_ptr. If evaluation of a thunk/app results in an exception,forceValue()
overwrites the value with atFailed
. IfforceValue()
encounters atFailed
, it rethrows the exception. So you normally never need to check for failed values (since forcing them causes a rethrow).Context
Extracted from DeterminateSystems#125.
Add 👍 to pull requests you find important.
The Nix maintainer team uses a GitHub project board to schedule and track reviews.