-
Notifications
You must be signed in to change notification settings - Fork 1.9k
linux/zvol_os: fix crash with blk-mq on Linux 4.19 #17597
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was quite a puzzle for years, but I just wonder: couldn't we check rq->cpu
for the mentioned -1
value, if it is such when not initialized? Can there be some legal cases when rq->cpu
is intentionally not equal to get_cpu()
? Though the effects should only be limited to performance, so I don't insist for so old versions.
@amotin I did think about making sure that (Reading that back, I kinda sound like an asshole maybe. I dunno, I feel like there's an indicator of some other problem here, but maybe its just that Linux is hard to track, so more of the same). |
@robn I agree that checking for validity would be an overkill. But you mentioned specific value of BTW, should here be |
@amotin Fair. I only saw initialisation to I'd be fine with Maybe different/better question: ultimately, we're hunting down |
@robn The goal of this logic is to keep requests that may be sequential together to reduce reorder between them to not confuse speculative prefetcher too much, but same time get as even as possible load distribution between multiple taskqueues. The assumption is that requests coming through different queues may already be reordered, so we don't care much. Same time even within one queue we may have multiple unrelated sequential streams, that we can try to roughly split offset. Since my last year round there speculative prefetcher should now be able to handle some depth of reorder, not being as bad as it was before, but it is still not infinite. |
As I see, |
Ah yeah, I agree |
Updated to use |
03987f7 (openzfs#16069) added a workaround to get the blk-mq hardware context for older kernels that don't cache it in the struct request. However, this workaround appears to be incomplete. In 4.19, the rq data context is optional. If its not initialised, then the cached rq->cpu will be -1, and so using it to index into mq_map causes a crash. Given that the upstream 4.19 is now in extended LTS and rarely seen, RHEL8 4.18+ has long carried "modern" blk-mq support, and the cached hardware context has been available since 5.1, I'm not going to huge lengths to get queue selection correct for the very few people that are likely to feel it. To that end, we simply call raw_smp_processor_id() to get a valid CPU id and use that instead. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <[email protected]>
03987f7 (openzfs#16069) added a workaround to get the blk-mq hardware context for older kernels that don't cache it in the struct request. However, this workaround appears to be incomplete. In 4.19, the rq data context is optional. If its not initialised, then the cached rq->cpu will be -1, and so using it to index into mq_map causes a crash. Given that the upstream 4.19 is now in extended LTS and rarely seen, RHEL8 4.18+ has long carried "modern" blk-mq support, and the cached hardware context has been available since 5.1, I'm not going to huge lengths to get queue selection correct for the very few people that are likely to feel it. To that end, we simply call raw_smp_processor_id() to get a valid CPU id and use that instead. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Paul Dagnelie <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Closes openzfs#17597
03987f7 (openzfs#16069) added a workaround to get the blk-mq hardware context for older kernels that don't cache it in the struct request. However, this workaround appears to be incomplete. In 4.19, the rq data context is optional. If its not initialised, then the cached rq->cpu will be -1, and so using it to index into mq_map causes a crash. Given that the upstream 4.19 is now in extended LTS and rarely seen, RHEL8 4.18+ has long carried "modern" blk-mq support, and the cached hardware context has been available since 5.1, I'm not going to huge lengths to get queue selection correct for the very few people that are likely to feel it. To that end, we simply call raw_smp_processor_id() to get a valid CPU id and use that instead. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Paul Dagnelie <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Closes openzfs#17597
03987f7 (openzfs#16069) added a workaround to get the blk-mq hardware context for older kernels that don't cache it in the struct request. However, this workaround appears to be incomplete. In 4.19, the rq data context is optional. If its not initialised, then the cached rq->cpu will be -1, and so using it to index into mq_map causes a crash. Given that the upstream 4.19 is now in extended LTS and rarely seen, RHEL8 4.18+ has long carried "modern" blk-mq support, and the cached hardware context has been available since 5.1, I'm not going to huge lengths to get queue selection correct for the very few people that are likely to feel it. To that end, we simply call raw_smp_processor_id() to get a valid CPU id and use that instead. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Paul Dagnelie <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Closes openzfs#17597
03987f7 (openzfs#16069) added a workaround to get the blk-mq hardware context for older kernels that don't cache it in the struct request. However, this workaround appears to be incomplete. In 4.19, the rq data context is optional. If its not initialised, then the cached rq->cpu will be -1, and so using it to index into mq_map causes a crash. Given that the upstream 4.19 is now in extended LTS and rarely seen, RHEL8 4.18+ has long carried "modern" blk-mq support, and the cached hardware context has been available since 5.1, I'm not going to huge lengths to get queue selection correct for the very few people that are likely to feel it. To that end, we simply call raw_smp_processor_id() to get a valid CPU id and use that instead. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Paul Dagnelie <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Closes openzfs#17597 (cherry picked from commit 3aa21bd)
[Sponsors: Klara, Inc., Wasabi Technology, Inc.]
Motivation and Context
03987f7 (#16069) added a workaround to get the blk-mq hardware context for older kernels that don't cache it in the
struct request
. However, this workaround appears to be incomplete.In 4.19, the rq data context is optional. If its not initialised, then the cached
rq->cpu
will be -1, and so using it to index into mq_map causes a crash.Description
Given that the upstream 4.19 is now in extended LTS and rarely seen, RHEL8 4.18+ has long carried "modern" blk-mq support, and the cached hardware context has been available since 5.1, I'm not going to huge lengths to get queue selection correct for the very few people that are likely to feel it. To that end, we simply call
get_cpu()
to get a valid CPU id and use that instead.How Has This Been Tested?
Since 03987f7,
zvol_stress
against 4.19.325 will crash with a bad memory access within a few seconds. With this change, the entirezvol
test tag runs to successful completion.Compiled checked against 5.4.294, which is the oldest kernel I have lying around ahead of 4.19. It defines
HAVE_BLK_MQ_RQ_HCTX
, so it would not hit this codepath.Types of changes
Checklist:
Signed-off-by
.