Commit acf3871

pcd1193182 authored and amotin committed
Correct weight recalculation of space-based metaslabs
Currently, after a failed allocation, the metaslab code recalculates the weight for a metaslab. However, for space-based metaslabs, it uses the maximum free segment size instead of the normal weighting algorithm. This is presumably because the normal metaslab weight is (roughly) intended to estimate the size of the largest free segment, but it doesn't do that reliably at most fragmentation levels. This means that recalculated metaslabs are forced to a weight that isn't really using the same units as the rest of them, resulting in undesirable behaviors. We switch this to use the normal space-weighting function.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Paul Dagnelie <[email protected]>
Sponsored-by: Wasabi Technology, Inc.
Sponsored-by: Klara, Inc.
Closes #17531
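The unit mismatch described in the message can be illustrated with a toy model. This is a minimal sketch, not the OpenZFS code: `toy_metaslab_t`, `toy_space_weight()`, and `toy_old_recalc()` are hypothetical names, and the weighting formula (free space discounted by a fragmentation percentage) is an assumed simplification of a space-based weight.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a metaslab (not the OpenZFS struct). */
typedef struct toy_metaslab {
	uint64_t free_bytes;		/* total free space */
	uint64_t largest_segment;	/* largest contiguous free segment */
	unsigned frag_pct;		/* fragmentation estimate, 0..100 */
} toy_metaslab_t;

/*
 * Assumed toy space-based weight: free space discounted by fragmentation.
 * The result is on the scale of "bytes of free space".
 */
static uint64_t
toy_space_weight(const toy_metaslab_t *ms)
{
	assert(ms->frag_pct <= 100);
	return (ms->free_bytes / 100 * (100 - ms->frag_pct));
}

/*
 * Toy version of the old recalculation after a failed allocation:
 * the weight becomes the largest free segment, which is on a much
 * smaller scale than the space-based weight above.
 */
static uint64_t
toy_old_recalc(const toy_metaslab_t *ms)
{
	return (ms->largest_segment);
}
```

In this model, a metaslab with 8 GiB free and a 16 MiB largest segment that has just failed an allocation would sort below an untouched metaslab with 1 GiB free at the same fragmentation level if it were reweighted with `toy_old_recalc()`, even though it has more free space. Reweighting it with `toy_space_weight()` keeps both metaslabs on the same scale.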
1 parent 21d5f25 commit acf3871

1 file changed: 7 additions, 27 deletions

module/zfs/metaslab.c

Lines changed: 7 additions & 27 deletions
@@ -5073,29 +5073,16 @@ metaslab_group_alloc_normal(metaslab_group_t *mg, zio_alloc_list_t *zal,
 
 		/*
 		 * We were unable to allocate from this metaslab so determine
-		 * a new weight for this metaslab. Now that we have loaded
-		 * the metaslab we can provide a better hint to the metaslab
-		 * selector.
-		 *
-		 * For space-based metaslabs, we use the maximum block size.
-		 * This information is only available when the metaslab
-		 * is loaded and is more accurate than the generic free
-		 * space weight that was calculated by metaslab_weight().
-		 * This information allows us to quickly compare the maximum
-		 * available allocation in the metaslab to the allocation
-		 * size being requested.
-		 *
-		 * For segment-based metaslabs, determine the new weight
-		 * based on the highest bucket in the range tree. We
-		 * explicitly use the loaded segment weight (i.e. the range
-		 * tree histogram) since it contains the space that is
-		 * currently available for allocation and is accurate
-		 * even within a sync pass.
+		 * a new weight for this metaslab. The weight was last
+		 * recalculated either when we loaded it (if this is the first
+		 * TXG it's been loaded in), or the last time a txg was synced
+		 * out.
 		 */
 		uint64_t weight;
 		if (WEIGHT_IS_SPACEBASED(msp->ms_weight)) {
-			weight = metaslab_largest_allocatable(msp);
-			WEIGHT_SET_SPACEBASED(weight);
+			metaslab_set_fragmentation(msp, B_TRUE);
+			weight = metaslab_space_weight(msp) &
+			    ~METASLAB_ACTIVE_MASK;
 		} else {
 			weight = metaslab_weight_from_range_tree(msp);
 		}
@@ -5107,13 +5094,6 @@ metaslab_group_alloc_normal(metaslab_group_t *mg, zio_alloc_list_t *zal,
 			 * For the case where we use the metaslab that is
 			 * active for another allocator we want to make
 			 * sure that we retain the activation mask.
-			 *
-			 * Note that we could attempt to use something like
-			 * metaslab_recalculate_weight_and_sort() that
-			 * retains the activation mask here. That function
-			 * uses metaslab_weight() to set the weight though
-			 * which is not as accurate as the calculations
-			 * above.
 			 */
 			weight |= msp->ms_weight & METASLAB_ACTIVE_MASK;
 			metaslab_group_sort(mg, msp, weight);
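
The mask handling in the two hunks (clear the activation bits from the freshly computed weight, then carry over the bits from the existing weight) can be sketched in isolation. This is a hedged illustration only: `TOY_ACTIVE_MASK` and `toy_reweight()` are hypothetical, and the bit positions are an assumption made for the example, not values taken from the OpenZFS headers.

```c
#include <stdint.h>

/* Assumed layout for illustration: the top three bits hold activation flags. */
#define	TOY_ACTIVE_MASK	(0x7ULL << 61)

/*
 * Combine a freshly computed weight with the activation bits of the
 * weight currently stored on the metaslab.
 */
static uint64_t
toy_reweight(uint64_t old_weight, uint64_t fresh_weight)
{
	/* Drop any activation bits that came with the fresh weight. */
	uint64_t weight = fresh_weight & ~TOY_ACTIVE_MASK;

	/* Retain the activation bits the metaslab already had. */
	weight |= old_weight & TOY_ACTIVE_MASK;
	return (weight);
}
```

With this split, the recalculated value decides only the ordering part of the weight, while the activation state comes solely from the existing weight.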
