
Conversation

hsjobeki
Contributor

@hsjobeki hsjobeki commented Sep 14, 2025

Implements concatMapAttrs according to #11887

Motivation

This should allow efficient usage of concatMapAttrs, which is currently implemented in nixpkgs/lib.
However, due to its heavy performance overhead (n `//` update operations), it has never seen great adoption.

Adding this function to the primops may unlock faster mapping between attribute sets, which currently always requires an intermediate list:

attrsToList -> map -> listToAttrs

Context

see #11887

  • Needs tests
  • Needs clarification on whether we should use std::map or custom deduplication.

Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

Example Ideas:

filterAttrsRecursive could be implemented as:

  filterAttrsRecursive = pred: set: builtins.concatMapAttrs (
    n: v:
      if pred n v then
        if builtins.isAttrs v then
          { ${n} = filterAttrsRecursive pred v; }
        else
          { ${n} = v; }
      else
        { }
  ) set;

This saves the internal list conversion and concatenation.
https://github.com/NixOS/nixpkgs/blob/26833ad1dad83826ef7cc52e0009ca9b7097c79f/lib/attrsets.nix#L701:C3

There are many other functions in nixpkgs.lib that use an internal list conversion (attrsToList -> map -> listToAttrs) and could benefit in both performance and memory allocation.

@hsjobeki hsjobeki requested a review from roberth September 14, 2025 09:32
@hsjobeki hsjobeki changed the title WIP: primops: init concatMapAttrs Primops: init concatMapAttrs Sep 14, 2025
@hsjobeki hsjobeki marked this pull request as ready for review September 14, 2025 09:33
@xokdvium
Contributor

However due to heavy performance overhead (n update // operations)

FWIW, it might also be productive to work on reducing the overhead/allocations for attrset merges. Do you have a particular benchmark in mind that shows this overhead?

@hsjobeki
Contributor Author

hsjobeki commented Sep 14, 2025

I have a simple eval test that I used for comparison:

let
  lib = import <nixpkgs/lib>;

  many = {
    a = "c"; # first
    b = "B";
    c = "a"; # last wins
    d = "D";
    e = "E";
    f = "F";
    g = "F";
    h = "H";
    i = "I";
    j = "A";
  };

  mult = (n: v: { "${n}-${v}" = v; "${v}-${n}" = v; });
in
builtins.deepSeq lib.concatMapAttrs {
  a = lib.concatMapAttrs mult many;
  b = builtins.concatMapAttrs mult many;
  const = null; # eval to determine the static offset
}

Result

envs:
  bytes: 1544 → 320 (-1224)
  elements: 138 → 20 (-118)
  number: 55 → 20 (-35)
gc:
  totalBytes: 10784 → 1280 (-9504)
list:
  bytes: 80 → 0 (-80)
  elements: 10 → 0 (-10)
nrAvoided: 51 → 32 (-19)
nrExprs: 1563 → 0 (-1563)
nrFunctionCalls: 48 → 20 (-28)
nrLookups: 9 → 1 (-8)
nrOpUpdateValuesCopied: 92 → 0 (-92)
nrOpUpdates: 10 → 0 (-10)
nrPrimOpCalls: 6 → 1 (-5)
nrThunks: 150 → 0 (-150)
sets:
  bytes: 4344 → 864 (-3480)
  elements: 259 → 48 (-211)
  number: 25 → 12 (-13)
symbols:
  bytes: 509 → 54 (-455)
  number: 94 → 18 (-76)

How I got these:

With evaluating a (lib)

{
  "cpuTime": 0.01888599991798401,
  "envs": {
    "bytes": 2576,
    "elements": 254,
    "number": 68
  },
  "gc": {
    "cycles": 1,
    "heapSize": 402915328,
    "totalBytes": 49008
  },
  "list": {
    "bytes": 88,
    "concats": 0,
    "elements": 11
  },
  "nrAvoided": 61,
  "nrExprs": 3277,
  "nrFunctionCalls": 52,
  "nrLookups": 13,
  "nrOpUpdateValuesCopied": 552,
  "nrOpUpdates": 11,
  "nrPrimOpCalls": 10,
  "nrThunks": 726,
  "sets": {
    "bytes": 22232,
    "elements": 1373,
    "number": 33
  },
  "sizes": {
    "Attr": 16,
    "Bindings": 8,
    "Env": 8,
    "Value": 16
  },
  "symbols": {
    "bytes": 8276,
    "number": 842
  },
  "time": {
    "cpu": 0.01888599991798401,
    "gc": 0.0,
    "gcFraction": 0.0
  },
  "values": {
    "bytes": 14080,
    "number": 880
  }
}

With evaluating b (primop)

{
  "cpuTime": 0.03371499851346016,
  "envs": {
    "bytes": 1352,
    "elements": 136,
    "number": 33
  },
  "gc": {
    "cycles": 1,
    "heapSize": 402915328,
    "totalBytes": 39504
  },
  "list": {
    "bytes": 8,
    "concats": 0,
    "elements": 1
  },
  "nrAvoided": 42,
  "nrExprs": 1714,
  "nrFunctionCalls": 24,
  "nrLookups": 5,
  "nrOpUpdateValuesCopied": 460,
  "nrOpUpdates": 1,
  "nrPrimOpCalls": 5,
  "nrThunks": 576,
  "sets": {
    "bytes": 18752,
    "elements": 1162,
    "number": 20
  },
  "sizes": {
    "Attr": 16,
    "Bindings": 8,
    "Env": 8,
    "Value": 16
  },
  "symbols": {
    "bytes": 7821,
    "number": 766
  },
  "time": {
    "cpu": 0.03371499851346016,
    "gc": 0.0,
    "gcFraction": 0.0
  },
  "values": {
    "bytes": 11504,
    "number": 719
  }
}

Offset, determined via evaluating const

{
  "cpuTime": 0.01800299994647503,
  "envs": {
    "bytes": 1032,
    "elements": 116,
    "number": 13
  },
  "gc": {
    "cycles": 1,
    "heapSize": 402915328,
    "totalBytes": 38224
  },
  "list": {
    "bytes": 8,
    "concats": 0,
    "elements": 1
  },
  "nrAvoided": 10,
  "nrExprs": 1714,
  "nrFunctionCalls": 4,
  "nrLookups": 4,
  "nrOpUpdateValuesCopied": 460,
  "nrOpUpdates": 1,
  "nrPrimOpCalls": 4,
  "nrThunks": 576,
  "sets": {
    "bytes": 17888,
    "elements": 1114,
    "number": 8
  },
  "sizes": {
    "Attr": 16,
    "Bindings": 8,
    "Env": 8,
    "Value": 16
  },
  "symbols": {
    "bytes": 7767,
    "number": 748
  },
  "time": {
    "cpu": 0.01800299994647503,
    "gc": 0.0,
    "gcFraction": 0.0
  },
  "values": {
    "bytes": 11184,
    "number": 699
  }
}

@hsjobeki
Contributor Author

hsjobeki commented Sep 18, 2025

@roberth Not quite sure how to benchmark this.
A plain NixOS evaluation didn't show any differences: 0.00% in thunks, -20 total, other stats similar. I think this is because we avoided the use of concatMapAttrs in the past; concatMapAttrs got called only 5 times during a NixOS evaluation.
So the synthetic test with 10 attributes seems to yield a bigger benefit than running a NixOS eval.

I am also questioning whether using std::map is the right approach.
In contrast, using the Attrs as a plain vector would require a linear scan to deduplicate conflicting names.

"while evaluating the second argument passed to builtins.concatMapAttrs");
auto inAttrs = args[1]->attrs();

std::map<SymbolIdx, Value*> attrMap;
Contributor

This must use traceable_allocator to make those allocations visible to the GC, I think. Otherwise they are invisible to Boehm and those values can be freed prematurely.

@xokdvium
Copy link
Contributor

heavy performance overhead (n update // operations)

This actually should be much less of a concern after #13987.

Comment on lines +3414 to +3415
Value * result = state.allocValue();
result->mkApp(funApp, i.value);
Contributor

What's the point of allocating the result on the heap? It could just be a variable on the stack, no?

Comment on lines +3427 to +3429
for (const auto& [name, value] : attrMap) {
attrs.insert(name, value);
}
Contributor

std::ranges::copy?

Member

For large inputs, a k-way merge would make more sense.
The pkgs/by-name attrset, if* implemented with concatMapAttrs, would become a mere linear-time operation; possibly faster than the current in-expression binary merge "tree" algorithm.

*: Nixpkgs could decide based on `builtins ? concatMapAttrs`

Contributor

I think if we can expose the Bindings::iterator as a standalone k-way merge utility this is pretty easy to implement. We can resurrect your old k-way merge branch for this @roberth.

Contributor

if* implemented with concatMapAttrs

Wouldn't this also imply some overhead for the function calls? We could maybe implement an optimization for identity functions though

Member

resurrect [...] @roberth

https://github.com/NixOS/nix/pull/11290/files#diff-f118e4c6f6e02148b887fdf627352311fca5a3a4eadf0b4a9d9f348e0be464ffR1904

imply some overhead for the function calls?

I think it's naturally a concatMapAttrs, something like

concatMapAttrs
  (shard: type:
    lib.optionalAttrs (type == "directory") (
      readShard shard
    ))
  (readDir ../by-name)

No identity function needed.

Member

We could maybe implement an optimization for identity functions though

Might be useful for other cases, idk. Not the most promising, but tracking in

"while evaluating the second argument passed to builtins.concatMapAttrs");
auto inAttrs = args[1]->attrs();

std::map<SymbolIdx, Value*> attrMap;
Contributor

@xokdvium xokdvium Sep 18, 2025

It might make sense to use a small buffer optimization to avoid unnecessary allocations. As it stands now this will be very costly actually.
