debarshiray
Member

@debarshiray debarshiray commented May 30, 2025

XDG_RUNTIME_DIR is needed for two groups of reasons when Toolbx is used
rootless.

First, toolbox(1) itself needs it to work rootless, because that's where
it places several files:

  • The lock file to synchronize Podman migrations.

  • The initialization stamp file to synchronize the container's entry
    point with the user-facing enter and run commands running on the
    host operating system.

  • The generated Container Device Interface specification.

These files need to be separate for the toolbox(1) processes run by the
system tests, those run by the user for normal use, and concurrent
invocations of the tests.

Therefore, it's better to use a custom XDG_RUNTIME_DIR that's within the
sandbox offered by Bats [1]. The sandbox is clearly labelled as being
used by Bats, is unique for each invocation, and Bats takes care of
cleaning everything up once it has finished running.

Note that XDG_RUNTIME_DIR's Unix access mode MUST be 0700 [2]. For
example, Ubuntu 22.04 and 24.04 Desktop have a umask of 0002, so if an
access mode is not explicitly specified, XDG_RUNTIME_DIR will be created
with 0775. That will cause dbus-daemon(1) to fail with:

  Unable to set up transient service directory: XDG_RUNTIME_DIR
      "/var/tmp/bats-run-4XQL6i/suite/xdg-runtime-dir" can be written by
      others (mode 040775)
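
A minimal sketch of creating such a directory with an explicit mode. It assumes the suite-level temporary directory that Bats exposes as BATS_SUITE_TMPDIR; the function name and the mktemp(1) fallback for running outside Bats are hypothetical and not taken from the actual change:

```shell
#!/usr/bin/env bash

# Create a private XDG_RUNTIME_DIR and print its path.
# The mode is passed explicitly so that a permissive umask such as 0002
# cannot leave the directory at 0775.
create_xdg_runtime_dir() {
    # Hypothetical fallback: use a fresh temporary directory outside Bats.
    local parent="${BATS_SUITE_TMPDIR:-$(mktemp -d)}"
    local dir="$parent/xdg-runtime-dir"

    mkdir -m 0700 -- "$dir" || return 1
    printf '%s\n' "$dir"
}

# Simulate the Ubuntu Desktop default to show that the mode still holds.
umask 0002
XDG_RUNTIME_DIR="$(create_xdg_runtime_dir)"
export XDG_RUNTIME_DIR
```

Passing the mode to mkdir(1) avoids a window in which the directory exists with wider permissions, which a separate chmod(1) afterwards would leave open.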

Second, XDG_RUNTIME_DIR is used to propagate things like the user D-Bus,
PipeWire and Wayland sockets from the host to the container. These
don't need to be separated. However, if a custom XDG_RUNTIME_DIR is
used then those sockets that are used by the system tests, such as the
user D-Bus socket, have to be replicated.

Therefore, a custom D-Bus instance is run to offer the user D-Bus socket
with a configuration similar to that of the host OS. The dbus-daemon(1)
implementation is used for the sake of simplicity. It creates the
socket itself based on the configuration, unlike dbus-broker-launch(1)
where the socket must be separately created and passed to it by its
parent.
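
As a rough sketch of what running such an instance could look like: the configuration below is a trimmed-down session bus configuration, not the one used by the tests, and the file name and function names are hypothetical. It only illustrates that dbus-daemon(1) creates the socket named in the `<listen>` element by itself:

```shell
#!/usr/bin/env bash

# Write a minimal session-style bus configuration that listens on the
# conventional "bus" socket inside the given runtime directory, and print
# the path of the configuration file.
write_dbus_config() {
    local runtime_dir="$1"
    local config="$runtime_dir/dbus-session.conf"   # hypothetical file name

    cat >"$config" <<EOF
<!DOCTYPE busconfig PUBLIC "-//freedesktop//DTD D-Bus Bus Configuration 1.0//EN"
 "http://www.freedesktop.org/standards/dbus/1.0/busconfig.dtd">
<busconfig>
  <type>session</type>
  <listen>unix:path=$runtime_dir/bus</listen>
  <auth>EXTERNAL</auth>
  <policy context="default">
    <allow send_destination="*" eavesdrop="true"/>
    <allow eavesdrop="true"/>
    <allow own="*"/>
  </policy>
</busconfig>
EOF
    printf '%s\n' "$config"
}

# Start dbus-daemon(1) in the background with that configuration.
# Not called here; it needs dbus-daemon(1) to be installed.
start_user_bus() {
    local runtime_dir="$1"
    local config
    config="$(write_dbus_config "$runtime_dir")" || return 1

    dbus-daemon --config-file="$config" --fork --nopidfile || return 1
    export DBUS_SESSION_BUS_ADDRESS="unix:path=$runtime_dir/bus"
}
```

A caller would run `start_user_bus "$XDG_RUNTIME_DIR"` once during suite set-up and would still need some way to stop the daemon during teardown, which this sketch leaves out.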

However, Podman can't use systemd as the cgroups manager with this D-Bus
instance, as the bus wasn't started by the user systemd instance. So, a
custom containers.conf(5) is used to change the cgroups manager to
cgroupfs. The only other options in the containers.conf(5) are those
that are common across Fedora 41 and 42, and Ubuntu 22.04 and 24.04.
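
The cgroups manager part of such a file could be sketched as follows. Only the `cgroup_manager` setting is shown; the other options mentioned above are not reproduced here, and the function name and the directory chosen for the file are hypothetical. The sketch assumes that Podman is pointed at the file through the CONTAINERS_CONF environment variable:

```shell
#!/usr/bin/env bash

# Write a containers.conf(5) that switches the cgroups manager to cgroupfs
# and print the path of the file.
write_containers_conf() {
    local dir="$1"
    local conf="$dir/containers.conf"

    cat >"$conf" <<'EOF'
[engine]
cgroup_manager = "cgroupfs"
EOF
    printf '%s\n' "$conf"
}

# Hypothetical location: the Bats suite directory, or a fresh temporary
# directory when run outside Bats.
conf_dir="${BATS_SUITE_TMPDIR:-$(mktemp -d)}"
CONTAINERS_CONF="$(write_containers_conf "$conf_dir")"
export CONTAINERS_CONF
```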

[1] https://bats-core.readthedocs.io/en/stable/writing-tests.html

[2] https://specifications.freedesktop.org/basedir-spec/latest/

@debarshiray debarshiray requested a review from Jmennius as a code owner May 30, 2025 23:21
debarshiray added a commit to debarshiray/toolbox that referenced this pull request May 30, 2025
@debarshiray debarshiray force-pushed the wip/rishi/test-system-libs-helpers-isolate-xdg-runtime-dir branch from 37bf2b2 to 3e7623b Compare May 30, 2025 23:46
debarshiray added a commit to debarshiray/toolbox that referenced this pull request May 30, 2025
@debarshiray debarshiray force-pushed the wip/rishi/test-system-libs-helpers-isolate-xdg-runtime-dir branch 2 times, most recently from 946f684 to 9d5974c Compare May 31, 2025 01:09
debarshiray added a commit to debarshiray/toolbox that referenced this pull request May 31, 2025
@debarshiray debarshiray force-pushed the wip/rishi/test-system-libs-helpers-isolate-xdg-runtime-dir branch 7 times, most recently from 2d66773 to 6e370af Compare May 31, 2025 11:06
@debarshiray debarshiray force-pushed the wip/rishi/test-system-libs-helpers-isolate-xdg-runtime-dir branch 6 times, most recently from b0def84 to 4953ac1 Compare May 31, 2025 15:12
@debarshiray debarshiray force-pushed the wip/rishi/test-system-libs-helpers-isolate-xdg-runtime-dir branch from 4953ac1 to 5eb3709 Compare May 31, 2025 17:02
debarshiray added a commit to debarshiray/toolbox that referenced this pull request May 31, 2025
@debarshiray debarshiray force-pushed the wip/rishi/test-system-libs-helpers-isolate-xdg-runtime-dir branch from 5eb3709 to 7149cb5 Compare May 31, 2025 21:57
debarshiray added a commit to debarshiray/toolbox that referenced this pull request May 31, 2025
@debarshiray debarshiray force-pushed the wip/rishi/test-system-libs-helpers-isolate-xdg-runtime-dir branch 2 times, most recently from fdfb530 to 27ce2e5 Compare June 1, 2025 00:07
@debarshiray debarshiray force-pushed the wip/rishi/test-system-libs-helpers-isolate-xdg-runtime-dir branch from 27ce2e5 to 92aa22b Compare June 1, 2025 13:22
@debarshiray
Member Author

The test suite runs fine locally on my Fedora 42 Workstation. However, it runs into problems on my Ubuntu 22.04 virtual machine, and on the Fedora 41, Fedora 42, Rawhide and Ubuntu 22.04 CI hosts.

As far as I can see, the problem is that podman rm --force can't kill the containers' entry points with SIGTERM. I see the same problem with podman stop.

After tracking it down through the code, I found out that the cgroup-path is empty inside the $XDG_RUNTIME_DIR/crun/$CONTAINER_ID/status file. This is likely causing crun kill --all to silently fail with a successful exit code.
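
A quick way to check for this condition, as a sketch only: it assumes the status file is JSON with a `cgroup-path` key, and the function name is hypothetical.

```shell
#!/usr/bin/env bash

# Succeed when the given crun status file records an empty cgroup path.
# The pattern tolerates optional whitespace around the colon.
has_empty_cgroup_path() {
    local status_file="$1"
    grep -Eq '"cgroup-path"[[:space:]]*:[[:space:]]*""' "$status_file"
}
```

For a running container this would be called as `has_empty_cgroup_path "$XDG_RUNTIME_DIR/crun/$CONTAINER_ID/status"`.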

debarshiray added a commit to debarshiray/toolbox that referenced this pull request Jun 1, 2025
This uses the same approach taken by Flatpak [1] to ensure that the
certificates from certificate authorities (or CAs) that are available
inside a Toolbx container are kept synchronized with the host operating
system.  Any program that uses PKCS #11 to access CA certificates should
see the same ones both inside the container and on the host.

During every 'enter' and 'run' command, toolbox(1) ensures that an
instance of 'p11-kit server' is running on the host listening on a local
file system socket that's accessible to both the container and the host.
If an instance is already running, then a second one is not created.
The location of the socket is injected into the container through the
P11_KIT_SERVER_ADDRESS environment variable.
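
The singleton behaviour could be sketched roughly as below.  The socket location and the function names are hypothetical, and the 'p11-kit server' arguments follow what Flatpak appears to pass rather than anything confirmed for toolbox(1), so treat them as assumptions:

```shell
#!/usr/bin/env bash

# Hypothetical socket location under the given runtime directory.
p11_kit_socket_path() {
    printf '%s/toolbox-sketch/pkcs11\n' "$1"
}

# Print a value suitable for P11_KIT_SERVER_ADDRESS.  'p11-kit server' is
# started only if the socket does not exist yet, so that repeated calls
# share one instance.  Not called here; it needs p11-kit on the host.
ensure_p11_kit_server() {
    local runtime_dir="$1"
    local socket
    socket="$(p11_kit_socket_path "$runtime_dir")"

    if [ ! -S "$socket" ]; then
        mkdir -p -- "$(dirname -- "$socket")" || return 1
        # Assumed invocation, modelled on Flatpak.
        p11-kit server --sh -n "$socket" --provider p11-kit-trust.so \
            'pkcs11:model=p11-kit-trust?write-protected=yes' >/dev/null || return 1
    fi

    printf 'unix:path=%s\n' "$socket"
}
```

A caller would pass the printed value into the container's environment as P11_KIT_SERVER_ADDRESS.  Checking only for the socket is a simplification, because a stale socket left behind by a dead server would be mistaken for a live one.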

Just like Flatpak, the singleton 'p11-kit server' process is not
terminated when the last 'enter' or 'run' command exits.

The Toolbx container's entry point configures it to use the
p11-kit-client.so PKCS #11 module instead of the usual p11-kit-trust.so
module.  This talks to the 'p11-kit server' instance running on the host
over the socket instead of reading the CA certificates that are present
inside the container.

However, unlike Flatpak, this doesn't use D-Bus to set up the
communication between the container and the host, because when invoked
as 'sudo toolbox ...' there's no user or session D-Bus instance
available for the root user.

This set-up is skipped if 'p11-kit server' can't be run on the host, or
if the /etc/pkcs11/modules directory for configuring PKCS #11 modules or
p11-kit-client.so is missing inside the container.  None of these are
considered hard dependencies to accommodate size-constrained OSes like
Fedora CoreOS that might not have 'p11-kit server', and existing Toolbx
containers and old images that might not have p11-kit-client.so.
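
The container-side configuration and its soft dependencies could be sketched like this.  Overriding a file named p11-kit-trust.module is borrowed from Flatpak's approach and is an assumption here, as are the function name and its parameters:

```shell
#!/usr/bin/env bash

# Switch the container to the p11-kit-client.so module, or skip quietly
# when a prerequisite is missing.
#   $1: root of the container's file system (lets the sketch run on a
#       scratch tree)
#   $2: path to p11-kit-client.so, which varies between distributions
configure_p11_kit_client() {
    local root="$1"
    local client_module="$2"
    local modules_dir="$root/etc/pkcs11/modules"

    # Neither of these is a hard dependency, so a missing one is not an error.
    [ -d "$modules_dir" ] || return 0
    [ -e "$client_module" ] || return 0

    printf 'module: p11-kit-client.so\n' >"$modules_dir/p11-kit-trust.module"
}
```

Returning success on the skip paths keeps existing containers and old images working, at the cost of them not seeing the host's CA certificates.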

The UBI-based toolbox images haven't yet been updated to contain
p11-kit-client.so.  Until that happens, containers created from them
won't have access to the CA certificates from the host.

The CI needs to be run without 'p11-kit server' because the lingering
singleton process causes Bats to hang when tearing down the suite of
system tests [2].  To terminate the 'p11-kit server' instance run by the
system tests, it needs to be distinguishable from the instance run by
'normal' use of Toolbx by the user.  One way to do this is to isolate
the host operating system's XDG_RUNTIME_DIR from the system tests.
Unfortunately, this is easier said than done [3].  So, this workaround
has to suffice until the problem is solved.

[1] Flatpak commit 66b2ff40f7caf3a7
    flatpak/flatpak@66b2ff40f7caf3a7
    flatpak/flatpak#1757
    p11-glue/p11-kit#68

[2] https://bats-core.readthedocs.io/en/stable/writing-tests.html

[3] containers#1652

containers#626
debarshiray added a commit to debarshiray/toolbox that referenced this pull request Jun 1, 2025
This uses the same approach taken by Flatpak [1] to ensure that the
certificates from certificate authorities (or CAs) that are available
inside a Toolbx container are kept synchronized with the host operating
system.  Any program that uses PKCS #11 to access CA certificates should
see the same ones both inside the container and on the host.

During every 'enter' and 'run' command, toolbox(1) ensures that an
instance of 'p11-kit server' is running on the host listening on a local
file system socket that's accessible to both the container and the host.
If an instance is already running, then a second one is not created.
The location of the socket is injected into the container through the
P11_KIT_SERVER_ADDRESS environment variable.

Just like Flatpak, the singleton 'p11-kit server' process is not
terminated when the last 'enter' or 'run' command exits.

The Toolbx container's entry point configures it to use the
p11-kit-client.so PKCS #11 module instead of the usual p11-kit-trust.so
module.  This talks to the 'p11-kit server' instance running on the host
over the socket instead of reading the CA certificates that are present
inside the container.

However, unlike Flatpak, this doesn't use D-Bus to set up the
communication between the container and the host, because when invoked
as 'sudo toolbox ...' there's no user or session D-Bus instance
available for the root user.

This set-up is skipped if 'p11-kit server' can't be run on the host, or
if the /etc/pkcs11/modules directory for configuring PKCS #11 modules or
p11-kit-client.so is missing inside the container.  None of these are
considered hard dependencies to accommodate size-constrained OSes like
Fedora CoreOS that might not have 'p11-kit server', and existing Toolbx
containers and old images that might not have p11-kit-client.so.

The UBI-based toolbox images haven't yet been updated to contain
p11-kit-client.so.  Until that happens, containers created from them
won't have access to the CA certificates from the host.

The CI needs to be run without 'p11-kit server' because the lingering
singleton process causes Bats to hang when tearing down the suite of
system tests [2].  To terminate the 'p11-kit server' instance run by the
system tests, it needs to be distinguishable from the instance run by
'normal' use of Toolbx by the user.  One way to do this is to isolate
the host operating system's XDG_RUNTIME_DIR from the system tests.
Unfortunately, this is easier said than done [3].  So, this workaround
has to suffice until the problem is solved.

On the Ubuntu 22.04 CI nodes, it's not possible to remove the p11-kit
package that provides 'p11-kit server', because it leads to:
  $ sudo dpkg --purge p11-kit
  dpkg: dependency problems prevent removal of p11-kit:
   adoptium-ca-certificates depends on p11-kit.

Therefore, as a workaround, only the /usr/libexec/p11-kit/p11-kit-server
binary that provides the 'server' command is removed.  The rest of the
p11-kit package is left untouched.

[1] Flatpak commit 66b2ff40f7caf3a7
    flatpak/flatpak@66b2ff40f7caf3a7
    flatpak/flatpak#1757
    p11-glue/p11-kit#68

[2] https://bats-core.readthedocs.io/en/stable/writing-tests.html

[3] containers#1652

containers#626
debarshiray added a commit to debarshiray/toolbox that referenced this pull request Jun 1, 2025
debarshiray added a commit to debarshiray/toolbox that referenced this pull request Jun 1, 2025
debarshiray added a commit to debarshiray/toolbox that referenced this pull request Jun 2, 2025
debarshiray added a commit to debarshiray/toolbox that referenced this pull request Jul 4, 2025
XDG_RUNTIME_DIR is needed for two groups of reasons when Toolbx is used
rootless.

First, it's important for toolbox(1) itself to work rootless because it
needs to place several files:

  * The 'lock' file to synchronize Podman migrations.

  * The initialization stamp file to synchronize the container's entry
    point with the user-facing 'enter' and 'run' commands running on the
    host operating system.

  * The generated Container Device Interface specification.

These files need to be separate for the toolbox(1) processes run by the
system tests, those run by the user for 'normal' use, and concurrent
invocations of the tests.

Therefore, it's better to use a custom XDG_RUNTIME_DIR that's within the
sandbox offered by Bats [1].  The sandbox is clearly labelled as being
used by Bats, is unique for each invocation, and Bats takes care of
cleaning everything up once it has finished running.

Note that XDG_RUNTIME_DIR's Unix access mode MUST be 0700 [2].  For
example, Ubuntu 22.04 and 24.04 Desktop have a umask of 0002, so if an
access mode is not explicitly specified, XDG_RUNTIME_DIR will be created
with 0775.  That will cause dbus-daemon(1) to fail with:
  Unable to set up transient service directory: XDG_RUNTIME_DIR
      "/var/tmp/bats-run-4XQL6i/suite/xdg-runtime-dir" can be written by
      others (mode 040775)
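
A minimal sketch of creating the directory with an explicit mode, so that the result doesn't depend on the umask.  The function name is hypothetical, and the parent directory is assumed to be the Bats suite sandbox (eg., "$BATS_SUITE_TMPDIR"); the 'xdg-runtime-dir' name is taken from the error message above.

```shell
#!/bin/sh

# Hypothetical helper: creates an XDG_RUNTIME_DIR inside the given
# parent directory and prints its path.
create_runtime_dir() {
    parent="$1"
    runtime_dir="$parent/xdg-runtime-dir"

    # 'mkdir -m' sets the permission bits of the new directory
    # explicitly, instead of deriving them from the umask.  The '-p'
    # option is deliberately avoided, because any intermediate
    # directories it creates would still be subject to the umask.
    mkdir -m 0700 "$runtime_dir" || return 1

    printf '%s\n' "$runtime_dir"
}
```

In a suite-wide set-up function this might be used as:
  XDG_RUNTIME_DIR="$(create_runtime_dir "$BATS_SUITE_TMPDIR")"
  export XDG_RUNTIME_DIR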

Second, XDG_RUNTIME_DIR is used to propagate things like the user D-Bus,
PipeWire and Wayland sockets from the host to the container.  These
don't need to be separated.  However, if a custom XDG_RUNTIME_DIR is
used then those sockets that are used by the system tests, such as the
user D-Bus socket, have to be replicated.

Therefore, a custom D-Bus instance is run to offer the user D-Bus socket
with a configuration similar to that of the host OS.  The dbus-daemon(1)
implementation is used for the sake of simplicity.  It creates the
socket itself based on the configuration, unlike dbus-broker-launch(1)
where the socket must be separately created and passed to it by its
parent.
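
As a rough illustration, a session-style configuration for dbus-daemon(1) could be generated and used as sketched below.  The function names, the policy rules and the "$XDG_RUNTIME_DIR/bus" socket location are assumptions modelled on a typical session bus set-up; they aren't necessarily what this change uses.

```shell
#!/bin/sh

# Hypothetical helper: writes a minimal bus configuration that makes
# dbus-daemon(1) create and listen on a socket inside the given
# runtime directory.
write_dbus_config() {
    runtime_dir="$1"
    config="$2"

    cat >"$config" <<EOF
<!DOCTYPE busconfig PUBLIC "-//freedesktop//DTD D-Bus Bus Configuration 1.0//EN"
 "http://www.freedesktop.org/standards/dbus/1.0/busconfig.dtd">
<busconfig>
  <type>session</type>
  <listen>unix:path=$runtime_dir/bus</listen>
  <policy context="default">
    <allow send_destination="*" eavesdrop="true"/>
    <allow eavesdrop="true"/>
    <allow own="*"/>
  </policy>
</busconfig>
EOF
}

# Hypothetical helper: starts the bus in the background and prints the
# PID of the daemon, so that it can be terminated during teardown.
start_test_bus() {
    dbus-daemon --config-file="$1" --fork --print-pid
}
```

Clients running under the custom XDG_RUNTIME_DIR would then need DBUS_SESSION_BUS_ADDRESS to point at the same unix:path.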

However, Podman can't use systemd as the cgroups manager with this D-Bus
instance, as the bus wasn't started by the user systemd instance.  So, a
custom containers.conf(5) is used to change the cgroups manager to
cgroupfs.  The only other options in the containers.conf(5) are those
that are common across Fedora 41 and 42, and Ubuntu 22.04 and 24.04.
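
The cgroups manager override on its own amounts to something like the following sketch.  The function name and the file location are hypothetical, and the other options mentioned above are omitted.

```shell
#!/bin/sh

# Hypothetical helper: writes a containers.conf(5) that only switches
# the cgroups manager from systemd to cgroupfs.
write_containers_conf() {
    conf="$1"

    cat >"$conf" <<'EOF'
[engine]
cgroup_manager = "cgroupfs"
EOF
}
```

Exporting CONTAINERS_CONF with the path of the generated file makes Podman load only that file instead of the system and user configuration.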

[1] https://bats-core.readthedocs.io/en/stable/writing-tests.html

[2] https://specifications.freedesktop.org/basedir-spec/latest/

containers#1652
@debarshiray debarshiray force-pushed the wip/rishi/test-system-libs-helpers-isolate-xdg-runtime-dir branch from 92aa22b to 59bcef0 Compare July 4, 2025 19:44

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/containers/toolbox for 1652,59bcef01cb27023cc214ccd5f773cacca47fd1d1

debarshiray added a commit to debarshiray/toolbox that referenced this pull request Jul 4, 2025
@debarshiray debarshiray force-pushed the wip/rishi/test-system-libs-helpers-isolate-xdg-runtime-dir branch from 59bcef0 to 780a208 Compare July 4, 2025 19:52

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/containers/toolbox for 1652,780a208d8c357dfd8f4eeebc1000af71b7c734d9

@debarshiray debarshiray force-pushed the wip/rishi/test-system-libs-helpers-isolate-xdg-runtime-dir branch from 780a208 to 1c616f0 Compare July 4, 2025 20:25