
debarshiray (Member) commented May 7, 2025

This uses the same approach taken by Flatpak [1] to ensure that the
certificates from certificate authorities (or CAs) that are available
inside a Toolbx container are kept synchronized with the host operating
system. Any program that uses PKCS #11 to access CA certificates should
see the same ones both inside the container and on the host.

During every 'enter' and 'run' command, toolbox(1) ensures that an
instance of 'p11-kit server' is running on the host, listening on a local
file system socket that's accessible to both the container and the host.
If an instance is already running, then a second one is not created.
The location of the socket is injected into the container through the
P11_KIT_SERVER_ADDRESS environment variable.

Just like Flatpak, the singleton 'p11-kit server' process is not
terminated when the last 'enter' or 'run' command exits.

The Toolbx container's entry point configures it to use the
p11-kit-client.so PKCS #11 module instead of the usual p11-kit-trust.so
module. This module talks to the 'p11-kit server' instance running on the
host over the socket instead of reading the CA certificates that are
present inside the container.

However, unlike Flatpak, this doesn't use D-Bus to set up the
communication between the container and the host, because when invoked
as 'sudo toolbox ...' there's no user or session D-Bus instance
available for the root user.

This set-up is skipped if 'p11-kit server' can't be run on the host, or
if either the /etc/pkcs11/modules directory for configuring PKCS #11
modules or p11-kit-client.so is missing inside the container. None of
these are considered hard dependencies, to accommodate size-constrained
OSes like Fedora CoreOS that might not have 'p11-kit server', as well as
existing Toolbx containers and old images that might not have
p11-kit-client.so.

The UBI-based toolbox images haven't yet been updated to contain
p11-kit-client.so. Until that happens, containers created from them
won't have access to the CA certificates from the host.

The CI needs to be run without 'p11-kit server' because the lingering
singleton process causes Bats to hang when tearing down the suite of
system tests [2]. To terminate the 'p11-kit server' instance run by the
system tests, it needs to be distinguishable from the instance run by
'normal' use of Toolbx by the user. One way to do this is to isolate
the host operating system's XDG_RUNTIME_DIR from the system tests.
Unfortunately, this is easier said than done [3]. So, this workaround
has to suffice until the problem is solved.

On the Ubuntu 22.04 CI nodes, it's not possible to remove the p11-kit
package that provides p11-kit server, because it leads to:

  $ sudo dpkg --purge p11-kit
  dpkg: dependency problems prevent removal of p11-kit:
   adoptium-ca-certificates depends on p11-kit.

Therefore, as a workaround, only the /usr/libexec/p11-kit/p11-kit-server
binary that provides the 'server' command is removed. The rest of the
p11-kit package is left untouched.

[1] Flatpak commit 66b2ff40f7caf3a7
    flatpak/flatpak@66b2ff40f7caf3a7
    flatpak/flatpak#1757
    p11-glue/p11-kit#68

[2] https://bats-core.readthedocs.io/en/stable/writing-tests.html

[3] #1652

#626


func getP11KitClientPathsUbuntu() []string {
	paths := []string{
		"/usr/lib/aarch64-linux-gnu/pkcs11/p11-kit-client.so",
		"/usr/lib/x86_64-linux-gnu/pkcs11/p11-kit-client.so",
	}

	return paths
}

debarshiray (Member Author)


@Jmennius I hope I got the path to p11-kit-client.so right for Ubuntu. I suppose we build the ubuntu-toolbox images only for aarch64 and x86_64, correct?
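If more architectures are ever needed, the Debian and Ubuntu multiarch directory could be derived from Go's architecture name instead of listing paths by hand. This is a hypothetical alternative, not what the pull request does; it only covers the two architectures mentioned in the question, and it assumes the container has the same architecture as the host.

```go
package main

import (
	"fmt"
	"runtime"
)

// multiarchTriplets maps Go's GOARCH names to Debian/Ubuntu multiarch
// tuples. Only the two architectures discussed in the review are included.
var multiarchTriplets = map[string]string{
	"amd64": "x86_64-linux-gnu",
	"arm64": "aarch64-linux-gnu",
}

// p11KitClientPathUbuntu returns the expected location of p11-kit-client.so
// inside an Ubuntu image for the given GOARCH.
func p11KitClientPathUbuntu(goarch string) (string, error) {
	triplet, ok := multiarchTriplets[goarch]
	if !ok {
		return "", fmt.Errorf("unsupported architecture %s", goarch)
	}
	return fmt.Sprintf("/usr/lib/%s/pkcs11/p11-kit-client.so", triplet), nil
}

// hostP11KitClientPathUbuntu assumes the container matches the host's
// architecture, i.e. no emulation is involved.
func hostP11KitClientPathUbuntu() (string, error) {
	return p11KitClientPathUbuntu(runtime.GOARCH)
}
```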


debarshiray force-pushed the wip/rishi/issue-626 branch 4 times, most recently from 7e513a4 to 33da883, May 10, 2025 16:02
debarshiray marked this pull request as ready for review May 10, 2025 16:03
debarshiray requested a review from Jmennius as a code owner May 10, 2025 16:03
debarshiray force-pushed the wip/rishi/issue-626 branch from 33da883 to e14f8d0, May 10, 2025 20:31
debarshiray (Member Author)

Grr... looks like the system tests get stuck at:

  ...
  # test suite: Tear down

debarshiray (Member Author)

That's because of the p11-kit server process not getting cleaned up.

A subsequent commit will use this to give Toolbx containers access to
the certificates from certificate authorities on the host.

This changes the user-visible error message from:
  $ toolbox --verbose list
  ...
  DEBU Migrating to newer Podman: failed to create migration lock file
      /run/user/1000/toolbox/migrate.lock: open
      /run/user/1000/toolbox/migrate.lock: no such file or directory
  Error: failed to create migration lock file

... to:
  $ toolbox --verbose list
  ...
  DEBU Migrating to newer Podman: failed to create lock file
      /run/user/1000/toolbox/migrate.lock: open
      /run/user/1000/toolbox/migrate.lock: no such file or directory
  Error: failed to create lock file

Or, from:
  $ toolbox --verbose list
  ...
  DEBU Migrating to newer Podman: failed to acquire migration lock on
      /run/user/1000/toolbox/migrate.lock: bad file descriptor
  Error: failed to acquire migration lock

... to:
  $ toolbox --verbose list
  ...
  DEBU Migrating to newer Podman: failed to acquire lock on
      /run/user/1000/toolbox/migrate.lock: bad file descriptor
  Error: failed to acquire lock

This is admittedly less specific without the debug logs, but it's
probably alright because it's such an unlikely error.

containers#626
A subsequent commit will use this to give Toolbx containers access to
the certificates from certificate authorities on the host.

The ideal goal is to ensure that all supported Toolbx containers and
images have p11-kit-client.so in them.  In practice, some of them never
will, either because it's an existing container or an older version of
an image that was already present in the local containers/storage image
store, or because the operating system is too old.

Therefore, there needs to be a way to check at runtime if a Toolbx
container has p11-kit-client.so or not.

containers#626
debarshiray (Member Author) commented Jun 1, 2025

To prevent the test suite from getting stuck, we need to clean up the p11-kit server process.

To do that, we need to be careful not to touch the p11-kit server instance that was started by normal use of Toolbx. One way to do this is to isolate the host's XDG_RUNTIME_DIR from the test suite. Unfortunately, that has turned out to be easier said than done.

To avoid letting perfection become the enemy of the good, I will split the tests out of this pull request so that they can be added later, once the above problems are solved.

debarshiray force-pushed the wip/rishi/issue-626 branch from e14f8d0 to 456f377, June 1, 2025 13:37

debarshiray force-pushed the wip/rishi/issue-626 branch 2 times, most recently from 12f7767 to c6654ce, June 1, 2025 21:36
debarshiray force-pushed the wip/rishi/issue-626 branch 2 times, most recently from 6d6c98f to 2707bf3, June 1, 2025 22:37
debarshiray force-pushed the wip/rishi/issue-626 branch 2 times, most recently from a637605 to 261248d, June 1, 2025 22:52
debarshiray force-pushed the wip/rishi/issue-626 branch from 261248d to 5ed2442, June 2, 2025 13:59
debarshiray (Member Author) left a comment


It was pointed out that the code here broke child sessions started by sshd(8). A fix for it is beginning to take shape in #1695.
