Podman Systemd Run Docs
On a Linux computer, the command systemd-run can be used to run Podman as another user.
⚠️ Work-in-progress, experimental
Status: work in progress, experimental draft. The diagrams and the explanations of the steps in the diagrams still need fact-checking. They were written partly by guessing how things work.
Run podman as another user (using --property User=)
Using sudo systemd-run --property User=username ...
Run the commands
sudo useradd test
uid=$(id -u test)
sudo systemd-run \
--property User=test \
--property Requires=user@${uid}.service \
--property After=user@${uid}.service \
--property Environment=XDG_RUNTIME_DIR=/run/user/${uid} \
--collect \
--pipe \
--quiet \
--wait \
podman run --quiet --rm alpine echo hello
The text hello is written to stdout.
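The invocation above hard-codes the user name. As a minimal sketch, the same options can be assembled by a small shell function. The function name print_run_as_user_cmd is hypothetical (it is not part of systemd or Podman), and it only prints the command line as a dry run so that it can be reviewed before being run. Arguments containing spaces are printed without quoting, so the output is meant for reading rather than for pasting blindly.

```shell
# Hypothetical helper: print (do not run) a systemd-run command line that
# starts a command as the given user, using the same properties as above.
print_run_as_user_cmd() {
    user="$1"
    shift
    # Resolve the UID; fail early if the account does not exist.
    uid=$(id -u "$user") || return 1
    # printf repeats the '%s ' format once per argument.
    printf '%s ' \
        sudo systemd-run \
        --property "User=${user}" \
        --property "Requires=user@${uid}.service" \
        --property "After=user@${uid}.service" \
        --property "Environment=XDG_RUNTIME_DIR=/run/user/${uid}" \
        --collect --pipe --quiet --wait \
        "$@"
    printf '\n'
}
```

For example, print_run_as_user_cmd test podman run --quiet --rm alpine echo hello prints a single-line equivalent of the command above, provided that the user test exists.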
In the diagram, the white boxes are processes running as root and the colored boxes are processes running as the user test. The steps in more detail (assuming here that 1000 is the UID of the user test):
1. systemd-run requests a new transient service unit from the systemd system manager using D-Bus. In the request, systemd-run also passes the file descriptors for stdin, stdout, and stderr to the systemd system manager. To learn more about the technology used for passing file descriptors over a Unix socket, see SCM_RIGHTS and sendmsg() in man 7 unix.
2. The systemd system manager makes sure that the systemd user manager instance user@1000.service is in the active state. If needed, the systemd system manager starts user@1000.service, which means that systemd --user is executed.
3. The systemd system manager starts podman with a fork/exec.
4. podman starts conmon with a fork/exec.
5. conmon starts the OCI runtime with a fork/exec.
6. The OCI runtime starts the container with an exec.
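The fork/exec steps above rely on a general property of Unix processes: open file descriptors survive an exec and are inherited by child processes. The sketch below only illustrates that property. The nested sh processes are stand-ins chosen for this illustration, not what podman, conmon, or the OCI runtime do internally (Podman and conmon may relay container output rather than hand the descriptors straight through). The function name inherit_demo is hypothetical.

```shell
# Stand-in for a chain of programs that replace themselves via exec.
# The innermost program (echo) writes to the stdout it inherited from
# the outermost caller, without any of the intermediate shells touching it.
inherit_demo() {
    sh -c 'exec sh -c "exec sh -c \"exec echo hello\""'
}

inherit_demo
```

Redirecting the outermost call, for example inherit_demo > /tmp/out.txt, sends the innermost program's output to that file, because every process in the chain shares the same stdout.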
Run podman as another user (using --machine=)
Using sudo systemd-run --user --machine=username@ ...
Run the commands
sudo useradd test
sudo systemd-run \
--collect \
--machine=test@ \
--pipe \
--quiet \
--user \
--wait \
podman run \
--quiet \
--rm \
alpine \
echo hello
The text hello is written to stdout.
In the diagram, the white boxes are processes running as root and the colored boxes are processes running as the user test. The steps in more detail (assuming here that 1000 is the UID of the user test):
1. systemd-run starts a second instance of systemd-run with a fork/exec.
2. systemd-run (second instance) sends a D-Bus request to systemd, requesting that systemd-stdio-bridge be started. In the request, systemd-run also passes the file descriptors for stdin, stdout, and stderr to the systemd system manager. To learn more about the technology used for passing file descriptors over a Unix socket, see SCM_RIGHTS and sendmsg() in man 7 unix.
3. The systemd system manager starts systemd-stdio-bridge running as the user with UID 1000 and lets its stdin, stdout, and stderr be the file descriptors that were passed in step 2.
4. The systemd system manager makes sure that the systemd user manager instance user@1000.service is in the active state. If needed, the systemd system manager starts user@1000.service, which means that systemd --user is executed.
5. systemd-stdio-bridge announces its presence to the systemd user manager using D-Bus.
6. systemd-run (first instance) requests a new transient service unit from the systemd user manager using D-Bus.
7. The systemd user manager starts podman with a fork/exec.
8. podman starts conmon with a fork/exec.
9. conmon starts the OCI runtime with a fork/exec.
10. The OCI runtime starts the container with an exec.
Using --property OpenFile=
The systemd directive OpenFile= was introduced in systemd 253 (released 15 February 2023) and is available in, for example, Fedora 38.
The OpenFile= directive instructs systemd to open a file before starting the service. The file descriptor is passed to the started service as an inherited file descriptor.
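A service can find out which descriptors it received. The sketch below assumes that OpenFile= hands descriptors over using the sd_listen_fds(3) convention, in which descriptors are numbered from 3, LISTEN_FDS holds their count, and LISTEN_FDNAMES holds their colon-separated names. The function name list_passed_fds is hypothetical, and the sketch skips the LISTEN_PID check that a careful implementation would perform.

```shell
# Print one line per file descriptor received through the sd_listen_fds
# convention (assumption: OpenFile= uses this convention).
list_passed_fds() {
    count="${LISTEN_FDS:-0}"
    rest="${LISTEN_FDNAMES:-}"
    i=0
    while [ "$i" -lt "$count" ]; do
        name="${rest%%:*}"             # text before the first colon
        case "$rest" in
            *:*) rest="${rest#*:}" ;;  # drop the name just consumed
            *)   rest="" ;;
        esac
        # Descriptors start at 3, right after stdin, stdout, and stderr.
        echo "fd $((3 + i)): ${name:-unknown}"
        i=$((i + 1))
    done
}
```

When neither variable is set, the function prints nothing, which is the expected result for a process that was not handed any descriptors.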
Example: systemd system manager opens the file /etc/secretfile in read-only mode and the container reads the file descriptor
Run the commands
sudo bash -c 'echo "hello from secret" > /etc/secretfile'
sudo chmod 700 /etc/secretfile
sudo useradd test
uid=$(id -u test)
sudo systemd-run \
--collect \
--pipe \
--property OpenFile=/etc/secretfile:myfdname:read-only \
--property User=test \
--property Requires=user@${uid}.service \
--property After=user@${uid}.service \
--property Environment=XDG_RUNTIME_DIR=/run/user/${uid} \
--quiet \
--wait \
podman run --quiet --rm alpine sh -c "cat <&3"
The text hello from secret is written to stdout.
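The redirection cat <&3 can be tried locally without systemd or podman. In the sketch below, the calling shell plays the role of systemd (it opens the file on descriptor 3) and the child shell plays the role of the container (it reads from the descriptor without knowing the path). The function name read_fd3_demo and the temporary file are assumptions made for this illustration.

```shell
# Local stand-in for the example above: open a file on descriptor 3 and
# let a child process read it through the inherited descriptor.
read_fd3_demo() {
    secret=$(mktemp) || return 1
    echo "hello from secret" > "$secret"
    # "3< file" opens the file read-only on fd 3 for the child only.
    sh -c 'cat <&3' 3< "$secret"
    rm -f "$secret"
}

read_fd3_demo
```

The child never opens the path itself. It only uses the descriptor it was given, which is why the reading side does not need the path to be visible to it.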
Example: systemd user manager opens the file /home/test/secretfile in read-only mode and the container reads the file descriptor
Run the commands
sudo useradd test
sudo bash -c 'echo "hello from secret" > /home/test/secretfile'
sudo chmod 700 /home/test/secretfile
sudo chown test:test /home/test/secretfile
sudo systemd-run \
--user \
--machine=test@ \
--property OpenFile=/home/test/secretfile:myfdname:read-only \
--collect \
--pipe \
--quiet \
--wait \
podman run -q --rm --user 65534:65534 alpine sh -c "cat <&3"
The text hello from secret is written to stdout. (When I tested on Fedora CoreOS 38.20230430.1.0 with container-selinux-2.209.0-1.fc38.noarch, I first had to run sudo setenforce 0 to get this example to work.)
audit2allow showed that these rules are necessary:
#============= container_t ==============
allow container_t user_home_t:file read;
#============= systemd_logind_t ==============
#!!!! This avc can be allowed using the boolean 'domain_can_mmap_files'
allow systemd_logind_t etc_t:file map;
Note: in the example, the option --user 65534:65534 was added to highlight the fact that the container user does not need to be mapped to the regular user on the host (i.e. the user test).
Example: systemd system manager opens the file /etc/secretfile in read-only mode and a container running in a container reads the file descriptor
Run the commands
sudo bash -c 'echo "hello from secret" > /etc/secretfile'
sudo chmod 700 /etc/secretfile
sudo useradd test
uid=$(id -u test)
sudo systemd-run \
--collect \
--pipe \
--property OpenFile=/etc/secretfile:myfdname:read-only \
--property User=test \
--property Requires=user@${uid}.service \
--property After=user@${uid}.service \
--property Environment=XDG_RUNTIME_DIR=/run/user/${uid} \
--quiet \
--wait \
podman run \
--device /dev/fuse \
--quiet \
--rm \
--security-opt label=disable \
--user podman \
quay.io/podman/stable \
podman run \
--quiet \
--rm \
alpine sh -c "cat <&3"
The text hello from secret is written to stdout.
The command took about 1 minute to run. To see more progress output, remove the --quiet flags.