Kata Containers for Docker in 2024


Kata Containers has effectively become the main way to run containers inside an isolated virtual machine for extra security. My work requires me to run container images that I can’t always fully trust. I used to run Docker in a VirtualBox virtual machine managed by Docker Machine for this, but since Docker Machine has not been developed since 2021, I decided to look at replacement options.

I was very surprised to find that openSUSE Tumbleweed, which I migrated to after 16 years on Ubuntu, has no ready-made packages for Kata Containers. It does have packages for Firecracker, the microVM monitor created at AWS specifically to run Linux containers in an isolated environment. But since those packages come without the firecracker-containerd runtime, I went back to the idea of trying Kata Containers.

In 2024, the official Kata Containers documentation states that pre-built packages are available only in the Fedora and CentOS repositories, and for Ubuntu as a Snap package. In other cases, such as mine, the only options are to build from source or to install pre-built binaries from GitHub using the kata-manager.sh script. The developers themselves recommend the second option in the documentation.

The kata-manager.sh script simply unpacks the selected release archive from GitHub into /opt/kata, creates configs in /etc/kata-containers/ and sets up symlinks in /usr/bin/ for kata-runtime, kata-collect-data.sh and containerd-shim-kata-v2. That is already enough for docker run --runtime io.containerd.run.kata.v2 to create a container in a QEMU virtual machine (the default hypervisor) with 1 vCPU and 2 GB of RAM. An interesting fact: containerd derives the shim binary name from the runtime name, so io.containerd.run.kata.v2 turns into a search for containerd-shim-kata-v2 in PATH without any extra configuration.
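
The name mapping can be illustrated with a short shell sketch. It is only an approximation of containerd’s rule (the last two dot-separated parts of the runtime name become the shim name and version), not containerd’s actual code, and the function name shim_binary_name is made up for this example.

```shell
#!/bin/sh
# Sketch: approximate how a containerd runtime name maps to a shim binary name.
# shim_binary_name is a hypothetical helper, not part of containerd or Kata.
shim_binary_name() {
    runtime_name=$1
    shim_version=${runtime_name##*.}     # text after the last dot, e.g. "v2"
    without_version=${runtime_name%.*}   # everything before the last dot
    shim_name=${without_version##*.}     # the part just before the version, e.g. "kata"
    printf 'containerd-shim-%s-%s\n' "$shim_name" "$shim_version"
}

shim_binary_name io.containerd.run.kata.v2   # prints containerd-shim-kata-v2
shim_binary_name io.containerd.runc.v2       # prints containerd-shim-runc-v2
```

Because only the last two parts matter, a runtime name can carry extra information in the shim name as long as it contains no dots.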

What makes this installation worse than installing packages from your Linux distribution: at the moment Kata Containers ships with its own complete set of hypervisors (QEMU, Firecracker, etc.), boot images and other necessary files, without depending on external components. All of these components need to be updated for security reasons, and in this case you update at your own risk, completely replacing the files of the previous version with untested new ones. For example, in my case Firecracker did not work out of the box in version 3.6.0. With distribution packages there is at least a chance that they have been tested properly. Another consequence of an update is the potential need to restart all running containers on the new version, but that applies to package installations as well.

Of course, a much better solution is to install each version into its own separate path. So, let’s install the version we need from GitHub:

VERSION=3.8.0
DIR="/opt/kata_$VERSION"
PACKAGE="kata-static-$VERSION-$(uname -m | sed -e 's/x86_64/amd64/' -e 's/aarch64\|arm64/arm64/' -e 's/ppc64le/ppc64le/' -e 's/s390x/s390x/').tar.xz"
curl -LO "https://github.com/kata-containers/kata-containers/releases/download/$VERSION/$PACKAGE"
sudo mkdir -p "$DIR"
sudo tar -xJf "$PACKAGE" --strip-components=3 -C "$DIR"

The contents should unpack successfully:

$ ll $DIR
total 20
drwxr-xr-x 1 root root   110 Aug 25 12:24 ./
drwxr-xr-x 1 root root   150 Aug 25 12:15 ../
drwxr-xr-x 1 root root   490 Aug 21 17:53 bin/
drwxr-xr-x 1 root root    50 Aug 15 20:52 include/
drwxr-xr-x 1 root root    70 Aug 15 20:52 lib/
drwxr-xr-x 1 root root    30 Aug  9 12:45 libexec/
drwxr-xr-x 1 root root     6 Aug 21 18:01 runtime-rs/
drwxr-xr-x 1 root root   154 Aug 21 18:02 share/
-rw-r--r-- 1 root root    29 Aug 21 18:08 VERSION
-rw-r--r-- 1 root root 13678 Aug 21 18:08 versions.yaml
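
The archive name used in the download step depends on the machine architecture. As a rough sketch, the sed expression can be unpacked into a small function; kata_package_name is a hypothetical helper written for this article, and the mapping covers only the architectures named in the sed command.

```shell
#!/bin/sh
# Sketch: build the release archive name from a version and a `uname -m` value.
# kata_package_name is a hypothetical helper, not something shipped with Kata.
kata_package_name() {
    kata_version=$1
    machine=$2
    case "$machine" in
        x86_64)        release_arch=amd64 ;;
        aarch64|arm64) release_arch=arm64 ;;
        *)             release_arch=$machine ;;   # ppc64le and s390x are used unchanged
    esac
    printf 'kata-static-%s-%s.tar.xz\n' "$kata_version" "$release_arch"
}

kata_package_name 3.8.0 x86_64   # prints kata-static-3.8.0-amd64.tar.xz
```

Passing "$(uname -m)" as the second argument reproduces what the one-liner computes on the current machine.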

There are many hypervisors to choose from; the default is QEMU:

$ ll $DIR/share/defaults/kata-containers/
total 420
drwxr-xr-x 1 root root   906 Aug 21 18:02 ./
drwxr-xr-x 1 root root    30 Aug 21 18:01 ../
-rw-r--r-- 1 root root 10930 Aug 21 18:02 configuration-acrn.toml
-rw-r--r-- 1 root root 19799 Aug 21 18:02 configuration-clh.toml
-rw-r--r-- 1 root root 16708 Aug 21 18:02 configuration-fc.toml
-rw-r--r-- 1 root root 29756 Aug 21 18:02 configuration-qemu-coco-dev.toml
-rw-r--r-- 1 root root 28990 Aug 21 18:02 configuration-qemu-nvidia-gpu-snp.toml
-rw-r--r-- 1 root root 28964 Aug 21 18:02 configuration-qemu-nvidia-gpu-tdx.toml
-rw-r--r-- 1 root root 29631 Aug 21 18:02 configuration-qemu-nvidia-gpu.toml
-rw-r--r-- 1 root root 28098 Aug 21 18:02 configuration-qemu-se.toml
-rw-r--r-- 1 root root 27635 Aug 21 18:02 configuration-qemu-sev.toml
-rw-r--r-- 1 root root 29032 Aug 21 18:02 configuration-qemu-snp.toml
-rw-r--r-- 1 root root 28798 Aug 21 18:02 configuration-qemu-tdx.toml
-rw-r--r-- 1 root root 29649 Aug 21 18:02 configuration-qemu.toml
-rw-r--r-- 1 root root 13910 Aug 21 18:02 configuration-remote.toml
-rw-r--r-- 1 root root 17523 Aug 21 18:02 configuration-stratovirt.toml
lrwxrwxrwx 1 root root    23 Aug 21 18:02 configuration.toml -> configuration-qemu.toml
-rw-r--r-- 1 root root 10232 Aug 21 18:03 genpolicy-settings.json
-rw-r--r-- 1 root root 36237 Aug 21 18:03 rules.rego
drwxr-xr-x 1 root root   280 Aug 21 18:01 runtime-rs/

You can read more about the configuration here and here.

Now all we have to do is create the necessary symlinks:

sudo ln -s "$DIR" /opt/kata
sudo ln -s /opt/kata/share/defaults/kata-containers /etc/kata-containers
sudo ln -s /opt/kata/bin/containerd-shim-kata-v2 /usr/bin/containerd-shim-kata-v2
sudo ln -s /opt/kata/bin/kata-runtime /usr/bin/kata-runtime

Verify that everything worked by running the busybox container:

$ sudo docker run --runtime io.containerd.run.kata.v2 busybox uname -a
Linux 88c1b982e983 6.1.62 #1 SMP Wed Jul 17 13:00:20 UTC 2024 x86_64 GNU/Linux

To make Kata the default runtime, add "default-runtime": "io.containerd.run.kata.v2" to /etc/docker/daemon.json:

$ cat /etc/docker/daemon.json
{
  "default-runtime": "io.containerd.run.kata.v2"
}

Then reload the Docker daemon configuration:

sudo systemctl reload docker

Check it out:

$ sudo docker run busybox uname -a
Linux 88c1b982e983 6.1.62 #1 SMP Wed Jul 17 13:00:20 UTC 2024 x86_64 GNU/Linux

In the future, when installing a new version, all we need to do is change the symlink:

sudo rm -f /opt/kata
sudo ln -s "$DIR" /opt/kata
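
If you prefer to switch versions in a single command, ln -sfn replaces an existing symlink in place. Here is a small sketch in a scratch directory; the 3.7.0 directory is a made-up stand-in for a previously installed version, and with the real /opt/kata the commands would need sudo.

```shell
#!/bin/sh
# Sketch: repoint a "current version" symlink with one ln call.
scratch=$(mktemp -d)
mkdir "$scratch/kata_3.7.0" "$scratch/kata_3.8.0"   # 3.7.0 is a hypothetical older install

# Initial state: the link points at the old version.
ln -s "$scratch/kata_3.7.0" "$scratch/kata"

# -f replaces the existing link; -n stops ln from following the old link
# and creating the new link inside the old version's directory.
ln -sfn "$scratch/kata_3.8.0" "$scratch/kata"

current_target=$(readlink "$scratch/kata")
echo "$current_target"
```

The last line prints the path of the kata_3.8.0 directory inside the scratch directory.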

If we want to be able to use different versions of the Kata runtime without switching the /opt/kata symlink, we have to use a couple of hacks. The difficulty lies in the fact that the /opt/kata path, as well as two configuration file paths, are hardwired into the Kata executables:

$ kata-runtime --show-default-config-paths
/etc/kata-containers/configuration.toml
/opt/kata/share/defaults/kata-containers/configuration.toml

The config paths can be overridden with the KATA_CONF_FILE environment variable. In addition, containerd-shim-kata-v2 looks for kata-runtime in PATH. Both issues can be solved with a wrapper script.

Now we need to update all the files to use our new path instead of /opt/kata:

grep -rlI '/opt/kata' "$DIR" | sudo xargs sed -i "s|/opt/kata|$DIR|g"
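
One caveat: because $DIR itself starts with /opt/kata, running the command above a second time would rewrite the already rewritten paths again. Below is a minimal sketch of a variant that is safe to re-run, assuming every occurrence in the files is followed by a slash (as in /opt/kata/bin/...). rewrite_kata_prefix is a hypothetical helper, and the demo works on a scratch directory with a made-up config line; the real $DIR would additionally need sudo.

```shell
#!/bin/sh
# Sketch: replace "/opt/kata/" with a versioned prefix in a way that can be re-run.
# rewrite_kata_prefix is a hypothetical helper; it relies on GNU grep and GNU sed.
rewrite_kata_prefix() {
    scan_dir=$1
    new_prefix=$2
    # grep exits non-zero when nothing matches; that only means nothing is left to rewrite.
    matching_files=$(grep -rlI '/opt/kata/' "$scan_dir" || true)
    [ -n "$matching_files" ] || return 0
    printf '%s\n' "$matching_files" | while IFS= read -r config_file; do
        # The trailing slash keeps "/opt/kata_3.8.0/" from matching on a second run.
        sed -i "s|/opt/kata/|$new_prefix/|g" "$config_file"
    done
}

# Demo on a scratch copy with one illustrative config line.
demo_dir=$(mktemp -d)
printf 'path = "/opt/kata/bin/qemu-system-x86_64"\n' > "$demo_dir/configuration.toml"
rewrite_kata_prefix "$demo_dir" /opt/kata_3.8.0
rewrite_kata_prefix "$demo_dir" /opt/kata_3.8.0   # second run changes nothing
cat "$demo_dir/configuration.toml"
```

After both runs the file still contains a single, correctly rewritten path under /opt/kata_3.8.0.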

Unfortunately, this won’t solve the problem with QEMU’s hardcoded path for finding its BIOS files, so we still need the /opt/kata symlink. If you haven’t created one yet, now is the time:

sudo ln -s $DIR /opt/kata

The good news is that thanks to the config changes, kata-runtime will use the system images and binaries from its own versioned directory rather than the default ones in /opt/kata.

What remains is to create scripts to run containerd-shim-kata-v2:

cat <<EOF | sudo tee "/usr/bin/containerd-shim-kata_$(echo "$VERSION" | sed 's/\./_/g')-v2" > /dev/null
#!/bin/sh
export KATA_CONF_FILE="$DIR/share/defaults/kata-containers/configuration.toml"
export PATH="$DIR/bin:\$PATH"
exec $DIR/bin/containerd-shim-kata-v2 "\$@"
EOF
sudo chmod +x "/usr/bin/containerd-shim-kata_$(echo "$VERSION" | sed 's/\./_/g')-v2"

And kata-runtime:

cat <<EOF | sudo tee "/usr/bin/kata-runtime-$VERSION" > /dev/null
#!/bin/sh
export KATA_CONF_FILE="$DIR/share/defaults/kata-containers/configuration.toml"
export PATH="$DIR/bin:\$PATH"
exec $DIR/bin/kata-runtime "\$@"
EOF
sudo chmod +x "/usr/bin/kata-runtime-$VERSION"

Check it out:

$ echo "io.containerd.run.kata_$(echo "$VERSION" | sed 's/\./_/g').v2"
io.containerd.run.kata_3_8_0.v2
$ sudo docker run --runtime io.containerd.run.kata_3_8_0.v2 busybox uname -a
Linux 88c1b982e983 6.1.62 #1 SMP Wed Jul 17 13:00:20 UTC 2024 x86_64 GNU/Linux
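
The heredoc quoting in the wrapper scripts is easy to get wrong, so it can help to try the pattern without root first. The sketch below generates the same kind of wrapper in a scratch directory around a stub script that stands in for the real kata-runtime binary; the stub and all paths under the scratch directory are invented for the demo.

```shell
#!/bin/sh
# Sketch: test the wrapper pattern against a stub instead of the real binary.
demo=$(mktemp -d)
mkdir -p "$demo/bin"

# Stub standing in for kata-runtime (hypothetical): it reports what it receives.
cat > "$demo/bin/kata-runtime" <<'EOF'
#!/bin/sh
echo "conf=$KATA_CONF_FILE args=$*"
EOF
chmod +x "$demo/bin/kata-runtime"

# Same pattern as the real wrappers: $demo expands while the file is written,
# whereas \$PATH and \$@ are left for the wrapper to expand at run time.
cat > "$demo/kata-runtime-wrapper" <<EOF
#!/bin/sh
export KATA_CONF_FILE="$demo/share/defaults/kata-containers/configuration.toml"
export PATH="$demo/bin:\$PATH"
exec "$demo/bin/kata-runtime" "\$@"
EOF
chmod +x "$demo/kata-runtime-wrapper"

wrapper_output=$("$demo/kata-runtime-wrapper" --version)
echo "$wrapper_output"
```

The stub prints the config path it sees in KATA_CONF_FILE followed by the arguments it was given, which confirms that the wrapper passes both through.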

By default, the Kata runtime starts a virtual machine with 1 vCPU and 2 GB of RAM. To change this, pass --cpus and -m (or --memory) to docker run; -c didn’t work for me, since it is shorthand for --cpu-shares rather than --cpus. The values given are added to the defaults, i.e. --cpus 1 --memory 512m results in 2 vCPUs and 2.5 GB of memory. The host’s memory is not reserved up front for these 2.5 GB; it is used as needed.
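
The arithmetic can be written down as a tiny sketch. It assumes the defaults described above (1 vCPU and 2048 MiB) and whole-number --cpus values; kata_vm_size is a hypothetical helper for illustration only, not a Kata or Docker command.

```shell
#!/bin/sh
# Sketch: estimate the VM size that results from docker run --cpus / --memory.
# Defaults assumed from the article: 1 vCPU and 2048 MiB of RAM.
DEFAULT_VCPUS=1
DEFAULT_MEMORY_MIB=2048

# kata_vm_size <cpus> <memory in MiB>  (hypothetical helper, integers only)
kata_vm_size() {
    requested_cpus=$1
    requested_memory_mib=$2
    echo "$((DEFAULT_VCPUS + requested_cpus)) vCPU, $((DEFAULT_MEMORY_MIB + requested_memory_mib)) MiB"
}

kata_vm_size 1 512   # prints: 2 vCPU, 2560 MiB
```

2560 MiB is the 2.5 GB from the example: the request is added on top of the defaults instead of replacing them.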

Everything else, like networking, should work by default. For example, we can get a response from nginx:

$ sudo docker run -d --name nginx nginx
8663b33cad7e5b820be85468aa760e6ea34fc870adf7d924922788266041c898
$ sudo docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' nginx
172.17.0.2
$ curl 172.17.0.2
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>

Mounting also works fine:

$ sudo docker run --runtime io.containerd.run.kata_3_8_0.v2 -v /tmp:/tmp busybox ls /tmp
dbus-1b94OvKJfW
sddm-auth-ad8346f2-fbc8-4b74-9c03-e33d1136248e
systemd-private-a2ad86d46c374394957b41737f945998-ModemManager.service-tsH78Z
systemd-private-a2ad86d46c374394957b41737f945998-bluetooth.service-Bct8TG
systemd-private-a2ad86d46c374394957b41737f945998-dbus-broker.service-mpkAbN
systemd-private-a2ad86d46c374394957b41737f945998-fwupd.service-1Uiyrr
systemd-private-a2ad86d46c374394957b41737f945998-iio-sensor-proxy.service-1IKcZf
systemd-private-a2ad86d46c374394957b41737f945998-irqbalance.service-VvFZzU
systemd-private-a2ad86d46c374394957b41737f945998-polkit.service-58al8n
systemd-private-a2ad86d46c374394957b41737f945998-power-profiles-daemon.service-U13kul
systemd-private-a2ad86d46c374394957b41737f945998-systemd-logind.service-r2T6Xu
systemd-private-a2ad86d46c374394957b41737f945998-upower.service-aUQ6HX
tmp.CERJypVFNb
tmp.Lk54RYD5Tl
tmp.OWS9PKcNm8
tmp.cUYADP9m4f
tmp.hTZnG0skQ4
tmp.k3XnzrTg2k
tmp.mFpNhgIVQT
tmp.mvGoSlep7Z
tmp.nIDfmAGIo2
tmp.yQNEJyMj86
$ sudo docker run --runtime io.containerd.run.kata_3_8_0.v2 -v /tmp:/tmp busybox mount
none on / type virtiofs (rw,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev type tmpfs (rw,nosuid,size=65536k,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666)
sysfs on /sys type sysfs (ro,nosuid,nodev,noexec,relatime)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/devices type cgroup (ro,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (ro,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (ro,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (ro,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (ro,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/freezer type cgroup (ro,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/pids type cgroup (ro,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/blkio type cgroup (ro,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/memory type cgroup (ro,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/cpuset type cgroup (ro,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
shm on /dev/shm type tmpfs (rw,relatime)
none on /tmp type virtiofs (rw,relatime)
kataShared on /etc/resolv.conf type virtiofs (rw,relatime)
kataShared on /etc/hostname type virtiofs (rw,relatime)
kataShared on /etc/hosts type virtiofs (rw,relatime)
tmpfs on /proc/timer_list type tmpfs (rw,nosuid,size=65536k,mode=755)
proc on /proc/bus type proc (ro,relatime)
proc on /proc/fs type proc (ro,relatime)
proc on /proc/irq type proc (ro,relatime)
proc on /proc/sys type proc (ro,relatime)

P.S. If you are looking for a Senior or Lead DevOps engineer in Europe, feel free to get in touch.


Link to the original article: https://habr.com/ru/articles/840538/

