A welcome session for All Systems Go!
UKIs are a fundamental building block of modern measured and trusted boot chains. Let's have a look at what happened in the area and discuss recently added new concepts, such as "add-ons", new PE sections, build tools and more.
At Meta, we've been working to add encryption support to btrfs, with exciting implications for per-container security. Traditionally, encryption has been done either on whole disks, with LUKS, or within a few filesystems (ext4, f2fs, ubifs, and ceph) that lack advanced volume management. Btrfs has several features these filesystems don't: deduplicating/reflinking identical data, subvolume/snapshot management, and integrated checksumming. These features allow giving each container its own encrypted subvolume with a key that is only loaded while the container is running, preventing container storage from being read while the container is turned off, and making deletion of expired containers' storage secure.
In this presentation, we introduce Inspektor Gadget, a tool designed for the creation, deployment, and execution of eBPF programs (gadgets) across Kubernetes and Linux environments. Inspektor Gadget encapsulates eBPF programs into OCI containers, providing well-understood and easily distributable units.
We'll delve into Inspektor Gadget's automatic data enrichment process, transforming complex kernel information into high-level, understandable concepts tied to Kubernetes, container runtimes, systemd, etc. This feature bridges the knowledge gap between raw, low-level data and more interpretable information, improving the understanding of system behavior.
We will illustrate how to use a simple configuration file to set up a data collection pipeline with Inspektor Gadget, resulting in a Prometheus endpoint or an exposed API.
Throughout the talk, we'll demonstrate Inspektor Gadget's features and its support across various environments, discuss its operational mechanics, and share insights into the future direction of the project.
By presenting at ASG!, our aim is not just to inform the audience of Inspektor Gadget, but also to encourage feedback and stimulate discussions within the eBPF and Linux community.
With the introduction of "Forensic Container Checkpointing" in Kubernetes 1.25, it is possible to checkpoint containers. The ability to checkpoint containers opens up many new use cases: containers can be migrated without losing their state, started quickly from existing checkpoints, and run more effectively on spot instances. The primary use case, based on the title of the Kubernetes enhancement proposal, is the forensic analysis of the checkpointed containers.
In this session I want to introduce the different possible use cases of "Forensic Container Checkpointing" with a focus on how to perform forensic analysis on the checkpointed containers. The presented use cases and especially the forensic analysis will be done as a live demo, giving the audience a hands-on experience.
Using an image-based OS brings advantages and challenges. One challenge is the customization of a read-only image with additional host-level software and configuration, and how to manage this customization through the lifetime of a machine.
For deeper changes in /usr, users might build their own images instead of following the official image updates. For common scenarios, the vendor may choose to offer multiple image flavors. Simpler user customization can live outside of the read-only /usr, scattered as config files and binaries in /etc and /opt. Configuration management tools struggle with reliable (re)configuration because tracking filesystem state is hard.
The systemd project now supports a mechanism for extension images. There are two types: system extensions create an overlay for /usr or /opt, and configuration extensions create an overlay for /etc. Through the overlay, users can thus change the read-only /usr without building custom OS images. Vendors can also offer their supported flavors as extensions instead of different OS images, even as a composable stack where the user can choose optional parts. Users can manage their configuration by replacing the extension images atomically. Since the images bundle all files, this prevents old files from lingering around and the system from being left in a half-finished state. The read-only extension images help with setting up attestation and integrity enforcement for their contents. For distributions providing prebuilt initrds (e.g., the Fedora mkosi-initrd proposal), extensions allow initrd customization provided by the distribution or user.
The presentation will give an overview, share use cases and examples, and discuss future improvements for extension images.
Stopping the old and starting a new service afresh -- that is what service restart is roughly about. We will look at what it comprises in more detail from the service manager's perspective and also from the service's client end. We will then look at how the FDSTORE API can be used to smooth service restart. Furthermore, we will review how unit instances may provide further distinction between the stopped and the restarted service. Finally, we go through the options that existing services have to adopt these methods.
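As a minimal sketch of the FDSTORE mechanism mentioned above: a service can hand a file descriptor to the service manager by sending `FDSTORE=1` to the socket named in `$NOTIFY_SOCKET`, with the descriptor attached as `SCM_RIGHTS` ancillary data. The function name `store_fd` and the `FDNAME` value are illustrative choices, and the sketch assumes the unit sets `FileDescriptorStoreMax=` to a non-zero value.

```python
import array
import os
import socket


def store_fd(fd: int, name: str) -> bool:
    """Ask the service manager to keep `fd` open across a restart.

    Returns False when no NOTIFY_SOCKET is set, i.e. when the process
    is not running under a service manager that accepts notifications.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    # A leading "@" denotes a Linux abstract-namespace socket.
    if path.startswith("@"):
        path = "\0" + path[1:]
    message = f"FDSTORE=1\nFDNAME={name}".encode()
    # The descriptor itself travels as SCM_RIGHTS ancillary data.
    ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", [fd]))]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendmsg([message], ancillary, 0, path)
    return True
```

After the restart, systemd passes stored descriptors back to the new process through the `$LISTEN_FDS` and `$LISTEN_FDNAMES` environment variables, in the same way as for socket activation.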
strace is a traditional userspace tracer utility for Linux, implemented using the ptrace API. Despite the abundance of various kernel tracing interfaces nowadays, there are certain classes of tasks that are still better served by strace. In this talk the maintainer of strace will provide examples of such tasks.
systemd v254 introduced a new reboot type: soft-reboot. It shortcuts the reboot process by not restarting the kernel, and instead shutting down userspace, followed by re-exec'ing systemd from the new rootfs, starting everything up again. Not only does this save time by virtue of doing less work, it also allows select resources (File Descriptor Store) and select services that do not use the rootfs (Portable Services) to survive the reboot and continue uninterrupted. This talk will explore the details of this new feature: how it works, why it's useful, what its shortcomings are, and how to make full use of it.
Let's discuss bpfilter, a userspace daemon that empowers services to create efficient packet-filtering BPF programs using a high-level representation of filtering rules.
Let's get you up to speed on Trusted Platform Modules (TPM 2.0) and Linux. Specifically, we'll look at the various additions to basic Linux userspace, i.e. systemd, made in pursuit of our goal to make measured boot a default on Linux.
This talk will discuss new features provided by the new kernel mount API.
The TPM event log contains a history of all measurements made with the TPM.
Complete with some context information for each measurement it is intended to
help with recreating the current PCR contents. What was meant as a debugging
tool turns out to be of vital importance when trying to remotely attest real
life systems. This is mostly because of the overuse of certain PCRs and the
general mess that is x86.
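To illustrate what "recreating the current PCR contents" from an event log means, here is a small sketch of replaying a log. It relies only on the TPM extend rule (new value = hash of old value concatenated with the measured digest). The function names are illustrative, the sketch covers a single SHA-256 bank, and it assumes the PCR starts out as all zeroes.

```python
import hashlib


def extend(pcr: bytes, digest: bytes) -> bytes:
    """TPM extend operation: new PCR value = H(old value || digest)."""
    return hashlib.sha256(pcr + digest).digest()


def replay(event_digests) -> bytes:
    """Recompute a PCR value from the digests recorded in an event log."""
    pcr = bytes(32)  # assumption: the PCR was reset to all zeroes
    for digest in event_digests:
        pcr = extend(pcr, digest)
    return pcr
```

A verifier would compare the result of `replay()` with the PCR value reported by the TPM. If they match, the log entries, in that exact order, account for the PCR contents.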
Sadly, there are many event logs. UEFI keeps one for its measurements and those
done by EFI applications like GRUB and shim. If a system is booted in an MLE
using tboot, the ACM firmware code also maintains an event log that can be
accessed via a pointer in an ACPI table. Now, systemd also has an event log
that is mixed into the general journal log. Finally, Linux IMA maintains its
own event log -- an append-only, in-kernel data structure.
On top of that, every bootloader or userspace application that wants to measure
something into the TPM will also need to maintain an event log.
How about we fix that? The talk will sketch out a solution that maintains a
unified, global event log of the whole system on disk and exposes an interface for
other applications that wish to measure things into the TPM. We'll also fix
race conditions in IMA as well as correctly handle S3 resume w.r.t. measured boot
while we're at it.
Despite being ordinary computers with an ASIC for switching, network hardware must in practice still be treated differently from normal servers. In recent years a lot has improved, and vendors offer white box switches, allowing users to install a (network) operating system of their choice. Of course, the NOS needs to support the firmware interface for the particular ASIC, and this is not standardized: switchdev, DSA, SAI – none of them supports all devices. Due to SONiC's dominance, a lot of vendors seem to support SAI (Switch Abstraction Interface). But SAI requires a proprietary external Linux kernel module. On the NOS side, Open Network Linux was abandoned, and Azure’s SONiC is the new popular kid on the block, running a Docker daemon. There are other differences in the network hardware ecosystem, for example ONIE as the bootloader environment. Also, working with upstream and using established software development practices are lacking, resulting in a maintenance burden. Projects like DENT or OpenWrt go one step further by only supporting upstream Linux kernel interfaces, but now dentOS is also going to support SAI.
This talk gives a short introduction to network operating systems, then focuses on DENT with the ONL fork dentOS and shares experiences. Curiously, the problem of how to treat firmware blobs, and discussions about which distribution to use as a base, are not unknown to these projects either.
In light of the climate crisis, and despite hardware getting faster and faster, fully powering systems down and back on on demand – the obvious choice – is still inconvenient, as boot times are still very long. Even ChromeOS has not lowered its ten-second limit in years. This talk shows the current status of the hobby project on x86 hardware and gives an overview of recent Linux kernel developments that get rid of some of the delays.
A short case study on where we are with sandboxing APT; what gaps there are and what technologies we looked at.
A walkthrough of an interesting use case for the FICLONE ioctl: cloning file data into a tar archive, and cloning files out of it again. "Free" archiving and unarchiving at zero-copy speeds!
- Copy-on-write and the
- The ancient
- A trick for adding arbitrary padding to the tar format in order to force file system page alignment
- How to avoid symlink attacks and other TOCTOU issues, using the fairly recently introduced (Linux 5.6)
- An interesting bug in GNU tar
At the end you'll receive a free autographed copy of deduptar to use for party tricks. 🥳
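For readers unfamiliar with the FICLONE ioctl mentioned above, the following is a minimal sketch of a whole-file clone (reflink) with a fallback to an ordinary copy. It is not taken from deduptar. The helper name `clone_file` is made up for illustration, and the numeric ioctl value is an assumption that holds on x86 and ARM. Placing file data at an offset inside an archive would need the range variant of the ioctl (FICLONERANGE), which is not shown here.

```python
import fcntl
import shutil

# FICLONE is _IOW(0x94, 9, int) in <linux/fs.h>. The numeric value below is
# an assumption valid on x86 and ARM; some architectures encode it differently.
FICLONE = 0x40049409


def clone_file(src_path: str, dst_path: str) -> bool:
    """Reflink src into dst; fall back to a byte copy if cloning fails.

    Returns True if the data was cloned, False if it was copied.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        try:
            # On success, dst shares src's extents (copy-on-write).
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
            return True
        except OSError:
            # The filesystem or this file pair does not support reflinks.
            shutil.copyfileobj(src, dst)
            return False
```

On a filesystem with reflink support, such as btrfs or XFS, the clone completes without copying data blocks. Elsewhere the function silently degrades to a normal copy.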
On quite a few 64-bit architectures, the utmp implementation of glibc uses a 32-bit time variable, which will overflow at 03:14:07 UTC on 19 January 2038. This talk will explain the current work on replacing utmp with logind.
Some quick numbers and maybe curiosities from our work on evaluating which libraries need to be rebuilt for 64-bit time_t on armhf in Ubuntu using abi-compliance-checker.
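The overflow date follows directly from the range of a signed 32-bit integer counting seconds since the Unix epoch. The short sketch below shows the arithmetic. The helper `wrap_int32` is illustrative and only mimics two's-complement wraparound; it is not part of glibc.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit time field can hold

# Last second that still fits into the 32-bit field.
last_second = EPOCH + timedelta(seconds=INT32_MAX)
print(last_second.isoformat())  # 2038-01-19T03:14:07+00:00


def wrap_int32(seconds: int) -> int:
    """Interpret `seconds` as a signed 32-bit integer (two's complement)."""
    return (seconds + 2**31) % 2**32 - 2**31


# One second later the value wraps around to the most negative number.
wrapped = EPOCH + timedelta(seconds=wrap_int32(INT32_MAX + 1))
print(wrapped.isoformat())  # 1901-12-13T20:45:52+00:00
```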
Image-based OS updates are the future. One way to handle updates is via
content-addressable synchronisation software, like casync and desync.
This talk will give a presentation about the two: their overall design,
feature set, and strengths and weaknesses. It will also demonstrate a
real-world use case for them.
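The core idea of content-addressable synchronisation can be sketched in a few lines: split an image into chunks, store each chunk under the hash of its contents, and describe the image as a list of chunk IDs. This is a deliberately simplified illustration, not how casync or desync are implemented. In particular, it cuts chunks at fixed offsets and uses SHA-256, whereas real tools are expected to derive chunk boundaries from the content itself. All names are hypothetical.

```python
import hashlib


def store_chunks(data: bytes, store: dict, chunk_size: int = 4096) -> list:
    """Split data into chunks, store each under its hash, return the index."""
    index = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        chunk_id = hashlib.sha256(chunk).hexdigest()
        store.setdefault(chunk_id, chunk)  # identical chunks are stored once
        index.append(chunk_id)
    return index


def missing_chunks(index: list, store: dict) -> list:
    """Chunk IDs a client still has to download to apply an update."""
    return [chunk_id for chunk_id in index if chunk_id not in store]


def reassemble(index: list, store: dict) -> bytes:
    """Rebuild the original image from its index and the chunk store."""
    return b"".join(store[chunk_id] for chunk_id in index)
```

An update then only transfers the chunks reported by `missing_chunks()`. Everything the client already holds from the previous image version is reused from its local store.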
A quick overview of the work in progress to plumb PID FDs through Linux userspace, to achieve resilience and security improvements.
The journey of developing a Linux platform to require very little in the way of configuration management, and how to virtually eliminate the need to modify code to change configuration. From configuration via scripts and evolving through a couple of configuration management products, we have used the idea of matching actions to timescales to transform how we do configuration management. We now do very little of it, and we have dramatically reduced its complexity.
All Systems Go! lightning talk
The social event will take place, once again, at Haus am See from 19:00-23:00. Food will be served and drinks tokens will be handed out at the door.
19:00-21:00 - Food and drinks on the ground floor, with access to the club area on the lower level
21:00-23:00 - We move to the club area on the lower floor which we have exclusively for more drinks and mingling. The ground floor will be open for non-All Systems Go! folks.
You can stay after 23:00, but after that point there is no official All Systems Go! function and you're on your own. ;)
Confidential compute is a new compute and programming paradigm for running an application in an enclave, a run-time encrypted and authenticated trusted execution environment. We give an overview of the current technologies provided by AMD, Intel and ARM. We also give an overview of open source tools for leveraging confidential compute, along with a tutorial on how to enclave any application with a few command lines.
The Linux Userspace team at Meta aims to make significant contributions to upstream userspace projects, while also ensuring that Meta is able to leverage those improvements. In this talk we'll give an overview of the team and brief history of how it was formalized. Then we'll dive deeper into some of the efforts we've worked on with the open source community and features we've adopted internally. Come if you enjoy hearing about systemd, BPF, distributions, and more!
The Talos Linux distribution is built from scratch with the goal of providing a secure, verified, and minimal-footprint operating system for running Kubernetes clusters. Talos is designed to be immutable, minimal, and secure. Talos includes only the bare minimum required to run Kubernetes.
This talk will cover how Talos uses Unified Kernel Images (UKIs) to provide immutable, verified, and secure booting. We will also cover how Talos partially conforms to the Linux Userspace API Group specification (UAPI) to implement some of the best practices with regard to fully verifiable TrustedBoot extending to the userspace.
This talk provides a brief introduction to the Confidential Containers project. We'll discuss the rationale behind Confidential Computing and how concepts like Trusted Computing or Remote Attestation can be leveraged by end-users to guard their workloads not only from malicious actors but also from their cloud service provider. Confidential Containers, an open-source CNCF project, aims to extend the experience of deploying cloud-native software on Kubernetes with the option to move sensitive workloads into confidential enclaves with minimal friction to the user experience. We'll introduce the components and container technologies we are using to achieve that, hint at some conceptual problems we are facing and provide a simple example of how confidential containers work in practice today.
This talk will be a whirlwind overview of NixOS modules and the lessons I've learned from maintaining them and writing new ones.
openSUSE is a general purpose, RPM-based distribution. One of its unique features is the use of btrfs snapshots to offer rollback of the root file system on both traditional and transactional systems. This talk explains the challenges faced when integrating systemd-boot into openSUSE.
The build system should get out of the way to let us focus on our tasks, not be distracted by slow or unreliable builds, get fast feedback on changes, and let us know what’s in the software we’re shipping to our users. But, what does it take for a build system to be really fast and reliable? What does it take to know what’s in the software?
It requires aggressive parallelism and distributed caching to avoid redundant work between colleagues. And it requires complete knowledge and control of dependencies, build isolation to identify mistakes, and reproducible builds to verify results across machines and strengthen supply-chain security.
In this talk you will learn how Google’s open source build system Bazel and the purely functional package manager Nix join forces to provide fast, correct, and reproducible builds.
Modification of the kernel command line has historically been one of the easiest ways to customize system behavior. Bootloaders allow for persistent changes via config-files and on-the-fly changes interactively during system boot.
System behavior changes made via the kernel command line are not limited to the kernel itself. Userspace applications from installers to init systems and beyond also take input from /proc/cmdline.
It is clear that some kernel command line options are desirable (console=ttyS0 verbose) and possibly even necessary. Others, such as the cromulent 'init=/bin/sh', can allow circumvention of the benefits that Secure Boot and the TPM provide.
How to control access to kernel command line modification is a non-trivial subject. A recent pull request to systemd that added "command-line addons" garnered hundreds of comments.
This talk will cover:
* The stub loader 'stubby' and its allowed-list approach to kernel command line options.
* systemd-stub’s solution for command line customization.
* System changes that can be made through the kernel command line.
* Alternative channels such as SMBIOS OEM strings, or QEMU 'fw_cfg'.
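To make the allowed-list idea from the first bullet concrete, here is a small hypothetical sketch of filtering a kernel command line against a list of permitted patterns. It does not reproduce stubby's actual policy or implementation. The `ALLOWED` patterns and the function name are invented for illustration, and quoted values containing spaces are not handled.

```python
import fnmatch

# Hypothetical policy for illustration only; not stubby's actual list.
ALLOWED = ["console=*", "root=*", "ro", "quiet", "verbose"]


def filter_cmdline(cmdline: str, allowed=ALLOWED):
    """Split a kernel command line into (accepted, rejected) options."""
    accepted, rejected = [], []
    # Simplification: options are separated by whitespace, no quoting.
    for option in cmdline.split():
        if any(fnmatch.fnmatchcase(option, pattern) for pattern in allowed):
            accepted.append(option)
        else:
            rejected.append(option)
    return accepted, rejected


accepted, rejected = filter_cmdline("console=ttyS0 verbose init=/bin/sh")
print(accepted)  # ['console=ttyS0', 'verbose']
print(rejected)  # ['init=/bin/sh']
```

A boot stub following this approach could refuse to boot, or drop the rejected options, whenever `rejected` is non-empty. Which of the two is preferable is a policy decision.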
This talk will explore the ideas from Lennart's "Fitting Everything Together"
blog post, particularly the A/B partitioning scheme and its bootloader design,
comparing it with the approach used on the SteamDeck. Spoiler alert, we're not
We will focus on the requirements that drove us to the latter design, some
implementation details, and hurdles we needed to overcome to achieve that
Lastly, the idea of finding common ground will be entertained, and audience
participation is greatly encouraged. What features would be acceptable to the
wider systemd community? Would those be enough for the SteamDeck to jump ship?
Arch Linux has worked with its own packaging framework - Arch Linux Package Management (ALPM) - for about 20 years.
This talk is about an effort to rewrite low-level components and to create specifications for related metadata files using the Rust programming language.
It will cover new projects in the ALPM (https://gitlab.archlinux.org/archlinux/alpm/) group as well as several other related ones and give an outlook on future developments using the 🦀
A/B partitioning is great - you hermetically drop in the whole new OS and boot
into it. But how can we manage and migrate the RW configuration and state
files that lie within? Can we do that reliably on both OS upgrades and
This talk will explore the design used on the SteamDeck, the issues
we've seen while drawing analogies, and future inspiration with "Fitting
Everything Together" by Lennart Poettering in mind.
Network operating systems commonly provide a stable userspace platform for networking devices. Integration of userspace applications as well as low-level hardware support are handled by firmware build systems.
Existing build systems for network operating systems display numerous limitations by either targeting only distinct types of devices, using cumbersome methodologies to add additional features or offering insufficient capabilities regarding what to include in the firmware image. In this presentation, we provide an overview of these limitations and how we mitigate them with Replica.one, an Open Source firmware builder which targets the entire networking stack.
We will focus on the solution's optimization features, its capability to generate firmware for diverse classes of devices across the entire networking stack, and the flexibility to select the desired operating system between various Linux-based distributions.
systemd-repart has recently learned many features to make it useful for building discoverable disk images. In this talk, we'll give a deep dive into the new features and how they can be used to assemble discoverable disk images.
Recently, atomic updates via image-based systems have become more relevant for
servers and desktops, as they allow predictable management of large fleets. In the
embedded Linux space, this approach has been the default for many years and
proven updaters exist already.
In this talk, we will delve into RAUC and look at how its design and features
have been driven by the requirements for robust, atomic updates.
The presentation will introduce the fundamental concepts surrounding A/B fallback
and update signing in the context of embedded Linux updates.
We will then explore the commonalities and differences between RAUC and systemd's
The discussion will progress to cover RAUC's bundle-based update system, which
allows for comprehensive system updates without the need for local storage,
thanks to HTTP streaming. Additionally, we will demonstrate how adaptive updates
minimize download sizes without necessitating version-specific patch management.
openSUSE Aeon (formerly MicroOS Desktop) aims to be a fully fledged modern Linux Desktop leveraging as many of the latest user space innovations as possible, including:
- Immutable OS with Transactional Updates
- Secure Boot
- TPM Encryption
- Flatpaks & OCI containers as primary application delivery
This talk will introduce the distribution, highlight the adoption of some of the latest foundational user space technologies as well as share some of the pain points being faced and invite the audience to contribute to this exciting platform.
Are you using container images with hundreds of known vulnerabilities?
The majority of us are using images based on the Docker official images available on the Docker Hub. This includes base images – such as Debian and Ubuntu – as well as application images such as nginx and redis. Unfortunately these images often have hundreds of known vulnerabilities due to excessively large dependency trees with out-of-date packages. This security debt can lead to unnecessary security risks and slower development cycles.
Wolfi (https://github.com/wolfi-dev/) is a new Linux distribution optimized for building minimal, secure container images. Wolfi maintainers prioritize a rolling release model built on a rapid package update cycle, which ensures that new vulnerabilities are remediated quickly.
This talk not only describes the problems that motivate Wolfi but also provides hands-on knowledge to help developers take advantage of Wolfi. By the end of the talk, developers will learn about packaging techniques with apko and melange, tools specifically designed to build Wolfi packages and turn them into minimal, low- or no-vulnerability containers.
mkosi is a tool for building operating system images. In this talk we'll give an introduction to mkosi, how we use it to develop systemd and discuss how we want to support running and updating systems with mkosi and other systemd tooling.
BuildStream is a tool for building / integrating software stacks. In a way, it has a similar goal to BitBake / Yocto and Android repo, but takes a completely different approach. It can be used to take software from various sources, build it with various build systems in a reproducible sandbox, and cache results for speedy rebuilds.
In this talk I give a brief overview of BuildStream, how it is used to build GNOME OS, and the challenges we face in using it. I also go over freedesktop-sdk, a base runtime that can be used as a base to build your own system.
I also discuss the challenges we encountered with using BuildStream with ostree and the steps we're taking to support updating with systemd-sysupdate.
A quick journey through the Azure infrastructure, specifically looking at how image-based Linux is used for Azure Boost, what it enables, what interesting security and performance features were added and where to find them upstream.
In this talk we’ll discuss antlir2, Meta’s solution to building container and bare metal operating system images. We’ll talk about how we have built performant, hermetic and deterministic image building infrastructure on top of buck2 (Meta’s new open source build system) and how we enable users to compose their own multi-language projects with full operating systems, write tests and deploy their images. Along the way, we’ll also cover how antlir2 wrangles dnf and other upstream tooling to behave more predictably for better, more reliable images.
sdbusplus generates ergonomic and compile-time type-checked D-Bus bindings built atop sd-bus. This library is heavily used within the OpenBMC project to provide all IPC between its many userspace processes. This talk will give an overview of how OpenBMC leverages D-Bus, how sdbusplus facilitates its usage, as well as an introduction to our approach to asynchronous programming with C++ coroutines.
Closing session of All Systems Go! 2023