Unbreaking reloads: strategies for fast and non-blocking reconfiguration
2017-10-22, 08:45–09:15 (UTC), Galerie

When configuration changes, daemon-reload stops the world in an increasingly unsustainable way. The problem is getting worse for two reasons: (1) heavier use of systemd means more units and longer reload times, and (2) expanded use of socket activation, D-Bus activation, and automount means more things urgently need PID 1's attention. There are ways to fix this, but we'll need to move away from the current sequence: stopping the world (the main event loop), throwing out most loaded state, reloading that state, and then resuming event handling.

We'll explore these options:

  • Incremental state reloading, possibly limited to cases where dependencies and other cascading configuration remain the same
  • Amortized state reloading with an atomic switch on completion
  • Offloading configuration loading to a separate thread or process, followed by an atomic switch-over on completion
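
As a rough illustration of the last option, here is a minimal Python sketch (not systemd code) of loading a new unit table in a background thread and publishing it with a single reference assignment. The names UnitRegistry, lookup, and reload_async are hypothetical, and the sketch assumes the published table is never mutated after the switch.

```python
import threading


class UnitRegistry:
    """Hypothetical holder for the active unit table.

    Readers only ever see a fully loaded table: either the old one
    or the new one, never a half-built mix.
    """

    def __init__(self, units):
        # Assumption: a table is treated as immutable once published.
        self._units = units

    def lookup(self, name):
        # Read the reference once so a concurrent switch cannot
        # change the table in the middle of this call.
        units = self._units
        return units.get(name)

    def reload_async(self, loader):
        """Build a new table off the main thread, then switch over."""

        def work():
            new_units = loader()     # slow part: parse all configuration
            self._units = new_units  # the switch: one reference assignment

        thread = threading.Thread(target=work)
        thread.start()
        return thread
```

While the loader runs, lookup keeps answering from the old table, so event handling does not have to pause. The cost is that two tables exist at once until the old one is released, which is where the memory concern comes in.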

We'll need to be careful not to inflate the memory footprint on resource-constrained devices, but we have options:

  • Continuing to stop the world when a system is resource-constrained
  • Storing unit data in a tree that supports snapshots and copy-on-write, which would constrain the maximum footprint during reload to barely more than it is today
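
To make the copy-on-write idea concrete, here is a minimal Python sketch of a persistent (path-copying) binary search tree. It is an illustration under stated assumptions, not a proposal for the actual data structure: Node, insert, and get are hypothetical names, the tree is unbalanced, and unit data is reduced to a single string value.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    """Immutable tree node; an update creates new nodes instead of mutating."""
    key: str
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(node, key, value):
    """Return a new root with key set to value.

    Only the nodes on the search path are copied; every subtree that
    is not on the path is shared between the old and the new root.
    """
    if node is None:
        return Node(key, value)
    if key < node.key:
        return node._replace(left=insert(node.left, key, value))
    if key > node.key:
        return node._replace(right=insert(node.right, key, value))
    return node._replace(value=value)


def get(node, key):
    """Look up key starting from the given root (snapshot)."""
    while node is not None:
        if key == node.key:
            return node.value
        node = node.left if key < node.key else node.right
    return None


# Usage: the old root stays valid as a snapshot while a reload builds a new one.
old_root = None
for name in ["m.service", "c.socket", "t.mount"]:
    old_root = insert(old_root, name, "v1")
new_root = insert(old_root, "c.socket", "v2")
```

Here old_root still answers with the pre-reload value for c.socket while new_root answers with the new one, and the t.mount subtree is the same object in both. The extra memory held during a reload is therefore proportional to what changed rather than to the total number of units.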