Commit a6303da3, authored by Ryan Lahfa, committed by GitHub

Merge pull request #273062 from JulienMalka/systemd-boot-counting

nixos/systemd-boot: init boot counting
parents 46ec6ef0 eb435897
+2 −0
@@ -16,6 +16,8 @@ In addition to numerous new and upgraded packages, this release has the followin
   - This can be disabled through the `environment.stub-ld.enable` option.
   - If you use `programs.nix-ld.enable`, no changes are needed. The stub will be disabled automatically.

- NixOS now has support for *automatic boot assessment* for `systemd-boot` users (see [here](https://systemd.io/AUTOMATIC_BOOT_ASSESSMENT/) for a detailed description of the feature). It is available as [boot.loader.systemd-boot.bootCounting](#opt-boot.loader.systemd-boot.bootCounting.enable).

- Julia environments can now be built with arbitrary packages from the ecosystem using the `.withPackages` function. For example: `julia.withPackages ["Plots"]`.

## New Services {#sec-release-24.05-new-services}
+38 −0
# Automatic boot assessment with systemd-boot {#sec-automatic-boot-assessment}

## Overview {#sec-automatic-boot-assessment-overview}

Automatic boot assessment (or boot counting) is a feature of `systemd-boot` that automatically detects boot entries that fail to boot.
When the feature is active, each boot entry has an associated counter initialised to a user-defined number of trials. Whenever `systemd-boot` boots an entry, its counter is decreased by one, and the entry is marked as *bad* once the counter reaches zero. However, if an entry boots successfully, systemd permanently marks it as *good* and removes the counter altogether. Entries marked as *bad* are sorted last in the systemd-boot menu.
A complete explanation of how that feature works can be found [here](https://systemd.io/AUTOMATIC_BOOT_ASSESSMENT/).
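
As a rough illustration of the scheme described in the linked systemd document, the counter lives in the entry's file name: `+LEFT` for a fresh entry, then `+LEFT-DONE` once at least one attempt has been made. The Python sketch below only simulates those renames. The helper names are hypothetical, and the handling of an entry that is already at zero is a simplification of this sketch, not a statement about how `systemd-boot` behaves.

```python
import re

# Counter suffix: "+LEFT" or "+LEFT-DONE" just before the ".conf" extension.
COUNTER = re.compile(r"^(?P<stem>.*)\+(?P<left>\d+)(?:-(?P<done>\d+))?\.conf$")


def after_boot_attempt(filename: str) -> str:
    """Return the entry name after one more boot attempt has been started."""
    m = COUNTER.match(filename)
    if m is None:
        # No counter in the name: the entry is not being counted.
        return filename
    left = int(m["left"])
    done = int(m["done"] or 0)
    if left == 0:
        # Simplification of this sketch: an exhausted entry is left unchanged.
        return filename
    return f"{m['stem']}+{left - 1}-{done + 1}.conf"


def mark_good(filename: str) -> str:
    """Return the entry name with its counter removed (entry marked good)."""
    m = COUNTER.match(filename)
    return f"{m['stem']}.conf" if m else filename


def is_bad(filename: str) -> bool:
    """An entry counts as bad when it has a counter with zero tries left."""
    m = COUNTER.match(filename)
    return m is not None and int(m["left"]) == 0


if __name__ == "__main__":
    name = "nixos-generation-42+3.conf"
    for _ in range(3):
        name = after_boot_attempt(name)
        print(name, "bad" if is_bad(name) else "pending")
```

Starting from `nixos-generation-42+3.conf`, three failed attempts walk the name through `+2-1`, `+1-2` and `+0-3`; a successful boot at any point would instead yield `nixos-generation-42.conf`.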

## Enabling the feature {#sec-automatic-boot-assessment-enable}

The feature can be enabled by setting the [boot.loader.systemd-boot.bootCounting.enable](#opt-boot.loader.systemd-boot.bootCounting.enable) option to `true`.

## The boot-complete.target unit {#sec-automatic-boot-assessment-boot-complete-target}

A *successful boot* for an entry is defined in terms of the `boot-complete.target` synchronisation point. It is up to the user to order every unit that must be running for the machine to count as successfully booted before that synchronisation point.
For example, suppose you are running `nsd`, an authoritative DNS server, on a machine and you want a *good* entry to be one where that DNS server started successfully. A configuration for that NixOS machine could look like this:

```
boot.loader.systemd-boot.bootCounting.enable = true;
services.nsd.enable = true;
/* rest of nsd configuration omitted */

systemd.services.nsd = {
  before = [ "boot-complete.target" ];
  wantedBy = [ "boot-complete.target" ];
  unitConfig.FailureAction = "reboot";
};

```

## Interaction with specialisations {#sec-automatic-boot-assessment-specialisations}

When the boot-counting feature is enabled, `systemd-boot` still tries the boot entries in the same order as they are displayed in the boot menu. This means that the specialisations of a given generation are tried directly after that generation. A generation being marked as *bad* does not mean that its specialisations are also marked as *bad*, since they may well boot successfully.


## Limitations {#sec-automatic-boot-assessment-limitations}

This feature has to be used with care to avoid data integrity issues. Rolling back to a past generation can be dangerous, for example when a service has undefined behaviour in the presence of data that was already migrated by a newer version of itself.
+156 −60
@@ -12,8 +12,9 @@ import subprocess
import sys
import warnings
import json
from typing import NamedTuple, Dict, List
from typing import NamedTuple, Dict, List, Type, Generator, Iterable
from dataclasses import dataclass
from pathlib import Path


@dataclass
@@ -28,8 +29,115 @@ class BootSpec:
    specialisations: Dict[str, "BootSpec"]
    initrdSecrets: str | None = None

@dataclass
class Entry:
    profile: str | None
    generation_number: int
    specialisation: str | None

    @classmethod
    def from_path(cls: Type["Entry"], path: Path) -> "Entry":
        filename = path.name
        # Matching nixos-$profile-generation-*.conf
        rex_profile = re.compile(r"^nixos-(.*)-generation-.*\.conf$")
        # Matching nixos*-generation-$number*.conf
        rex_generation = re.compile(r"^nixos.*-generation-([0-9]+).*\.conf$")
        # Matching nixos*-generation-$number-specialisation-$specialisation_name*.conf
        rex_specialisation = re.compile(r"^nixos.*-generation-([0-9]+)-specialisation-([a-zA-Z0-9]+).*\.conf$")
        profile = rex_profile.sub(r"\1", filename) if rex_profile.match(filename) else None
        specialisation = rex_specialisation.sub(r"\2", filename) if rex_specialisation.match(filename) else None
        try:
            generation_number = int(rex_generation.sub(r"\1", filename))
        except ValueError:
            raise
        return cls(profile, generation_number, specialisation)
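
To see what `Entry.from_path` extracts, here is a standalone sketch that applies the same three regular expressions to a few file names. `parse_entry_name`, `ParsedName` and the sample names are hypothetical; only the patterns are copied from the hunk above.

```python
import re
from typing import NamedTuple, Optional

# Patterns copied from Entry.from_path in the diff.
REX_PROFILE = re.compile(r"^nixos-(.*)-generation-.*\.conf$")
REX_GENERATION = re.compile(r"^nixos.*-generation-([0-9]+).*\.conf$")
REX_SPECIALISATION = re.compile(
    r"^nixos.*-generation-([0-9]+)-specialisation-([a-zA-Z0-9]+).*\.conf$"
)


class ParsedName(NamedTuple):
    profile: Optional[str]
    generation_number: int
    specialisation: Optional[str]


def parse_entry_name(filename: str) -> ParsedName:
    """Split a loader entry file name into profile, generation, specialisation."""
    gen = REX_GENERATION.match(filename)
    if gen is None:
        raise ValueError(f"not a NixOS generation entry: {filename!r}")
    prof = REX_PROFILE.match(filename)
    spec = REX_SPECIALISATION.match(filename)
    return ParsedName(
        profile=prof.group(1) if prof else None,
        generation_number=int(gen.group(1)),
        specialisation=spec.group(2) if spec else None,
    )


if __name__ == "__main__":
    for sample in (
        "nixos-generation-12.conf",
        "nixos-generation-12+2-1.conf",
        "nixos-test-generation-7-specialisation-gaming+3.conf",
    ):
        print(sample, "->", parse_entry_name(sample))
```

Because the specialisation group only accepts `[a-zA-Z0-9]+`, a specialisation name containing a hyphen would be cut at the first hyphen by this pattern.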


BOOT_ENTRY = """title {title}
version Generation {generation} {description}
linux {kernel}
initrd {initrd}
options {kernel_params}
machine-id {machine_id}
sort-key {sort_key}
"""

@dataclass
class DiskEntry():
    entry: Entry
    default: bool
    counters: str | None
    title: str
    description: str
    kernel: str
    initrd: str
    kernel_params: str
    machine_id: str

    @classmethod
    def from_path(cls: Type["DiskEntry"], path: Path) -> "DiskEntry":
        entry = Entry.from_path(path)
        with open(path, 'r') as f:
            data = f.read().splitlines()
            if '' in data:
                data.remove('')
            entry_map = dict(l.split(' ', 1) for l in data)
            assert "title" in entry_map
            assert "version" in entry_map
            version_splitted = entry_map["version"].split(" ", 2)
            assert version_splitted[0] == "Generation"
            assert version_splitted[1].isdigit()
            assert "linux" in entry_map
            assert "initrd" in entry_map
            assert "options" in entry_map
            assert "machine-id" in entry_map
            assert "sort-key" in entry_map
            filename = path.name
            # Matching nixos*-generation-*$counters.conf
            rex_counters = re.compile(r"^nixos.*-generation-.*(\+\d(-\d)?)\.conf$")
            counters = rex_counters.sub(r"\1", filename) if rex_counters.match(filename) else None
            disk_entry = cls(
                    entry=entry,
                    default=(entry_map["sort-key"] == "default"),
                    counters=counters,
                    title=entry_map["title"],
                    description=entry_map["version"],
                    kernel=entry_map["linux"],
                    initrd=entry_map["initrd"],
                    kernel_params=entry_map["options"],
                    machine_id=entry_map["machine-id"])
            return disk_entry

    def write(self) -> None:
        tmp_path = self.path.with_suffix(".tmp")
        with tmp_path.open('w') as f:
            # We use "sort-key" to sort the default generation first.
            # The "default" string is sorted before "non-default" (alphabetically)
            f.write(BOOT_ENTRY.format(title=self.title,
                          generation=self.entry.generation_number,
                          kernel=self.kernel,
                          initrd=self.initrd,
                          kernel_params=self.kernel_params,
                          machine_id=self.machine_id,
                          description=self.description,
                          sort_key="default" if self.default else "non-default"))
            f.flush()
            os.fsync(f.fileno())
        tmp_path.rename(self.path)
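
The `write` method follows the familiar write-to-temporary-then-rename pattern, the intent being that the final path only ever holds a complete file. A minimal standalone sketch of that pattern follows; `atomic_write_text` is a hypothetical helper, a temporary directory stands in for the ESP, and POSIX rename semantics are assumed.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(path: Path, content: str) -> None:
    """Write content to a sibling .tmp file, sync it, then rename it over path."""
    tmp_path = path.with_suffix(".tmp")
    with tmp_path.open("w") as f:
        f.write(content)
        f.flush()
        # Ask the kernel to push the data to the device before renaming.
        os.fsync(f.fileno())
    # On POSIX, rename replaces an existing target in a single step.
    tmp_path.rename(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = Path(d) / "nixos-generation-1.conf"
        atomic_write_text(target, "title NixOS\n")
        atomic_write_text(target, "title NixOS (updated)\n")
        print(target.read_text(), end="")
```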


    @property
    def path(self) -> Path:
        pieces = [
            "nixos",
            self.entry.profile or None,
            "generation",
            str(self.entry.generation_number),
            f"specialisation-{self.entry.specialisation}" if self.entry.specialisation else None,
        ]
        prefix = "-".join(p for p in pieces if p)
        return Path(f"@efiSysMountPoint@/loader/entries/{prefix}{self.counters if self.counters else ''}.conf")
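
The `counters` field keeps the raw suffix (for example `+3` or `+2-1`) so that `path` can rebuild the exact file name found on disk. The pattern in the hunk accepts a single digit per counter. The sketch below compares it with a `\d+` variant, which is a variation for illustration and not the author's code, to show what happens when a counter has two digits. `counters_of` is a hypothetical helper.

```python
import re
from typing import Optional

# Pattern copied from DiskEntry.from_path in the diff: one digit per counter.
REX_COUNTERS_ORIG = re.compile(r"^nixos.*-generation-.*(\+\d(-\d)?)\.conf$")
# Variation (assumption, not in the diff): allow multi-digit counters.
REX_COUNTERS_MULTI = re.compile(r"^nixos.*-generation-.*?(\+\d+(?:-\d+)?)\.conf$")


def counters_of(filename: str, rex: re.Pattern = REX_COUNTERS_MULTI) -> Optional[str]:
    """Return the counter suffix of a loader entry file name, or None."""
    m = rex.match(filename)
    return m.group(1) if m else None


if __name__ == "__main__":
    for sample in (
        "nixos-generation-12.conf",
        "nixos-generation-12+2-1.conf",
        "nixos-generation-12+10.conf",
    ):
        print(
            sample,
            "orig:", counters_of(sample, REX_COUNTERS_ORIG),
            "multi:", counters_of(sample, REX_COUNTERS_MULTI),
        )
```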

libc = ctypes.CDLL("libc.so.6")

class SystemIdentifier(NamedTuple):
@@ -56,29 +164,14 @@ def system_dir(profile: str | None, generation: int, specialisation: str | None)
    else:
        return d

BOOT_ENTRY = """title {title}
version Generation {generation} {description}
linux {kernel}
initrd {initrd}
options {kernel_params}
"""

def generation_conf_filename(profile: str | None, generation: int, specialisation: str | None) -> str:
    pieces = [
        "nixos",
        profile or None,
        "generation",
        str(generation),
        f"specialisation-{specialisation}" if specialisation else None,
    ]
    return "-".join(p for p in pieces if p) + ".conf"


def write_loader_conf(profile: str | None, generation: int, specialisation: str | None) -> None:
def write_loader_conf(profile: str | None) -> None:
    with open("@efiSysMountPoint@/loader/loader.conf.tmp", 'w') as f:
        if "@timeout@" != "":
            f.write("timeout @timeout@\n")
        f.write("default %s\n" % generation_conf_filename(profile, generation, specialisation))
        if profile:
            f.write("default nixos-%s-generation-*\n" % profile)
        else:
            f.write("default nixos-generation-*\n")
        if not @editor@:
            f.write("editor 0\n")
        f.write("console-mode @consoleMode@\n")
@@ -86,6 +179,17 @@ def write_loader_conf(profile: str | None, generation: int, specialisation: str
        os.fsync(f.fileno())
    os.rename("@efiSysMountPoint@/loader/loader.conf.tmp", "@efiSysMountPoint@/loader/loader.conf")
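
With counters in the file name, the exact name of an entry changes on each boot attempt, which is presumably why the new `write_loader_conf` writes a glob pattern instead of a fixed file name. The sketch below uses Python's `fnmatch` as a stand-in for the boot loader's own pattern matching; that equivalence is an assumption, and the two matchers are not guaranteed to agree on every edge case. `default_pattern` is a hypothetical helper that mirrors the strings written above.

```python
from fnmatch import fnmatchcase
from typing import Optional


def default_pattern(profile: Optional[str]) -> str:
    """Mirror the pattern strings written by write_loader_conf."""
    if profile:
        return "nixos-%s-generation-*" % profile
    return "nixos-generation-*"


if __name__ == "__main__":
    pattern = default_pattern(None)
    for name in (
        "nixos-generation-12.conf",
        "nixos-generation-12+2-1.conf",
        "nixos-generation-12-specialisation-gaming+3.conf",
        "nixos-test-generation-3.conf",
    ):
        print(name, fnmatchcase(name, pattern))
```

The pattern matches every generation of the profile, including specialisations, so the choice among the matches depends on ordering; the diff relies on `sort-key` to put the intended default first.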

def scan_entries() -> Generator[DiskEntry, None, None]:
    """
    Scan all entries in $ESP/loader/entries/*
    Does not support Type 2 entries as we do not support them for now.
    Returns a generator of DiskEntry.
    """
    for path in Path("@efiSysMountPoint@/loader/entries/").glob("nixos*-generation-[1-9]*.conf"):
        try:
            yield DiskEntry.from_path(path)
        except ValueError:
            continue
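
`scan_entries` is a generator function, so its result can be traversed only once. A caller that needs several passes over the entries (`install_bootloader` passes the same object to both `remove_old_entries` and `write_entry`) may want to materialise it into a list first. A minimal sketch of the difference, with a hypothetical `scan` standing in for `scan_entries`:

```python
from typing import Generator, List


def scan(names: List[str]) -> Generator[str, None, None]:
    """Stand-in for scan_entries: yields one item per name."""
    for name in names:
        yield name


names = ["nixos-generation-1.conf", "nixos-generation-2.conf"]

# A generator object is consumed by its first traversal.
entries = scan(names)
first_pass = list(entries)
second_pass = list(entries)  # nothing left to yield

# Materialising once allows any number of passes.
entries_list = list(scan(names))
pass_a = [e for e in entries_list]
pass_b = [e for e in entries_list]

print(len(first_pass), len(second_pass), len(pass_a), len(pass_b))
```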

def get_bootspec(profile: str | None, generation: int) -> BootSpec:
    system_directory = system_dir(profile, generation, None)
@@ -120,7 +224,7 @@ def copy_from_file(file: str, dry_run: bool = False) -> str:
    return efi_file_path

def write_entry(profile: str | None, generation: int, specialisation: str | None,
                machine_id: str, bootspec: BootSpec, current: bool) -> None:
                machine_id: str, bootspec: BootSpec, entries: Iterable[DiskEntry], current: bool) -> None:
    if specialisation:
        bootspec = bootspec.specialisations[specialisation]
    kernel = copy_from_file(bootspec.kernel)
@@ -142,28 +246,30 @@ def write_entry(profile: str | None, generation: int, specialisation: str | None
                  f'for "{title} - Configuration {generation}", an older generation', file=sys.stderr)
            print("note: this is normal after having removed "
                  "or renamed a file in `boot.initrd.secrets`", file=sys.stderr)
    entry_file = "@efiSysMountPoint@/loader/entries/%s" % (
        generation_conf_filename(profile, generation, specialisation))
    tmp_path = "%s.tmp" % (entry_file)
    kernel_params = "init=%s " % bootspec.init

    kernel_params = kernel_params + " ".join(bootspec.kernelParams)
    build_time = int(os.path.getctime(system_dir(profile, generation, specialisation)))
    build_date = datetime.datetime.fromtimestamp(build_time).strftime('%F')

    with open(tmp_path, 'w') as f:
        f.write(BOOT_ENTRY.format(title=title,
                    generation=generation,
    counters = "+@bootCountingTrials@" if @bootCounting@ else ""
    entry = Entry(profile, generation, specialisation)
    # We check if the entry we are writing is already on disk
    # and we update its "default entry" status
    for entry_on_disk in entries:
        if entry == entry_on_disk.entry:
            entry_on_disk.default = current
            entry_on_disk.write()
            return

    DiskEntry(
            entry=entry,
            title=title,
            kernel=kernel,
            initrd=initrd,
            counters=counters,
            kernel_params=kernel_params,
                    description=f"{bootspec.label}, built on {build_date}"))
        if machine_id is not None:
            f.write("machine-id %s\n" % machine_id)
        f.flush()
        os.fsync(f.fileno())
    os.rename(tmp_path, entry_file)

            machine_id=machine_id,
            description=f"{bootspec.label}, built on {build_date}",
            default=current).write()

def get_generations(profile: str | None = None) -> list[SystemIdentifier]:
    gen_list = subprocess.check_output([
@@ -188,30 +294,19 @@ def get_generations(profile: str | None = None) -> list[SystemIdentifier]:
    return configurations[-configurationLimit:]


def remove_old_entries(gens: list[SystemIdentifier]) -> None:
    rex_profile = re.compile(r"^@efiSysMountPoint@/loader/entries/nixos-(.*)-generation-.*\.conf$")
    rex_generation = re.compile(r"^@efiSysMountPoint@/loader/entries/nixos.*-generation-([0-9]+)(-specialisation-.*)?\.conf$")
def remove_old_entries(gens: list[SystemIdentifier], disk_entries: Iterable[DiskEntry]) -> None:
    known_paths = []
    for gen in gens:
        bootspec = get_bootspec(gen.profile, gen.generation)
        known_paths.append(copy_from_file(bootspec.kernel, True))
        known_paths.append(copy_from_file(bootspec.initrd, True))
    for path in glob.iglob("@efiSysMountPoint@/loader/entries/nixos*-generation-[1-9]*.conf"):
        if rex_profile.match(path):
            prof = rex_profile.sub(r"\1", path)
        else:
            prof = None
        try:
            gen_number = int(rex_generation.sub(r"\1", path))
        except ValueError:
            continue
        if not (prof, gen_number, None) in gens:
            os.unlink(path)
    for disk_entry in disk_entries:
        if (disk_entry.entry.profile, disk_entry.entry.generation_number, None) not in gens:
            os.unlink(disk_entry.path)
    for path in glob.iglob("@efiSysMountPoint@/efi/nixos/*"):
        if not path in known_paths and not os.path.isdir(path):
        if path not in known_paths and not os.path.isdir(path):
            os.unlink(path)


def get_profiles() -> list[str]:
    if os.path.isdir("/nix/var/nix/profiles/system-profiles/"):
        return [x
@@ -284,16 +379,17 @@ def install_bootloader(args: argparse.Namespace) -> None:
    gens = get_generations()
    for profile in get_profiles():
        gens += get_generations(profile)
    remove_old_entries(gens)
    entries = scan_entries()
    remove_old_entries(gens, entries)
    for gen in gens:
        try:
            bootspec = get_bootspec(gen.profile, gen.generation)
            is_default = os.path.dirname(bootspec.init) == args.default_config
            write_entry(*gen, machine_id, bootspec, current=is_default)
            write_entry(*gen, machine_id, bootspec, entries, current=is_default)
            for specialisation in bootspec.specialisations.keys():
                write_entry(gen.profile, gen.generation, specialisation, machine_id, bootspec, current=is_default)
                write_entry(gen.profile, gen.generation, specialisation, machine_id, bootspec, entries, current=is_default)
            if is_default:
                write_loader_conf(*gen)
                write_loader_conf(gen.profile)
        except OSError as e:
            # See https://github.com/NixOS/nixpkgs/issues/114552
            if e.errno == errno.EINVAL:
+15 −1
@@ -49,6 +49,8 @@ let
        ${pkgs.coreutils}/bin/install -D $empty_file "${efi.efiSysMountPoint}/efi/nixos/.extra-files/loader/entries/"${escapeShellArg n}
      '') cfg.extraEntries)}
    '';
    bootCountingTrials = cfg.bootCounting.trials;
    bootCounting = if cfg.bootCounting.enable then "True" else "False";
  };

  checkedSystemdBootBuilder = pkgs.runCommand "systemd-boot" {
@@ -69,7 +71,10 @@ let
  '';
in {

  meta.maintainers = with lib.maintainers; [ julienmalka ];
  meta = {
    maintainers = with lib.maintainers; [ julienmalka ];
    doc = ./boot-counting.md;
  };

  imports =
    [ (mkRenamedOptionModule [ "boot" "loader" "gummiboot" "enable" ] [ "boot" "loader" "systemd-boot" "enable" ])
@@ -238,6 +243,15 @@ in {
      '';
    };

    bootCounting = {
      enable = mkEnableOption (lib.mdDoc "automatic boot assessment");
      trials = mkOption {
        default = 3;
        type = types.int;
        description = lib.mdDoc "number of trials each entry should start with";
      };
    };

  };

  config = mkIf cfg.enable {
+4 −0
@@ -101,6 +101,10 @@ let
      "systemd-rfkill.service"
      "systemd-rfkill.socket"

      # Boot counting
      "boot-complete.target"
    ] ++ lib.optional config.boot.loader.systemd-boot.bootCounting.enable "systemd-bless-boot.service" ++ [

      # Hibernate / suspend.
      "hibernate.target"
      "suspend.target"