Unverified commit c808181a, authored by Maximilian Bosch, committed by GitHub

Merge: test-driver: Implement debugging breakpoint hooks (#422066)

parents 53011939 b4b72182
+51 −0
@@ -340,3 +340,54 @@ id-prefix: test-opt-
list-id: test-options-list
source: @NIXOS_TEST_OPTIONS_JSON@
```

## Accessing VMs in the sandbox with SSH {#sec-test-sandbox-breakpoint}

As explained in [](#sec-nixos-test-ssh-access), it's possible to configure an
SSH backdoor based on AF_VSOCK. This can be used to SSH into a VM of a running
build in a sandbox.

This is useful when something in the test fails, for example:

```nix
{
  nodes.machine = {};

  sshBackdoor.enable = true;
  enableDebugHook = true;

  testScript = ''
    start_all()
    machine.succeed("false") # this will fail
  '';
}
```

For the AF_VSOCK feature to work, `/dev/vhost-vsock` must be available in the
sandbox. It can be exposed with e.g.

```
nix-build -A nixosTests.foo --option sandbox-paths /dev/vhost-vsock
```

This halts the test execution on a test failure and prints instructions
on how to enter the sandbox shell of the VM test. Inside, one can log into
e.g. `machine` with

```
ssh -F ./ssh_config vsock/3
```

As described in [](#sec-nixos-test-ssh-access), the numbers for vsock start at
`3` instead of `1`. So the first VM in the network (sorted alphabetically) can
be accessed with `vsock/3`.
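
The numbering rule above can be sketched in a few lines of Python. This is
only an illustration and not part of the test driver: the helper name
`vsock_addresses` and its `first` parameter are hypothetical, and the sketch
assumes that nodes are numbered consecutively in alphabetical order, starting
at `3`. A non-default `sshBackdoor.vsockOffset` may shift these numbers.

```python
def vsock_addresses(node_names, first=3):
    """Map each node name to the host to pass to `ssh -F ./ssh_config`.

    Hypothetical helper: assumes nodes are numbered consecutively in
    alphabetical order, beginning at `first` (3, as described above).
    """
    return {
        name: f"vsock/{first + index}"
        for index, name in enumerate(sorted(node_names))
    }


# Two nodes: `client` sorts before `server`, so it gets the first number.
print(vsock_addresses(["server", "client"]))
# → {'client': 'vsock/3', 'server': 'vsock/4'}
```

For a test with a single node called `machine`, the sketch yields `vsock/3`,
matching the `ssh` invocation shown earlier.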

Alternatively, it's possible to explicitly set a breakpoint with
`debug.breakpoint()`. This also has the benefit that one can step through
`testScript` with `pdb` like this:

```
$ sudo /nix/store/eeeee-attach <id>
bash# telnet 127.0.0.1 4444
pdb$ …
```
+6 −0
@@ -1902,6 +1902,9 @@
  "test-opt-sshBackdoor.vsockOffset": [
    "index.html#test-opt-sshBackdoor.vsockOffset"
  ],
  "test-opt-enableDebugHook": [
    "index.html#test-opt-enableDebugHook"
  ],
  "test-opt-defaults": [
    "index.html#test-opt-defaults"
  ],
@@ -2010,6 +2013,9 @@
  "sec-nixos-test-testing-hardware-features": [
    "index.html#sec-nixos-test-testing-hardware-features"
  ],
  "sec-test-sandbox-breakpoint": [
    "index.html#sec-test-sandbox-breakpoint"
  ],
  "chap-developing-the-test-driver": [
    "index.html#chap-developing-the-test-driver"
  ],
+2 −0
@@ -14,6 +14,7 @@
  extraPythonPackages ? (_: [ ]),
  nixosTests,
}:

python3Packages.buildPythonApplication {
  pname = "nixos-test-driver";
  version = "1.1";
@@ -32,6 +33,7 @@ python3Packages.buildPythonApplication {
      junit-xml
      ptpython
      ipython
      remote-pdb
    ]
    ++ extraPythonPackages python3Packages;

+10 −0
@@ -5,6 +5,7 @@ from pathlib import Path

import ptpython.ipython

from test_driver.debug import Debug, DebugAbstract, DebugNop
from test_driver.driver import Driver
from test_driver.logger import (
    CompositeLogger,
@@ -65,6 +66,10 @@ def main() -> None:
        help="drop into a python repl and run the tests interactively",
        action=argparse.BooleanOptionalAction,
    )
    arg_parser.add_argument(
        "--debug-hook-attach",
        help="Enable interactive debugging breakpoints for sandboxed runs",
    )
    arg_parser.add_argument(
        "--start-scripts",
        metavar="START-SCRIPT",
@@ -129,6 +134,10 @@ def main() -> None:
    if not args.keep_vm_state:
        logger.info("Machine state will be reset. To keep it, pass --keep-vm-state")

    debugger: DebugAbstract = DebugNop()
    if args.debug_hook_attach is not None:
        debugger = Debug(logger, args.debug_hook_attach)

    with Driver(
        args.start_scripts,
        args.vlans,
@@ -137,6 +146,7 @@ def main() -> None:
        logger,
        args.keep_vm_state,
        args.global_timeout,
        debug=debugger,
    ) as driver:
        if args.interactive:
            history_dir = os.getcwd()
+53 −0
import logging
import os
import random
import shutil
import subprocess
import sys
from abc import ABC, abstractmethod

from remote_pdb import RemotePdb  # type:ignore

from test_driver.logger import AbstractLogger


class DebugAbstract(ABC):
    @abstractmethod
    def breakpoint(self, host: str = "127.0.0.1", port: int = 4444) -> None:
        pass


class DebugNop(DebugAbstract):
    def __init__(self) -> None:
        pass

    def breakpoint(self, host: str = "127.0.0.1", port: int = 4444) -> None:
        pass


class Debug(DebugAbstract):
    def __init__(self, logger: AbstractLogger, attach_command: str) -> None:
        self.breakpoint_on_failure = False
        self.logger = logger
        self.attach = attach_command

    def breakpoint(self, host: str = "127.0.0.1", port: int = 4444) -> None:
        """
        Call this function to stop execution and put the process to sleep
        while the test driver provides a debug shell on TCP port `port`.
        This is meant to be used for sandboxed tests that have the test
        driver feature `enableDebugHook` enabled.
        """
        pattern = str(random.randrange(999999, 9999999))
        self.logger.log_test_error(
            f"Breakpoint reached, run 'sudo {self.attach} {pattern}'"
        )
        os.environ["bashInteractive"] = shutil.which("bash")  # type:ignore
        if os.fork() == 0:
            subprocess.run(["sleep", pattern])
        else:
            # RemotePdb writes log messages to both stderr AND the logger,
            # which is the same here. Hence, disabling the remote_pdb logger
            # to avoid duplicate messages in the build log.
            logging.root.manager.loggerDict["remote_pdb"].disabled = True  # type:ignore
            RemotePdb(host=host, port=port).set_trace(sys._getframe().f_back)