Unverified Commit 1987c483 authored by Maximilian Bosch

nixos/test-driver: use vhost-device-vsock for SSH backdoor

`vhost-device-vsock`[1] is a custom implementation of AF_VSOCK, but the
application on the host-side uses a UNIX domain-socket. This gives us
the following nice properties:

* We don't need to do `--option sandbox-paths /dev/vhost-vsock` anymore
  for debugging builds within the sandbox. That means untrusted users
  can also debug these kinds of tests now.

* This prevents CID conflicts on the host-side, i.e. there's no need for
  using `sshBackdoor.vsockOffset` for tests anymore.

A big shout-out goes to Allison Karlitskaya, the developer of test.thing[2],
who talked about this approach to do AF_VSOCK at All Systems Go 2025.

This patch requires systemd 258[3] because its SSH config contains
`vsock-mux`, which is needed to connect to the VMs from now on.

To not blow up the patches even more, this only uses AF_VSOCK for the
debugger. A potential follow-up would be to remove the current
`backdoor.service` and replace it entirely with this functionality.

The internal implementation tries to be consistent with how VLANs and
machines are handled, i.e. the processes are started when the Driver's
context is entered and cleaned up in __exit__().
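
The lifecycle described above can be sketched roughly as follows. This is
an illustration only, not the code from this patch: `ManagedProcesses` and
its methods are hypothetical names, and the real driver starts
`vhost-device-vsock` rather than arbitrary commands.

```python
import subprocess
import sys


class ManagedProcesses:
    """Hypothetical helper: start commands on enter, stop them on exit."""

    def __init__(self, commands: list[list[str]]) -> None:
        self.commands = commands
        self.processes: list[subprocess.Popen] = []

    def __enter__(self) -> "ManagedProcesses":
        try:
            for command in self.commands:
                self.processes.append(subprocess.Popen(command))
        except BaseException:
            # Don't leak already-started processes if a later one fails.
            self._stop_all()
            raise
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self._stop_all()

    def _stop_all(self) -> None:
        # Ask every process to terminate first, then wait for each one.
        for process in self.processes:
            process.terminate()
        for process in self.processes:
            try:
                process.wait(timeout=5)
            except subprocess.TimeoutExpired:
                process.kill()
                process.wait()


if __name__ == "__main__":
    # Stand-in for a long-running helper process.
    sleeper = [sys.executable, "-c", "import time; time.sleep(60)"]
    with ManagedProcesses([sleeper]) as managed:
        print("running:", managed.processes[0].poll() is None)
    print("stopped:", managed.processes[0].poll() is not None)
```

Tying the cleanup to `__exit__()` means the helper processes are stopped
even if the test raises an exception.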

I decided to push the process management and creation of sockets for
vhost-device-vsock into its own class, since that's an implementation
detail and not a concern of the test-driver. In fact,
`vhost-device-vsock` is something we can drop once QEMU implements
native support for using AF_UNIX on the host-side[4]. `VsockPair` is its
own class since returning e.g. a triple of `(Path, Path, Int)` would be
ambiguous about which is the guest path and which is the host path (and
frankly, I found it hard to distinguish the two when reading the docs of
`vhost-device-vsock` initially).
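
To illustrate the ambiguity argument, here is a minimal sketch of what
such a class could look like. The field names (`guest`, `host`, `cid`),
the reading of the integer as a CID, the `ssh_command()` helper and the
`machine_guest.socket` file name are assumptions made for illustration;
only the class name `VsockPair` and the `vsock-mux/` SSH syntax come from
this patch.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class VsockPair:
    # Hypothetical field names; named fields are the point of the example.
    guest: Path  # UNIX socket used on the guest (QEMU) side
    host: Path   # UNIX socket that applications on the host connect to
    cid: int     # assumed to be the guest CID

    def ssh_command(self) -> str:
        # Mirrors the command format the test driver prints on start.
        return f"ssh -o User=root vsock-mux/{self.host}"


if __name__ == "__main__":
    pair = VsockPair(
        guest=Path("/tmp/example/machine_guest.socket"),
        host=Path("/tmp/example/machine_host.socket"),
        cid=3,
    )
    print(pair.ssh_command())
```

With a plain tuple, a caller would have to remember whether index 0 or 1
is the host path; with named fields, `pair.host` cannot be mixed up with
`pair.guest`.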

Finally, now that we can do the SSH backdoor without adding additional
devices to the sandbox, I figured it's time to write a test-case for
it.

[1] https://github.com/rust-vmm/vhost-device/blob/main/vhost-device-vsock/README.md
[2] https://codeberg.org/lis/test.thing
[3] https://github.com/NixOS/nixpkgs/pull/427968
[4] https://gitlab.com/qemu-project/qemu/-/issues/2095
parent 9dcf7113
+15 −33
@@ -88,52 +88,34 @@ An SSH-based backdoor to log into machines can be enabled with
 }
 ```
 
-::: {.warning}
-Make sure to only enable the backdoor for interactive tests
-(i.e. by using `interactive.sshBackdoor.enable`)! This is the only
-supported configuration.
-
-Running a test in a sandbox with this will fail because `/dev/vhost-vsock` isn't available
-in the sandbox.
-:::
-
 This creates a [vsock socket](https://man7.org/linux/man-pages/man7/vsock.7.html)
 for each VM to log in with SSH. This configures root login with an empty password.
 
-When the VMs get started interactively with the test-driver, it's possible to
-connect to `machine` with
+On the host-side a UNIX domain-socket is used with
+[vhost-device-vsock](https://github.com/rust-vmm/vhost-device/blob/main/vhost-device-vsock/README.md).
+That way, it's not necessary to assign system-wide unique vsock numbers.
 
 ```
-$ ssh vsock/3 -o User=root
+$ ssh vsock-mux//tmp/path/to/host -o User=root
 ```
 
-The socket numbers correspond to the node number of the test VM, but start
-at three instead of one because that's the lowest possible
-vsock number. The exact SSH commands are also printed out when starting
-`nixos-test-driver`.
-
-On non-NixOS systems you'll probably need to enable
-the SSH config from {manpage}`systemd-ssh-proxy(1)` yourself.
-
-If starting VM fails with an error like
+The socket paths are printed when starting the test driver:
 
 ```
-qemu-system-x86_64: -device vhost-vsock-pci,guest-cid=3: vhost-vsock: unable to set guest cid: Address already in use
+Note: this requires systemd-ssh-proxy(1) to be enabled (default on NixOS 25.05 and newer).
+    machine:  ssh -o User=root vsock-mux//tmp/tmpg1rp9nti/machine_host.socket
 ```
 
-it means that the vsock numbers for the VMs are already in use. This can happen
-if another interactive test with SSH backdoor enabled is running on the machine.
+On non-NixOS systems you'll probably need to enable
+the SSH config from {manpage}`systemd-ssh-proxy(1)` yourself.
 
-In that case, you need to assign another range of vsock numbers. You can pick another
-offset with
+During a test-run, it's possible to print the SSH commands again by running
 
-```nix
-{
-  sshBackdoor = {
-    enable = true;
-    vsockOffset = 23542;
-  };
-}
+```
+In [2]: dump_machine_ssh()
+SSH backdoor enabled, the machines can be accessed like this:
+Note: this requires systemd-ssh-proxy(1) to be enabled (default on NixOS 25.05 and newer).
+    machine:  ssh -o User=root vsock-mux//tmp/tmpg1rp9nti/machine_host.socket
 ```
 
 ## Port forwarding to NixOS test VMs {#sec-nixos-test-port-forwarding}
+3 −11
@@ -512,19 +512,11 @@ Once you are in the sandbox shell, you can access the VMs (for example, `machine
 with SSH over vsock:
 
 ```
-bash# ssh -F ./ssh_config vsock/3
+bash# ssh -F ./ssh_config -o User=root vsock-mux//tmp/.../machine_host.socket
 ```
 
-For the AF_VSOCK feature to work, `/dev/vhost-vsock` is needed in the sandbox
-which can be done with e.g.
-
-```
-nix-build -A nixosTests.foo --option sandbox-paths /dev/vhost-vsock
-```
-
-As described in [](#sec-nixos-test-ssh-access), the numbers for vsock start at
-`3` instead of `1`. So the first VM in the network (sorted alphabetically) can
-be accessed with `vsock/3`.
+The socket paths are printed at the beginning of the test. See
+[](#sec-nixos-test-ssh-access) for more context.
 
 ### SSH access to test containers {#sec-test-container-ssh-access}
 
+0 −3
@@ -2174,9 +2174,6 @@
   "test-opt-sshBackdoor.enable": [
     "index.html#test-opt-sshBackdoor.enable"
   ],
-  "test-opt-sshBackdoor.vsockOffset": [
-    "index.html#test-opt-sshBackdoor.vsockOffset"
-  ],
   "test-opt-enableDebugHook": [
     "index.html#test-opt-enableDebugHook"
   ],
+2 −0
@@ -14,6 +14,7 @@
   remote-pdb,
 
   netpbm,
+  vhost-device-vsock,
   nixosTests,
   qemu_pkg ? qemu_test,
   qemu_test,
@@ -56,6 +57,7 @@ buildPythonApplication {
     socat
     util-linux
     vde2
+    vhost-device-vsock
   ]
   ++ lib.optionals enableNspawn [
     systemd
+5 −4
@@ -147,9 +147,9 @@ def main() -> None:
         type=Path,
     )
     arg_parser.add_argument(
-        "--dump-vsocks",
+        "--enable-ssh-backdoor",
         help="indicates that the interactive SSH backdoor is active and dumps information about it on start",
-        type=int,
+        action="store_true",
     )
 
     args = arg_parser.parse_args()
@@ -197,9 +197,10 @@ def main() -> None:
         keep_machine_state=args.keep_machine_state,
         global_timeout=args.global_timeout,
         debug=debugger,
+        enable_ssh_backdoor=args.enable_ssh_backdoor,
     ) as driver:
-        if offset := args.dump_vsocks:
-            driver.dump_machine_ssh(offset)
+        if args.enable_ssh_backdoor:
+            driver.dump_machine_ssh()
         if args.interactive:
             history_dir = os.getcwd()
             history_path = os.path.join(history_dir, ".nixos-test-history")