Commit 16658f76 authored by Raito Bezarius

nixos/netdata: introduce `deadlineBeforeStopSec`

Previously, we hardcoded a 60-second timeout for stopping netdata when it did not respond.
This is wrong: netdata does not always honor the SIGTERM sent by systemd in time,
so systemd escalates to SIGKILL, which can cause data loss or corruption.

Offer a flag to users and bump the deadline to 2 minutes.
parent 7e72f076
+2 −0
@@ -387,6 +387,8 @@ In addition to numerous new and upgraded packages, this release has the followin
  }
  ```

- `services.netdata` offers a `deadlineBeforeStopSec` option, which lets users whose netdata instance takes a long time to initialize keep systemd from killing it prematurely.

- The `services.dhcpcd` service no longer solicits or accepts IPv6 Router Advertisements on interfaces that use static IPv6 addresses.
  If the network uses both IPv6 Unique Local Addresses (ULA) and global IPv6 address auto-configuration with SLAAC, you must add the parameter `networking.dhcpcd.IPv6rs = true;`.

+15 −1
@@ -169,6 +169,20 @@ in {
          See: <https://learn.netdata.cloud/docs/agent/anonymous-statistics>
        '';
      };

      deadlineBeforeStopSec = mkOption {
        type = types.int;
        default = 120;
        description = lib.mdDoc ''
          To detect when netdata is misbehaving, we run a concurrent task in the systemd unit
          that pings netdata (wait-for-netdata-up).

          If this task does not succeed after a while, we stop the unit and mark it as failed.

          This option sets that deadline in seconds. Raise it if you (1) have a lot of data,
          (2) are doing upgrades, or (3) have low IOPS/throughput.
        '';
      };
    };
  };

@@ -205,7 +219,7 @@ in {
          while [ "$(${pkgs.netdata}/bin/netdatacli ping)" != pong ]; do sleep 0.5; done
        '';

-       TimeoutStopSec = 60;
+       TimeoutStopSec = cfg.deadlineBeforeStopSec;
        Restart = "on-failure";
        # User and group
        User = cfg.user;
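
For illustration, a minimal sketch of how a user might raise the deadline in their NixOS configuration once this change is available. The value `300` is an arbitrary example, not a recommendation from the commit. Per the diff above, whatever is set here is passed to the unit's `TimeoutStopSec`.

```nix
{ ... }:
{
  services.netdata = {
    enable = true;

    # Illustrative value only: wait up to 5 minutes instead of the
    # new default of 120 seconds.
    deadlineBeforeStopSec = 300;
  };
}
```

Since the option is declared as `types.int`, the value is a plain number of seconds, not a systemd time-span string such as `"5min"`.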