Use MPI for torch distributed bootstrap
There appears to be a TCP connection issue on Ascent that prevents distributed initialization when certain hosts (e.g. h49n01) are involved. This seems to result from a timing issue in the torch C++ distributed backend (1.10).
This patch uses mpi4py to set up torch.distributed "manually". The setup relies on an MPI collective, which appears to resolve the timing problem.
Since MPI is now being initialized (which loads ibverbs in Spectrum MPI), we have to enable fork-safe mode (IBV_FORK_SAFE=1) to support dataloaders.
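
The bootstrap described above can be sketched roughly as follows. This is an illustration, not the patch itself: the port number and the choice of rendezvous scheme (MASTER_ADDR/MASTER_PORT environment variables) are assumptions.

```python
import os

# IBV_FORK_SAFE must be set before MPI (and hence ibverbs) is initialized,
# otherwise forked dataloader workers can hit fork-safety issues.
os.environ["IBV_FORK_SAFE"] = "1"

from mpi4py import MPI
import torch.distributed as dist

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
world_size = comm.Get_size()

# Rank 0 picks a rendezvous address; broadcasting it is an MPI collective,
# which also synchronizes the ranks before torch.distributed starts.
master_addr = MPI.Get_processor_name() if rank == 0 else None
master_addr = comm.bcast(master_addr, root=0)

os.environ["MASTER_ADDR"] = master_addr
os.environ["MASTER_PORT"] = "29500"  # any free port; illustrative value

# Backend choice (nccl vs gloo) depends on the job; nccl assumed here.
dist.init_process_group("nccl", rank=rank, world_size=world_size)
```

Note that IBV_FORK_SAFE is set before `from mpi4py import MPI`, since importing mpi4py typically triggers MPI_Init.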