POSIX Transport creates file on Open() without Flush()
Created by: philip-davis
The open() system call can create the file on disk immediately. To demonstrate this, log into two different login nodes on, for example, Cori and run this code on one of them:
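#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    /* Create (or truncate) the file; nothing is ever written to it. */
    int fd = open("abc.txt", O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd < 0)
    {
        perror("open");
        return EXIT_FAILURE;
    }
    /* Keep the descriptor open so the other node can be checked meanwhile. */
    sleep(100);
    close(fd);
    return EXIT_SUCCESS;
}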
abc.txt will be visible on the other login node immediately, both for NFS and Lustre file systems.
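To confirm that the file is not just visible but also zero-length, something like the following could be used. This is only a sketch: visible_size() and size_right_after_open() are hypothetical helper names, and it checks from the same process rather than from a second node, but the stat() call is the same one you would run on the other login node.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: size in bytes of `path`, or -1 if it is not visible. */
long long visible_size(const char *path)
{
    struct stat st;
    assert(path != NULL);
    return (stat(path, &st) == 0) ? (long long)st.st_size : -1;
}

/* Hypothetical helper: open `path` with the same flags as the program above,
   write nothing, and report what stat() sees while the descriptor is open. */
long long size_right_after_open(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd < 0)
        return -1;
    long long size = visible_size(path);
    close(fd);
    return size;
}
```

On a local file system I would expect size_right_after_open() to return 0 for a path that did not exist before the call, i.e. the file exists with no contents before any write or flush happens.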
The reason I bring this up is that this open() call is the same as the one used by FilePOSIX::Open() when openMode is Mode::Write. If I'm reading the code correctly, FilePOSIX is the default Transport library, including for BP4. This matters for the File Metadata Index. Even though we wait until after writing the header to do a flush, that flush is a no-op for POSIX, and the zero-length file has already been created.
This is, in turn, a problem because the reader could see the zero-length metadata index file and then interpret the header as a new timestep. I expect ParseMetadata would fail noisily in this case, but I haven't looked closely enough at the code to know that for sure.
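To make the states a reader can run into concrete, here is a rough sketch of how they could be told apart with stat(). None of this is taken from the ADIOS2 code: classify_index(), report_index() and the enum are hypothetical names, and header_bytes is passed in by the caller because I am not assuming what the real index header length is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical states a reader might distinguish for the index file. */
enum index_state { INDEX_MISSING, INDEX_EMPTY, INDEX_PARTIAL, INDEX_HAS_HEADER };

/* header_bytes is supplied by the caller; the real header length is not assumed. */
enum index_state classify_index(const char *path, size_t header_bytes)
{
    struct stat st;
    assert(path != NULL);
    if (stat(path, &st) != 0)
        return INDEX_MISSING;
    if (st.st_size == 0)
        return INDEX_EMPTY; /* the state that open() alone produces */
    if ((unsigned long long)st.st_size < header_bytes)
        return INDEX_PARTIAL;
    return INDEX_HAS_HEADER;
}

/* Print the state in readable form, e.g. while polling from another node. */
void report_index(const char *path, size_t header_bytes)
{
    static const char *names[] = { "missing", "empty", "partial", "has header" };
    printf("%s: %s\n", path, names[classify_index(path, header_bytes)]);
}
```

The case this issue is about is INDEX_EMPTY: the file exists, so a reader that only checks for existence would go on to parse a header that has not been written yet.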