Fix dataset shuffle
Randomly shuffle the training dataset.
This fixes an issue where resuming from a checkpoint in the middle of an epoch does not point to the correct offset in the dataset.
The train/val split must still be deterministic, so that samples do not move between the train and val datasets across restarts.
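
A minimal sketch of how the two requirements could fit together, not the code in this change. It assumes the split uses a fixed seed and the training order is derived from a seed plus the epoch number, so a resumed run can rebuild the same order and skip to the saved offset. All names here (split_indices, epoch_order, resume, split_seed, shuffle_seed) are hypothetical.

```python
import random


def split_indices(n, val_fraction=0.1, split_seed=0):
    """Deterministic train/val split: the same seed yields the same split on every restart."""
    indices = list(range(n))
    # A dedicated RNG keeps the split independent of global random state.
    random.Random(split_seed).shuffle(indices)
    n_val = int(n * val_fraction)
    return indices[n_val:], indices[:n_val]  # (train, val)


def epoch_order(train_indices, epoch, shuffle_seed=1234):
    """Shuffled training order for one epoch, reproducible from (shuffle_seed, epoch)."""
    order = list(train_indices)
    random.Random(shuffle_seed + epoch).shuffle(order)
    return order


def resume(train_indices, epoch, offset, shuffle_seed=1234):
    """Samples still to be seen in `epoch` after `offset` samples were already consumed."""
    return epoch_order(train_indices, epoch, shuffle_seed)[offset:]


if __name__ == "__main__":
    train, val = split_indices(100)
    full = epoch_order(train, epoch=3)
    # A restart at offset 40 of epoch 3 continues exactly where the run stopped.
    assert resume(train, epoch=3, offset=40) == full[40:]
    assert set(train).isdisjoint(val)
```

In this sketch the checkpoint would only need to store the epoch number and the offset; the split and the shuffle order are both recomputed from their seeds.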