Commit 3573423f authored by Raul Puri

added presplit-sentences to scripts

parent d0878333
+1 −0
@@ -10,6 +10,7 @@ python pretrain_bert.py \
    --tokenizer-model-type bert-large-uncased \
    --vocab-size 30522 \
    --train-data wikipedia \
+    --presplit-sentences \
    --loose-json \
    --text-key text \
    --split 1000,1,1 \
+1 −0
@@ -17,6 +17,7 @@ python -m torch.distributed.launch $DISTRIBUTED_ARGS \
    --tokenizer-model-type bert-large-uncased \
    --vocab-size 30522 \
    --train-data wikipedia \
+    --presplit-sentences \
    --loose-json \
    --text-key text \
    --split 1000,1,1 \
+1 −0
@@ -10,6 +10,7 @@ python pretrain_bert.py \
    --tokenizer-path tokenizer.model \
    --vocab-size 30522 \
    --train-data wikipedia \
+    --presplit-sentences \
    --loose-json \
    --text-key text \
    --split 1000,1,1 \
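The commit only adds the flag and does not say what it expects from the data. A reasonable reading, treated here as an assumption, is that `--presplit-sentences` declares that each document in the `text` field of the loose-JSON input already holds one sentence per line, so sentences can be recovered by splitting on newlines instead of running an automatic sentence splitter. The sketch below illustrates only that splitting rule in plain bash; `split_presplit_document` is a hypothetical helper and is not part of the repository.

```shell
#!/usr/bin/env bash
# Minimal sketch, assuming --presplit-sentences means "each document already
# holds one sentence per line". split_presplit_document is a hypothetical
# helper for illustration; it is not part of the repository.
set -euo pipefail

# Print the sentences of one presplit document, one per line,
# skipping blank lines.
split_presplit_document() {
  local doc="$1"
  local line
  while IFS= read -r line; do
    if [[ -n "$line" ]]; then
      printf '%s\n' "$line"
    fi
  done <<< "$doc"
}

# An example document as it is assumed to appear in the "text" field of a
# loose-JSON record (the field selected by --text-key text), with sentences
# already separated by newlines.
doc=$'The first sentence.\nThe second sentence.\n\nA third sentence.'

split_presplit_document "$doc"
count=$(split_presplit_document "$doc" | wc -l)
echo "sentences: $((count))"
```

Running it prints the three non-empty lines followed by `sentences: 3`; the blank line between the second and third sentence is dropped rather than treated as an empty sentence.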