Commit 7fe7768b authored by Yin, Junqi

update descriptions

parent 6c4bf7a0
@@ -13,11 +13,12 @@
### [2. PyTorch Distributed Example](#Section2)
* [4 Communication Methods Setup](#4-setup)
* [Performance Comparisons](#comm-compare)
* [BERT on Summit](#bert-summit)
### [3. TensorFlow Distributed Example](#Section3)
* [Multi-worker Mirrored Strategy](#tf-dist)
* [Add Horovod Support](#add-hvd)
- * [Running on Summit](#run-summit)
+ * [ResNet on Summit](#run-summit)
* [Performance: Training Speed vs Convergence](#perf)
### [4. Scaling considerations](#Section4)
@@ -165,7 +166,23 @@ else:
The job script (`examples/pytorch/job.lsf`) and testing logs (`examples/pytorch/logs`) for the 4 distribution modes are also available. Based on the performance plot, we recommend using Horovod with the NCCL backend as the communication method.
- ![](examples/pytorch/pytorch_comm_batch32.png "Comparisons of communication methods")
+ ![](examples/pytorch/synthetic/pytorch_comm_batch32.png "Comparisons of communication methods")
### <a name="bert-summit"></a>2.3 BERT on Summit
This example is modified from Nvidia's [BERT example](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/LanguageModeling/BERT) and demonstrates the use of Apex data parallel on Summit for NLP workloads. The key modification is setting up environment variables for each rank to establish the communicator:
```bash
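# Unique compute hosts from the LSF hostfile (login and batch nodes excluded); the first one serves as the master address.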
nodes=($(cat ${LSB_DJOB_HOSTFILE} | sort | uniq | grep -v login | grep -v batch))
head=${nodes[0]}
export RANK=$OMPI_COMM_WORLD_RANK
export LOCAL_RANK=$OMPI_COMM_WORLD_LOCAL_RANK
export WORLD_SIZE=$OMPI_COMM_WORLD_SIZE
export MASTER_ADDR=$head
export MASTER_PORT=29500 # default from torch launcher
```
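On the Python side, these variables are consumed when the process group is initialized with the `env://` method and the model is wrapped with Apex data parallel. Below is a minimal sketch of that pattern, not the exact code in NVIDIA's pretraining script; the `torch.nn.Linear` module is only a stand-in for the BERT model.
```python
import os

import torch
import torch.distributed as dist
from apex.parallel import DistributedDataParallel as DDP

# Pin this rank to its local GPU before creating the process group.
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# RANK, WORLD_SIZE, MASTER_ADDR, and MASTER_PORT are read from the
# environment when init_method="env://" is used.
dist.init_process_group(backend="nccl", init_method="env://")

# Stand-in model; Apex DDP averages gradients across ranks and overlaps
# the all-reduce with the backward pass.
model = torch.nn.Linear(1024, 1024).cuda()
model = DDP(model)

print(f"rank {dist.get_rank()}/{dist.get_world_size()} ready on GPU {local_rank}")
```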
The performance of pre-training BERT on the Wikipedia corpus is shown in the following plot (more [details](./examples/pytorch/BERT/README.md)):
![](examples/pytorch/BERT/bert-summit.png "BERT performance on Summit")
## <a name="Section3"></a>3. TensorFlow Distributed Example
@@ -203,7 +220,7 @@ official/resnet/imagenet_main.py
official/resnet/resnet_run_loop.py
official/utils/misc/distribution_utils.py
```
### <a name="run-summit"></a>3.3 Running on Summit
### <a name="run-summit"></a>3.3 ResNet on Summit
For the built-in `MultiWorkerMirroredStrategy`, the main tuning knob is the choice of communication layer, gRPC or NCCL; NCCL should be used on Summit for better performance.
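As an illustration only, the NCCL choice is made when the strategy is constructed; the sketch below uses the `tf.distribute` experimental API from the TF releases this example targets, and the exact names may differ in newer versions.
```python
import tensorflow as tf

# Use NCCL rather than the gRPC-based RING collectives for inter-worker
# communication; cluster membership is resolved from the TF_CONFIG variable.
strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy(
    communication=tf.distribute.experimental.CollectiveCommunication.NCCL)

with strategy.scope():
    # Variables created inside the scope are mirrored across all workers.
    model = tf.keras.applications.ResNet50(weights=None)
    model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")
```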
For Horovod, there are several parameters that can be tuned; the following are the settings we found to work well for ResNet on Summit:
......
# BERT benchmark on Wikipedia corpus
This example is a modified version of Nvidia's [BERT example](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/LanguageModeling/BERT).
## Requirement
Download and pre-process the Wikipedia corpus following the steps in the original [quick start guide](./README_nv.md#quick-start-guide). Then set `INPUT_DATA` in [submit_pretraining.lsf](./submit_pretraining.lsf) to the data path.
## How to run
Simply submit the [job script](./submit_pretraining.lsf) from the example directory.
The key modifications set up the environment for launching Apex data parallel on Summit:
```bash
nodes=($(cat ${LSB_DJOB_HOSTFILE} | sort | uniq | grep -v login | grep -v batch))
head=${nodes[0]}
export RANK=$OMPI_COMM_WORLD_RANK
export LOCAL_RANK=$OMPI_COMM_WORLD_LOCAL_RANK
export WORLD_SIZE=$OMPI_COMM_WORLD_SIZE
export MASTER_ADDR=$head
export MASTER_PORT=29500 # default from torch launcher
echo "Setting env_var RANK=${RANK}"
echo "Setting env_var LOCAL_RANK=${LOCAL_RANK}"
echo "Setting env_var WORLD_SIZE=${WORLD_SIZE}"
echo "Setting env_var MASTER_ADDR=${MASTER_ADDR}"
echo "Setting env_var MASTER_PORT=${MASTER_PORT}"
```
These settings are sourced by each rank running the [task script](./scripts/run_pretraining_summit_32node_phase1.sh).
examples/pytorch/BERT/bert-summit.png (new image, 90.5 KiB)

# PyTorch synthetic benchmark with `NCCL` and `MPI` backends and `DDL` and `Horovod` plugins
- This example is a modified version of Horovod's [PyTorch examples](https://github.com/horovod/horovod/blob/master/examples/pytorch_imagenet_resnet50.py).
+ This example is a modified version of Horovod's [PyTorch examples](https://github.com/horovod/horovod/blob/master/examples/pytorch_synthetic_benchmark.py).
## Requirement
- Horovod and PyTorch need to be installed in your environment.
- You need to have access to the `/gpfs/alpine/world-shared` directory on Summit (all valid Summit users should have access).
+ Horovod (with both NCCL and DDL backends) and PyTorch need to be installed in your environment.
## How to run
1. Navigate to this folder.
- 2. Type `bsub bench.lsf` to submit the job.
+ 2. Type `bsub job.lsf` to submit the job.
The following is the modification for general usage (taken from the [Horovod repository](https://github.com/horovod/horovod/blob/master/docs/pytorch.rst)).
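The elided snippet presumably follows the standard Horovod PyTorch integration described in that documentation; a minimal sketch of the pattern:
```python
import torch
import horovod.torch as hvd

# Initialize Horovod and pin each rank to a single GPU.
hvd.init()
torch.cuda.set_device(hvd.local_rank())

# Toy model; scale the learning rate by the number of ranks.
model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())

# Wrap the optimizer so gradients are averaged across ranks via allreduce.
optimizer = hvd.DistributedOptimizer(
    optimizer, named_parameters=model.named_parameters())

# Broadcast initial state from rank 0 so all ranks start identically.
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)
```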
......