Commit 651cc107 authored Feb 03, 2021 by Mostofa Patwary

Merging megatron with ICT

parents e919dd8e c601d751

.gitlab-ci.yml

0 → 100644

+10 −0

Original line number	Diff line number	Diff line
		+image: gitlab-master.nvidia.com/dl/dgx/pytorch:20.12-py3-devel
		+
		+test:
		+  script:
		+    - pytest --junitxml=report.xml tests
		+  artifacts:
		+    when: always
		+    reports:
		+      junit: report.xml
		+
		\ No newline at end of file

LICENSE

+2 −1

		@@ -29,7 +29,7 @@ The following applies to all files unless otherwise noted:
		--

		This repository also contains code from Hugging Face Inc., Google Research,
		-and Facebook (from their Fairseq project). Files from these
		+Facebook (from their Fairseq project), and Philip Popien. Files from these
		organizations have notices at the top of each file. Below are licenses
		used in those files, as indicated.

		@@ -262,3 +262,4 @@ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
		LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
		OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
		SOFTWARE.

README.md

+124 −124

File changed.


examples/evaluate_zeroshot_gpt2.sh→examples/evaluate_zeroshot_gpt.sh

+0 −0

File moved.

examples/finetune_mnli_distributed.sh

+2 −2

		@@ -28,11 +28,11 @@ python -m torch.distributed.launch $DISTRIBUTED_ARGS ./tasks/main.py \
		--num-layers 24 \
		--hidden-size 1024 \
		--num-attention-heads 16 \
		- --batch-size 8 \
		+ --micro-batch-size 8 \
		--checkpoint-activations \
		--lr 5.0e-5 \
		--lr-decay-style linear \
		- --warmup 0.065 \
		+ --lr-warmup-fraction 0.065 \
		--seq-length 512 \
		--max-position-embeddings 512 \
		--save-interval 500000 \