Commit 651cc107 authored by Mostofa Patwary

Merging megatron with ICT

parents e919dd8e c601d751

.gitlab-ci.yml

0 → 100644
+10 −0
image: gitlab-master.nvidia.com/dl/dgx/pytorch:20.12-py3-devel

test:
  script:
    - pytest --junitxml=report.xml tests
  artifacts:
    when: always
    reports:
      junit: report.xml
    
\ No newline at end of file
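The CI job above runs `pytest` over `tests/` and hands `report.xml` to GitLab as a JUnit report. To illustrate what that artifact holds, here is a minimal Python sketch that totals the counters pytest writes on each `<testsuite>` element (`tests`, `failures`, `errors`, `skipped`). It assumes pytest's JUnit XML layout, which depending on the pytest version is either a bare `<testsuite>` root or a `<testsuites>` wrapper. The helper names `summarize_junit` and `ci_passed` are hypothetical and not part of this repository.

```python
import xml.etree.ElementTree as ET

# Counter attributes pytest writes on every <testsuite> element.
COUNT_KEYS = ("tests", "failures", "errors", "skipped")


def summarize_junit(xml_text: str) -> dict:
    """Sum the per-suite counters found in a pytest JUnit XML report.

    Handles both layouts pytest has emitted: a bare <testsuite> root and a
    <testsuites> wrapper around one or more <testsuite> elements.
    """
    root = ET.fromstring(xml_text)
    totals = {key: 0 for key in COUNT_KEYS}
    # Element.iter() visits the root itself as well as its descendants,
    # so a bare <testsuite> root is counted exactly once.
    for suite in root.iter("testsuite"):
        for key in COUNT_KEYS:
            totals[key] += int(suite.get(key, "0"))
    return totals


def ci_passed(totals: dict) -> bool:
    """Treat the run as green only if nothing failed or errored (skips are fine)."""
    return totals["failures"] == 0 and totals["errors"] == 0


if __name__ == "__main__":
    # Hand-written sample in the shape pytest produces; not real output from this repo.
    sample = (
        '<testsuites><testsuite name="pytest" errors="0" failures="1" '
        'skipped="0" tests="3" time="0.42">'
        '<testcase classname="tests.test_example" name="test_a" time="0.01"/>'
        '<testcase classname="tests.test_example" name="test_b" time="0.01"/>'
        '<testcase classname="tests.test_example" name="test_c" time="0.01">'
        '<failure message="assert 1 == 2"/></testcase>'
        "</testsuite></testsuites>"
    )
    totals = summarize_junit(sample)
    print(totals, ci_passed(totals))
```

Against a real run, the same helper could be fed the file contents, e.g. `summarize_junit(open("report.xml").read())`; GitLab parses the same file to display test results for the pipeline and merge request.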
+2 −1
@@ -29,7 +29,7 @@ The following applies to all files unless otherwise noted:
--

This repository also contains code from Hugging Face Inc., Google Research,
-and Facebook (from their Fairseq project). Files from these
+Facebook (from their Fairseq project), and Philip Popien. Files from these
organizations have notices at the top of each file. Below are licenses
used in those files, as indicated.

@@ -262,3 +262,4 @@ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
+124 −124

File changed; preview size limit exceeded, so the diff is collapsed.

+2 −2
@@ -28,11 +28,11 @@ python -m torch.distributed.launch $DISTRIBUTED_ARGS ./tasks/main.py \
               --num-layers 24 \
               --hidden-size 1024 \
               --num-attention-heads 16 \
-               --batch-size 8 \
+               --micro-batch-size 8 \
               --checkpoint-activations \
               --lr 5.0e-5 \
               --lr-decay-style linear \
-               --warmup 0.065 \
+               --lr-warmup-fraction 0.065 \
               --seq-length 512 \
               --max-position-embeddings 512 \
               --save-interval 500000 \
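This hunk renames two flags in the finetuning example: `--batch-size` becomes `--micro-batch-size` and `--warmup` becomes `--lr-warmup-fraction`. As we understand Megatron-LM's convention, the micro-batch size is the per-model-replica batch, so the global batch equals the micro-batch size times the data-parallel size times the number of micro-batches, and the warmup fraction is the share of the LR decay steps spent on linear warmup. The Python below is a minimal sketch of that arithmetic, assuming linear warmup followed by the `linear` decay style the script selects. `num_microbatches`, `lr_at`, and the global-batch, data-parallel, and decay-step values are hypothetical illustrations, not Megatron's own code.

```python
def num_microbatches(global_batch_size: int, micro_batch_size: int,
                     data_parallel_size: int) -> int:
    """Micro-batches (gradient-accumulation steps) per optimizer step.

    Assumes global_batch_size == micro_batch_size * data_parallel_size * num_microbatches.
    """
    per_pass = micro_batch_size * data_parallel_size
    if global_batch_size % per_pass != 0:
        raise ValueError(
            f"global batch size {global_batch_size} is not divisible by "
            f"micro batch size {micro_batch_size} x data-parallel size {data_parallel_size}"
        )
    return global_batch_size // per_pass


def lr_at(step: int, max_lr: float, decay_steps: int,
          warmup_fraction: float, min_lr: float = 0.0) -> float:
    """Linear warmup over the first warmup_fraction * decay_steps steps,
    then linear decay from max_lr down to min_lr at step == decay_steps."""
    warmup_steps = int(warmup_fraction * decay_steps)
    if warmup_steps > 0 and step <= warmup_steps:
        # Ramp linearly from 0 to max_lr during warmup.
        return max_lr * step / warmup_steps
    if step >= decay_steps:
        return min_lr
    # Fraction of the post-warmup decay window already used.
    decay_ratio = (step - warmup_steps) / (decay_steps - warmup_steps)
    return min_lr + (1.0 - decay_ratio) * (max_lr - min_lr)


if __name__ == "__main__":
    # --micro-batch-size 8 comes from the script; a global batch of 64 and
    # 4 data-parallel ranks are made-up values for illustration.
    print(num_microbatches(64, 8, 4))
    # --lr 5.0e-5 and --lr-warmup-fraction 0.065 come from the script;
    # 1000 decay steps is a made-up value.
    for step in (0, 65, 500, 1000):
        print(step, lr_at(step, max_lr=5.0e-5, decay_steps=1000, warmup_fraction=0.065))
```

The sketch raises on a non-divisible global batch so that every optimizer step accumulates the same number of micro-batches; with a 0.065 warmup fraction and 1000 decay steps, the learning rate peaks at step 65 and reaches zero at step 1000.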