Adding gradient-checkpointing to fit large network. Options are 'collection', 'memory', 'speed'. It increases total number of operations.
Adding gradient-checkpointing to fit large network. Options are 'collection', 'memory', 'speed'. It increases total number of operations.