# disMultiABM issues
https://code.ornl.gov/groups/disMultiABM/-/issues

## gradient based SNN training in the complex plane (stemdl#7)
https://code.ornl.gov/disMultiABM/stemdl/-/issues/7
Updated 2019-12-15T00:18:26Z. Author/assignee: Laanait, Nouamane.

Implement the diffeqs of the SNN and see if one can avoid singularities by going to the complex plane and picking up principal values?

## recurrent inverter machine (stemdl#6)
https://code.ornl.gov/disMultiABM/stemdl/-/issues/6
Updated 2019-12-15T00:14:33Z. Author/assignee: Laanait, Nouamane.

Reverse-mode AD on the forward model <--> forward-mode AD on the recurrent machine.
Essentially, have a recurrent machine learn the dynamics of the forward model running under reverse-mode AD; there should be an analogous setup to do with forward mode as well.

## recurrent ynet (stemdl#5)
https://code.ornl.gov/disMultiABM/stemdl/-/issues/5
Updated 2019-12-14T19:24:48Z. Author: Laanait, Nouamane.

## New Tensorflow I/O pipeline with LMDB (stemdl#4)
https://code.ornl.gov/disMultiABM/stemdl/-/issues/4
Updated 2019-02-07T14:40:56Z. Author: Laanait, Nouamane.

Need a new Tensorflow I/O pipeline.
Current pipeline `stemdl/inputs.py/datasetTfrecord` uses TFRecords to read images/labels and a `StagingArea` for asynchronous get/put.
Large input sizes (i.e. images) expose an intrinsic limitation of TFRecords: the best bandwidth achieved on Summit is __0.5 GB/sec__ for input sizes [1024,512,512] (CHW) float32. See `lrn001/nl/dl/tf_io` for all relevant scripts to benchmark I/O.
Current I/O bandwidths lead to (very) poor single-GPU performance.
Per Sean, he hit the same limitation working with the NERSC team and moved away from TFRecords to HDF5/NumPy. LMDB should have much better read performance than h5py/numpy (in PyTorch, @jqyin achieved I/O bandwidths of __2.5 GB/sec__).
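To put the quoted figures in perspective, a back-of-the-envelope calculation (assuming GB means 1e9 bytes, as the numbers above appear to) shows why 0.5 GB/sec starves the GPU at these input sizes:

```python
# Rough arithmetic for the I/O figures quoted above (GB taken as 1e9 bytes).
shape = (1024, 512, 512)                    # CHW input size from the benchmark
bytes_per_sample = 1024 * 512 * 512 * 4     # float32 = 4 bytes/element, ~1.07 GB

tfrecord_bw = 0.5e9   # measured TFRecords bandwidth on Summit (bytes/s)
lmdb_bw = 2.5e9       # bandwidth @jqyin reported with LMDB/PyTorch (bytes/s)

print(f"{bytes_per_sample / 1e9:.2f} GB per sample")            # 1.07 GB
print(f"tfrecords: {bytes_per_sample / tfrecord_bw:.2f} s")     # ~2.15 s per sample
print(f"lmdb:      {bytes_per_sample / lmdb_bw:.2f} s")         # ~0.43 s per sample
```

At ~2 seconds of pure reading per sample, a single GPU spends most of its time idle unless reads are heavily overlapped, which motivates the to-do list below.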
To do:
1. Subclass `stemdl/inputs.py/DatasetTFRecords`, overriding the pure-TFRecords methods (`self.decode_image_label`) and modifying `self.minibatch` to use an __lmdb__ file (for I/O with an lmdb/torch implementation see `stemdl/io_utils_torch.py/ABFDataSet`).
2. Benchmark with a single Python process.
3. Implement and benchmark a version with Python `multiprocessing`.
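Step 1 above hinges on (de)serializing image/label pairs as LMDB values. A minimal sketch of that byte layout, assuming float32 arrays with a small shape header per array; the `lmdb` environment and `tf.data` wiring are omitted, and `encode_sample`/`decode_sample` are illustrative names, not the actual stemdl API:

```python
# Sketch: byte-level (de)serialization for image/label pairs stored as LMDB
# values. The lmdb env.put()/txn.get() calls themselves are omitted here.
import struct
import numpy as np

def encode_sample(image: np.ndarray, label: np.ndarray) -> bytes:
    """Pack two float32 arrays into one buffer: [ndim, dims...] header each."""
    out = bytearray()
    for arr in (image, label):
        arr = np.ascontiguousarray(arr, dtype=np.float32)
        out += struct.pack("<B", arr.ndim)
        out += struct.pack(f"<{arr.ndim}I", *arr.shape)
        out += arr.tobytes()
    return bytes(out)

def decode_sample(buf: bytes):
    """Inverse of encode_sample; returns (image, label)."""
    arrays, offset = [], 0
    for _ in range(2):
        ndim = struct.unpack_from("<B", buf, offset)[0]; offset += 1
        shape = struct.unpack_from(f"<{ndim}I", buf, offset); offset += 4 * ndim
        n = int(np.prod(shape))
        arr = np.frombuffer(buf, np.float32, count=n, offset=offset).reshape(shape)
        offset += 4 * n
        arrays.append(arr)
    return arrays[0], arrays[1]
```

In the pipeline itself, an overridden `decode_image_label` would call something like `decode_sample` on values pulled from a read-only `lmdb` transaction, fed into TF via `tf.data.Dataset.from_generator`.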
Done means:
New TF I/O pipeline with bandwidths >= 1 GB/s.

Milestone: ACM GB Prize Prep. Assignees: Yin, Junqi; Starchenko, Vitalii. Due 2019-02-08.

## Implement, train, and benchmark deeplabv3 (stemdl#3)
https://code.ornl.gov/disMultiABM/stemdl/-/issues/3
Updated 2019-12-14T19:22:14Z. Author: Laanait, Nouamane.

Sean got pretty good task accuracy and hardware performance out of deeplab, surpassing FCDenseNet in both.
Tasks:
* [ ] Implement deeplab.
* [ ] Train deeplab.
* [ ] Benchmark deeplab. This is related to #1.

Milestone: ACM GB Prize Prep. Assignee: Laanait, Nouamane. Due 2019-02-05.

## FCDenseNet Benchmarks (stemdl#1)
https://code.ornl.gov/disMultiABM/stemdl/-/issues/1
Updated 2019-12-14T19:23:34Z. Author: Laanait, Nouamane.

A reconstruction of EM data using FCDenseNet looks promising. As such, FCDenseNet is a top candidate model to use in a GB run and/or SC'19 paper submission.
Carry out performance (single gpu for flops) and scaling (multiple nodes for communication) studies of FCDenseNet.
Input sizes from the simulation will vary; some relevant sizes are [_x_,256,256], _x_ = 16x16, 32x32, etc.
Output size from the simulation is [256,256].
Necessary code (to build FCDenseNet) is in `stemdl/network`.
Benchmarks (might) require the following code mods:
* [x] 1. Modify stemdl/inputs/DatasetTFRecords to generate batch of inputs+outputs on the fly.
* [x] 2. Create dummy TFRecords (for relevant inputs+outputs) to assess impact of I/O.
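For mod 2 above, writing real TFRecords needs `tf.io.TFRecordWriter`; as a TF-free stand-in, dummy fixed-size samples can be written as raw binary to isolate read bandwidth from the simulation. A sketch, with illustrative names and the [_x_,256,256] / [256,256] shapes from above as defaults:

```python
# Sketch: write dummy random (input, output) float32 pairs to disk so read
# bandwidth can be measured without the simulation. Raw binary stands in for
# TFRecords here; the real mod would emit tf.train.Example records instead.
import numpy as np

def write_dummy_samples(path: str, n: int, in_shape=(256, 256, 256),
                        out_shape=(256, 256)) -> int:
    """Append n random input/output pairs to path; returns bytes written."""
    rng = np.random.default_rng(0)
    written = 0
    with open(path, "wb") as f:
        for _ in range(n):
            for shape in (in_shape, out_shape):
                buf = rng.random(shape, dtype=np.float32).tobytes()
                f.write(buf)
                written += len(buf)
    return written
```

With the default shapes each pair is ~64 MB, so even a handful of samples is enough to exercise the filesystem rather than the page cache.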
Benchmarks can be quantified using:
* [ ] 1. gpu timeline traces.
* [ ] 2. analytical flops.
* [ ] 3. Model's data processing throughput as a function of ranks.
__Important Notes__:
1. Most of the necessary code is already implemented (in stemdl).
2. Coordinate with Sean T. (in particular, Sean has a binary that forces direct convolutions --> 2x performance).

Milestone: ACM GB Prize Prep. Assignee: Yin, Junqi. Due 2019-01-28.