FCDenseNet Benchmarks
Reconstruction of EM data using FCDenseNet looks promising. As such, FCDenseNet is a top candidate model for a GB run and/or an SC'19 paper submission.
Carry out performance (single GPU, for FLOPS) and scaling (multiple nodes, for communication) studies of FCDenseNet.
Input sizes from simulation will vary; some relevant sizes are [x,256,256] with x = 16x16, 32x32, etc.
Output size from simulation is [256,256].
The code needed to build FCDenseNet is in stemdl/network.
Benchmarks might require the following code modifications:
1. Modify stemdl/inputs/DatasetTFRecords to generate batches of inputs+outputs on the fly.
2. Create dummy TFRecords (for the relevant inputs+outputs) to assess the impact of I/O.
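To size the dummy records, it helps to know the raw payload of one input+output pair for the shapes listed above. The following is a minimal sketch, not part of stemdl: the helper names are hypothetical, and it assumes values are stored as float32 and ignores any record framing overhead.

```python
# Assumption: values are stored as float32 (4 bytes each); use 2 for float16.
BYTES_PER_VALUE = 4
OUTPUT_SHAPE = (256, 256)


def input_shape(grid):
    """Input shape [x, 256, 256] where x = grid * grid (e.g. 16x16 -> 256)."""
    return (grid * grid, 256, 256)


def num_bytes(shape, bytes_per_value=BYTES_PER_VALUE):
    """Raw payload size in bytes of one dense tensor with the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_value


def sample_bytes(grid):
    """Payload of one (input, output) pair, ignoring record framing overhead."""
    return num_bytes(input_shape(grid)) + num_bytes(OUTPUT_SHAPE)


if __name__ == "__main__":
    for grid in (16, 32):
        mib = sample_bytes(grid) / 2**20
        print(f"x={grid}x{grid}: input {input_shape(grid)}, {mib:.2f} MiB per sample")
```

Under the float32 assumption, the input dominates the payload, so the I/O cost of a batch grows roughly linearly with x.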
Benchmarks can be quantified using:
1. GPU timeline traces.
2. Analytical FLOPs.
3. The model's data-processing throughput as a function of the number of ranks.
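The analytical FLOPs and throughput metrics above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the FLOP count covers the forward pass of a single dense 2D convolution (bias ignored, one multiply-add counted as 2 FLOPs), and the scaling efficiency assumes ideal scaling is linear in the number of ranks.

```python
def conv2d_flops(h_out, w_out, c_in, c_out, k_h, k_w):
    """Forward-pass FLOPs of one dense 2D convolution (bias ignored).

    Each output element needs k_h * k_w * c_in multiply-adds,
    and each multiply-add is counted as 2 FLOPs.
    """
    macs = h_out * w_out * c_out * k_h * k_w * c_in
    return 2 * macs


def throughput(samples_per_step_per_rank, ranks, step_time_s):
    """Samples processed per second, summed over all ranks."""
    return samples_per_step_per_rank * ranks / step_time_s


def scaling_efficiency(tput, ranks, base_tput, base_ranks=1):
    """Measured throughput divided by ideal linear scaling from the baseline."""
    ideal = base_tput * ranks / base_ranks
    return tput / ideal
```

Summing `conv2d_flops` over the network's convolution layers gives a per-sample estimate that can be compared with the achieved rate from the timeline traces. For example, with made-up numbers, a baseline of 8.0 samples/s on 1 rank and a measured 42.0 samples/s on 6 ranks gives `scaling_efficiency(42.0, 6, 8.0)`, i.e. 42.0 / 48.0.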
Important Notes:
- Most of the necessary code is already implemented (in stemdl).
- Coordinate with Sean T. (in particular, Sean has a binary that forces direct convolutions, which gives 2x performance).