FCDenseNet Benchmarks

Reconstruction of EM data using FCDenseNet looks promising. As such, FCDenseNet is a top candidate model for a GB run and/or an SC'19 paper submission. Carry out performance (single GPU, for FLOPs) and scaling (multiple nodes, for communication) studies of FCDenseNet.
Input sizes from simulation will vary; some relevant sizes are [x,256,256], with x = 16x16, 32x32, etc.
Output size from simulation is [256,256].
The code needed to build FCDenseNet is in stemdl/network.
Benchmarks might require the following code modifications:

  1. Modify stemdl/inputs/DatasetTFRecords to generate batches of inputs+outputs on the fly.
  2. Create dummy TFRecords (for the relevant inputs+outputs) to assess the impact of I/O.
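
To get a feel for how heavy the dummy records would be, the per-sample size and the read bandwidth each rank needs can be estimated before writing any TFRecords. The following is a minimal sketch, not code from stemdl: the helper names `sample_bytes` and `required_bandwidth`, the storage dtypes, and the 10 samples/s target are all assumptions chosen for illustration, and the estimate ignores TFRecord framing and any compression.

```python
# Sketch: uncompressed size of one dummy input/output pair and the read
# bandwidth needed to feed one rank. Helper names, dtypes, and the target
# rate are hypothetical; none of them come from stemdl.

BYTES_PER_ELEMENT = {"float16": 2, "float32": 4}  # assumed storage dtypes


def sample_bytes(shape, dtype="float32"):
    """Uncompressed size in bytes of one array with the given shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n * BYTES_PER_ELEMENT[dtype]


def required_bandwidth(input_shape, output_shape, samples_per_sec, dtype="float32"):
    """Bytes per second the input pipeline must deliver to one rank."""
    per_sample = sample_bytes(input_shape, dtype) + sample_bytes(output_shape, dtype)
    return per_sample * samples_per_sec


if __name__ == "__main__":
    output_shape = (256, 256)
    target_rate = 10  # samples/s per rank, a hypothetical target
    for grid in (16, 32):  # x = 16x16, 32x32
        input_shape = (grid * grid, 256, 256)
        mib = sample_bytes(input_shape) / 2**20
        gib_s = required_bandwidth(input_shape, output_shape, target_rate) / 2**30
        print(f"x={grid}x{grid}: {mib:.0f} MiB per input, "
              f"{gib_s:.2f} GiB/s at {target_rate} samples/s")
```

Comparing this number against the measured throughput with on-the-fly inputs (mod 1) versus dummy TFRecords (mod 2) indicates whether a run is compute-bound or I/O-bound.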

Benchmarks can be quantified using:

  1. GPU timeline traces.
  2. Analytical FLOPs.
  3. The model's data-processing throughput as a function of the number of ranks.
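
Items 2 and 3 can be sketched in a few lines. The FLOP count below uses the standard formula for a direct 2D convolution (one multiply-accumulate counted as 2 FLOPs, bias ignored), and the scaling metric is per-rank throughput normalized to the smallest rank count. The function names, the 3x3 kernel, the 48 output channels, and the throughput figures are hypothetical placeholders, not FCDenseNet's actual configuration or measured results.

```python
# Sketch: analytical FLOPs for one convolution layer and scaling efficiency
# from throughput measurements. All layer parameters and throughput values
# used in the example are placeholders, not stemdl settings or real results.


def conv2d_flops(h_out, w_out, c_in, c_out, k_h, k_w):
    """Forward-pass FLOPs of one direct 2D convolution, per sample.

    Each multiply-accumulate counts as 2 FLOPs; bias terms are ignored.
    """
    return 2 * h_out * w_out * c_in * c_out * k_h * k_w


def scaling_efficiency(throughput_by_ranks):
    """Per-rank throughput relative to the smallest rank count measured.

    throughput_by_ranks maps number of ranks -> total samples per second.
    A value of 1.0 means perfect scaling.
    """
    base_ranks = min(throughput_by_ranks)
    base_per_rank = throughput_by_ranks[base_ranks] / base_ranks
    return {
        ranks: (total / ranks) / base_per_rank
        for ranks, total in sorted(throughput_by_ranks.items())
    }


if __name__ == "__main__":
    # Hypothetical first layer: 16x16 = 256 input channels on a 256x256 grid,
    # 3x3 kernel, 48 output channels, 'same' padding.
    flops = conv2d_flops(256, 256, 256, 48, 3, 3)
    print(f"conv layer: {flops / 1e9:.2f} GFLOPs per sample")

    # Placeholder measurements (ranks -> samples/s), for illustration only.
    measured = {1: 100.0, 2: 190.0, 4: 360.0}
    for ranks, eff in scaling_efficiency(measured).items():
        print(f"{ranks} ranks: efficiency {eff:.2f}")
```

Summing `conv2d_flops` over all layers and dividing by the measured step time gives the achieved FLOP rate, which can be compared with the kernel times seen in the GPU timeline traces.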

Important Notes:

  1. Most of the necessary code is already implemented (in stemdl).
  2. Coordinate with Sean T.; in particular, Sean has a binary that forces direct convolutions, which yields a 2x performance gain.
Edited by Laanait, Nouamane