New TensorFlow I/O pipeline with LMDB

We need a new TensorFlow I/O pipeline.
The current pipeline, stemdl/inputs.py/datasetTfrecord, uses TFRecords to read images/labels and a StagingArea for asynchronous get/put.
Large inputs (i.e. large images) expose an intrinsic limitation of TFRecords: the best bandwidth achieved on Summit is 0.5 GB/s for input sizes [1024, 512, 512] (CHW) float32. See lrn001/nl/dl/tf_io for all relevant I/O benchmark scripts.
The current I/O bandwidth leads to very poor single-GPU performance.
Per Sean, he ran into the same issue while working with the NERSC team and moved away from TFRecords to HDF5/NumPy. LMDB should give much better read performance than h5py/NumPy (in PyTorch, @jqyin achieved I/O bandwidths of 2.5 GB/s).
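Before touching the pipeline it helps to know what the file system can deliver from LMDB alone. Below is a minimal read-bandwidth sketch, assuming an existing LMDB file whose values are raw float32 arrays; the path and sample count are placeholders, not actual project files.

```python
# Micro-benchmark sketch: sequential read bandwidth of an existing LMDB file.
# Assumes each value is a raw float32 buffer (e.g. a [1024, 512, 512] CHW image);
# the path below is a placeholder.
import time
import lmdb
import numpy as np

def lmdb_read_bandwidth(path, n_samples=100):
    # readahead=False tends to behave better for large values on parallel file systems.
    env = lmdb.open(path, readonly=True, lock=False, readahead=False)
    nbytes = 0
    start = time.time()
    with env.begin(buffers=True) as txn:
        for i, (_, value) in enumerate(txn.cursor()):
            if i >= n_samples:
                break
            # Zero-copy view of the stored bytes; we only need its size here.
            nbytes += np.frombuffer(value, dtype=np.float32).nbytes
    return nbytes / (time.time() - start) / 1e9  # GB/s

if __name__ == "__main__":
    print("LMDB read bandwidth: %.2f GB/s" % lmdb_read_bandwidth("/path/to/train.lmdb"))
```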
To do:

  1. Subclass stemdl/inputs.py/DatasetTFRecords, override the pure-TFRecords methods (self.decode_image_label), and modify self.minibatch to use an LMDB file (for LMDB I/O with the torch implementation see stemdl/io_utils_torch.py/ABFDataSet); see the sketch after this list.
  2. Benchmark with a single Python process.
  3. Implement and benchmark a version with Python multiprocessing (see the second sketch below).
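
A rough sketch of item 1, assuming TF 1.x graph mode (as implied by the StagingArea usage) and that each LMDB value stores the image and label concatenated in a single float32 buffer. The constructor signature, record layout, and base-class interface are assumptions here, not the actual stemdl API.

```python
# Sketch of an LMDB-backed subclass of DatasetTFRecords (method names from the issue text;
# their exact signatures are assumptions).
import lmdb
import numpy as np
import tensorflow as tf

from stemdl.inputs import DatasetTFRecords  # assumed import path

class DatasetLMDB(DatasetTFRecords):
    """Reads (image, label) samples from an LMDB file instead of TFRecords."""

    def __init__(self, lmdb_path, image_shape, label_shape, batch_size, **kwargs):
        super().__init__(**kwargs)
        self.lmdb_path = lmdb_path
        self.image_shape = image_shape   # e.g. (1024, 512, 512) CHW
        self.label_shape = label_shape
        self.batch_size = batch_size

    def _sample_generator(self):
        env = lmdb.open(self.lmdb_path, readonly=True, lock=False, readahead=False)
        with env.begin(buffers=True) as txn:
            for _, value in txn.cursor():
                # Copy out of the LMDB buffer before the cursor advances and invalidates it.
                record = np.frombuffer(value, dtype=np.float32).copy()
                n_img = int(np.prod(self.image_shape))
                image = record[:n_img].reshape(self.image_shape)
                label = record[n_img:].reshape(self.label_shape)
                yield image, label

    def decode_image_label(self, image, label):
        # No TFRecord parsing needed; samples arrive already decoded.
        return image, label

    def minibatch(self):
        dataset = tf.data.Dataset.from_generator(
            self._sample_generator,
            output_types=(tf.float32, tf.float32),
            output_shapes=(self.image_shape, self.label_shape))
        dataset = dataset.batch(self.batch_size).prefetch(2)
        return dataset.make_one_shot_iterator().get_next()
```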

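For item 3, one possible shape of the multiprocessing benchmark: N reader processes, each opening its own LMDB environment (environments should not be shared across a fork) and reading a disjoint shard of records, with aggregate bandwidth reported at the end. The path and the round-robin sharding scheme are placeholders.

```python
# Sketch for steps 2-3: aggregate LMDB read bandwidth with N reader processes.
import time
import lmdb
import numpy as np
from multiprocessing import Pool

LMDB_PATH = "/path/to/train.lmdb"  # placeholder

def worker_bytes(args):
    rank, nworkers = args
    # Each worker opens its own environment; worker `rank` reads every Nth record.
    env = lmdb.open(LMDB_PATH, readonly=True, lock=False, readahead=False)
    nbytes = 0
    with env.begin(buffers=True) as txn:
        for i, (_, value) in enumerate(txn.cursor()):
            if i % nworkers == rank:
                nbytes += np.frombuffer(value, dtype=np.float32).nbytes
    return nbytes

def aggregate_bandwidth(nworkers):
    start = time.time()
    with Pool(nworkers) as pool:
        total = sum(pool.map(worker_bytes, [(r, nworkers) for r in range(nworkers)]))
    return total / (time.time() - start) / 1e9  # GB/s

if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print("%d workers: %.2f GB/s" % (n, aggregate_bandwidth(n)))
```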
Done means:
New TF I/O pipeline with bandwidths >= 1 GB/s.