
Closed
Open
Created Jan 25, 2019 by Laanait, Nouamane (@nl7, Owner)

New TensorFlow I/O pipeline with LMDB

A new TensorFlow I/O pipeline is needed.
The current pipeline, stemdl/inputs.py/datasetTfrecord, uses TFRecords to read images/labels and a StagingArea for asynchronous get/put.
Large input sizes (i.e. images) expose an intrinsic limitation of TFRecords. The best bandwidth achieved on Summit is 0.5 GB/s for input sizes [1024,512,512] (CHW) float32. See lrn001/nl/dl/tf_io for all relevant scripts to benchmark I/O.
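To put the 0.5 GB/s figure in perspective, here is a quick back-of-the-envelope calculation. It assumes the shape [1024,512,512] describes a single float32 input tensor and that GB means 1e9 bytes; both are assumptions, and the function names are illustrative only.

```python
# Size of one input of shape [1024, 512, 512] (CHW) stored as float32.
CHANNELS, HEIGHT, WIDTH = 1024, 512, 512
BYTES_PER_FLOAT32 = 4


def sample_bytes(c=CHANNELS, h=HEIGHT, w=WIDTH, itemsize=BYTES_PER_FLOAT32):
    """Number of bytes in one input tensor."""
    return c * h * w * itemsize


def seconds_per_sample(bandwidth_gb_per_s, nbytes=None):
    """Time to read one input at the given bandwidth (GB taken as 1e9 bytes)."""
    if nbytes is None:
        nbytes = sample_bytes()
    return nbytes / (bandwidth_gb_per_s * 1e9)


if __name__ == "__main__":
    print(sample_bytes())            # 1073741824 bytes, i.e. 1 GiB
    print(seconds_per_sample(0.5))   # about 2.15 s per input at 0.5 GB/s
    print(seconds_per_sample(1.0))   # about 1.07 s per input at 1 GB/s
```

Under these assumptions, each input takes roughly two seconds to read at the current bandwidth.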
Current I/O bandwidths lead to (very) poor single-GPU performance.
Sean experienced the same problem with the NERSC team and moved away from TFRecords to hdf5/numpy. LMDB should have much better read performance than h5py/numpy (in PyTorch, @jqyin achieved I/O bandwidths of 2.5 GB/s).
To do:

  1. Subclass stemdl/inputs.py/DatasetTFRecords and override the TFRecords-specific methods (self.decode_image_label), and modify self.minibatch to use an LMDB file (for I/O with the lmdb/torch implementation, see stemdl/io_utils_torch.py/ABFDataSet).
  2. Benchmark with a single Python process.
  3. Implement and benchmark a version with Python multiprocessing.
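A minimal sketch of what the LMDB read path in step 1 might look like, not the stemdl implementation. It assumes each LMDB value holds the raw float32 bytes of one image with a known shape; the real record layout (keys, labels, dtype) used by ABFDataSet is not shown in this issue, so `decode_record` and `iter_lmdb` are hypothetical names. It uses the third-party py-lmdb package, imported lazily.

```python
import numpy as np


def decode_record(value, shape, dtype=np.float32):
    """Interpret the raw bytes of one LMDB value as an array of `shape`.

    Assumption: the value is a flat, C-ordered dump of the image data.
    """
    return np.frombuffer(value, dtype=dtype).reshape(shape)


def iter_lmdb(path, shape, dtype=np.float32):
    """Yield (key, array) pairs from an LMDB environment opened read-only.

    `path` is the LMDB directory (py-lmdb's default `subdir=True`).
    """
    import lmdb  # third-party py-lmdb package

    env = lmdb.open(path, readonly=True, lock=False, readahead=False)
    try:
        with env.begin(write=False) as txn:
            # Iterating a cursor yields (key, value) byte pairs in key order.
            for key, value in txn.cursor():
                yield key, decode_record(value, shape, dtype)
    finally:
        env.close()
```

A generator like this could then be handed to `tf.data.Dataset.from_generator`, or called from the overridden self.minibatch; which of the two fits the existing StagingArea setup better is left open here.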

Done means:
New TF I/O pipeline with bandwidths >= 1 GB/s.
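One way the "done" criterion could be checked in the benchmarks of steps 2 and 3 is sketched below. It times any iterable of bytes-like records (bytes or contiguous numpy arrays) and reports GB/s, with GB taken as 1e9 bytes. The helper names are illustrative and not part of stemdl.

```python
import time

TARGET_GB_PER_S = 1.0  # threshold from the "Done means" criterion


def measure_bandwidth(records, clock=time.perf_counter):
    """Consume `records` and return (total_bytes, bandwidth in GB/s).

    Each record must support the buffer protocol (bytes, numpy arrays, ...).
    """
    start = clock()
    total = 0
    for rec in records:
        total += memoryview(rec).nbytes
    elapsed = clock() - start
    gb_per_s = total / 1e9 / elapsed if elapsed > 0 else float("inf")
    return total, gb_per_s


def meets_target(gb_per_s, target=TARGET_GB_PER_S):
    """True if the measured bandwidth satisfies the >= 1 GB/s goal."""
    return gb_per_s >= target
```

Passing a lazy reader (a generator over the LMDB file, for instance) makes the timing include the actual read; passing a pre-loaded list would only measure iteration overhead.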
