HPSS Pooling
The current implementation makes a new HPSS request for every file opened. This has two problems:
- There is a race condition when two different users open the same non-cached file. How does Globus handle this? Are the two Globus processes cooperative?
- Sending files in batches lets HPSS optimize tape retrieval, but only within a single batch
We could spin up a separate worker thread to handle retrievals. The worker would pull every file currently in the queue into a batch (using a timeout on the `.get()` of maybe .5 sec?) and submit them all at once. This balances batch size against wait time.

The catch is that Globus' progress reporting is poor. Open requests from two different users could get batched together, and there is no way to tell that all (or any!) of user 1's files have finished until the whole batch completes. Worst case: user 1 opens 1 uncached file and user 2 opens 100 uncached files; user 1 then has to wait for all 101 files to be cached before starting to use their 1 file. This is compounded by contention for the small number of HPSS connections available.
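A minimal sketch of the batching worker described above. The queue-draining logic (block for the first request, then keep pulling with a short `.get()` timeout until the queue goes quiet) is the real proposal; `submit_batch` is a hypothetical stand-in for whatever actually kicks off the HPSS/Globus transfer.

```python
import queue
import threading

BATCH_TIMEOUT = 0.5  # seconds to wait for stragglers before submitting

def collect_batch(q, timeout=BATCH_TIMEOUT):
    """Block until one request arrives, then drain anything else that
    shows up within `timeout` seconds of the last item into one batch."""
    batch = [q.get()]  # block indefinitely for the first file
    while True:
        try:
            batch.append(q.get(timeout=timeout))
        except queue.Empty:
            return batch

def retrieval_worker(q, submit_batch):
    """Loop forever, turning queued open requests into batched retrievals.
    `submit_batch` is a placeholder for the real transfer submission."""
    while True:
        submit_batch(collect_batch(q))

# Example wiring: file-open handlers just q.put(path) and the daemon
# thread batches them up.
if __name__ == "__main__":
    q = queue.Queue()
    threading.Thread(
        target=retrieval_worker,
        args=(q, lambda paths: print(f"submitting {len(paths)} files")),
        daemon=True,
    ).start()
```

Note this sketch does nothing about the fairness problem: batches still mix users, so per-user completion tracking (or per-user batches) would have to be layered on top.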