Overhaul of job cancelling.
The LWR used to make no distinction between cancelled jobs and complete jobs, but with server-side postprocessing the distinction is more important. Bug fixes include actually recording and using the cancelled status in external job runners, and persisting the recorded cancel status in the queued and unqueued Python job managers (under some conditions they would revert the cancelled status and mark the job as complete). Catch and suppress exceptions related to cancelling jobs and fetching their status: failure to ping the metadata store (disk) to determine whether a job has been cancelled shouldn't be interpreted as a failure to cancel. Lots of refactoring and tests to support this.
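The cancellation rules described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Pulsar's actual code: `JobStatusFile`, `cancel_job`, the `status` file name and the status strings are all hypothetical. It shows three behaviours: a recorded cancellation is never reverted to "complete", an unreadable status is reported as unknown rather than as a failed cancel, and errors raised while stopping the job are logged and suppressed.

```python
import logging
import os

log = logging.getLogger(__name__)

# Hypothetical status values; the real managers may use different names.
CANCELLED = "cancelled"
COMPLETE = "complete"


class JobStatusFile:
    """Hypothetical status store: one 'status' file per job directory."""

    def __init__(self, job_directory):
        self.path = os.path.join(job_directory, "status")

    def write(self, status):
        with open(self.path, "w") as f:
            f.write(status)

    def is_cancelled(self):
        """Return True/False, or None if the status could not be read."""
        try:
            with open(self.path) as f:
                return f.read().strip() == CANCELLED
        except OSError:
            # A disk/metadata error means "unknown", not "cancel failed".
            log.warning("Could not read status file %s", self.path)
            return None

    def finish(self):
        """Record completion without reverting an earlier cancellation."""
        if self.is_cancelled():
            return CANCELLED
        self.write(COMPLETE)
        return COMPLETE


def cancel_job(status, kill):
    """Record the cancellation first, then attempt to stop the job."""
    if status.is_cancelled():
        return  # already cancelled; nothing more to do
    # An unreadable status (None) falls through: we still record the cancel.
    status.write(CANCELLED)
    try:
        kill()
    except Exception:
        # Suppress runner errors so the recorded cancelled status stands.
        log.exception("Error while stopping job")
```

Recording the status before calling the runner is the design choice that matters here: even if `kill()` fails, a later `finish()` still reports the job as cancelled rather than complete.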
- pulsar/managers/base/directory.py: 20 additions, 0 deletions
- pulsar/managers/base/external.py: 8 additions, 1 deletion
- pulsar/managers/queued.py: 1 addition, 0 deletions
- pulsar/managers/unqueued.py: 5 additions, 9 deletions
- test/manager_drmaa_test.py: 28 additions, 0 deletions
- test/manager_factory_test.py: 2 additions, 2 deletions
- test/manager_queued_test.py: 57 additions, 0 deletions
- test/manager_test.py: 5 additions, 57 deletions
- test/test_utils.py: 73 additions, 1 deletion