From 6f95ebe115d4a3879f19b7e847d2a1a564c8fa2e Mon Sep 17 00:00:00 2001
From: Neil Vaytet <neil.vaytet@esss.se>
Date: Tue, 18 Sep 2018 11:58:49 +0200
Subject: [PATCH] Refs #0 : Corrected dev-docs for SystemTests with new scheduler

---
 .../lib/systemtests/stresstesting.py |  4 +-
 dev-docs/source/SystemTests.rst      | 49 ++++++++++++++-----
 2 files changed, 38 insertions(+), 15 deletions(-)

diff --git a/Testing/SystemTests/lib/systemtests/stresstesting.py b/Testing/SystemTests/lib/systemtests/stresstesting.py
index 62d91733738..20f59221078 100644
--- a/Testing/SystemTests/lib/systemtests/stresstesting.py
+++ b/Testing/SystemTests/lib/systemtests/stresstesting.py
@@ -1156,11 +1156,11 @@ def envAsString():
 
 #########################################################################
 # Function to keep a pool of threads active in a loop to run the tests.
-# Each threads starts a loop and gathers a first test module from the
+# Each thread starts a loop and gathers a first test module from the
 # master test list which is stored in the tests_dict shared dictionary,
 # starting with the number in the module list equal to the process id.
 #
-# Each process then checks if all the data files requird by the current
+# Each process then checks if all the data files required by the current
 # test module are available (i.e. have not been locked by another
 # thread). If all files are unlocked, the thread proceeds with that test
 # module. If not, it goes further down the list until it finds a module
diff --git a/dev-docs/source/SystemTests.rst b/dev-docs/source/SystemTests.rst
index dc03dff5214..7a2c8cf3dac 100644
--- a/dev-docs/source/SystemTests.rst
+++ b/dev-docs/source/SystemTests.rst
@@ -228,19 +228,42 @@ would run the tests on 8 cores.
 
 Some tests write or delete in the same directories, using the same file
 names, which causes issues when running in parallel. To resolve this,
-the tests are grouped in lists where all modules starting with the
-same 4 letters are given to one core. This worsens the load balance
-between cores (with 8 cores, core 1 performs 93 tests while core 8 only
-has 44). This is not ideal but allows the suite to complete without
-failures. The runtime using 8 cores still goes down from 2h to 30 min.
-
-This also means that in the case of running a subset of tests with the
-``-R`` option, if the number of groups created from this is smaller
-than the number of cores being used, some cores will have no tests to
-run. Using the ``-j`` option is only really advantageous when running
-a large list of tests. It does not bring much speedup up for a small
-subset of tests, as these are likely to be put inside the same group
-and run on the same core.
+a global list of test modules (= different python files in the
+``Testing/SystemTests/tests/analysis`` directory) is first created.
+Now we scan each test module line by line and list all the data files
+that are used by that module. The possible ways files are being
+specified are:
+1. if the extensions ``.nxs``, ``.raw`` or ``.RAW`` are present
+2. if there is a sequence of at least 4 digits inside a string
+In case number 2, we have to search for strings starting with 4 digits,
+i.e. "0123, or strings ending with 4 digits 0123".
+This might over-count, meaning some sequences of 4 digits might not be
+used for a file name specification, but it does not matter if it gets
+identified as a filename as the probability of the same sequence being
+present in another python file is small, and it would therefore not lock
+any other tests. A dict is created with an entry for each module name
+that contains the list of files that this module requires.
+An accompanying dict with an entry for each data file stores a lock
+status for that particular datafile.
+
+Finally, a scheduler spawns ``N`` threads who each start a loop and
+gather a first test module from the master test list which is stored in
+a shared dictionary, starting with the number in the module list equal
+to the process id.
+
+Each process then checks if all the data files required by the current
+test module are available (i.e. have not been locked by another
+thread). If all files are unlocked, the thread locks all these files
+and proceeds with that test module. If not, it goes further down the
+list until it finds a module whose files are all available.
+
+Once it has completed the work in the current module, it unlocks the
+data files and checks if the number of modules that remains to be
+executed is greater than 0. If there is some work left to do, the
+thread finds the next module that still has not been executed
+(searches through the tests_lock array and finds the next element
+that has a 0 value). This aims to have all threads end calculation
+approximately at the same time.
 
 Reducing the size of console output
 -----------------------------------
-- 
GitLab
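
Reading aid for the scheduling described in the SystemTests.rst hunk above. This is a minimal Python sketch, not the code in stresstesting.py. The names find_data_files, run_modules, required_files and run_module are hypothetical. Two further assumptions: the sketch guards the shared state with a threading.Condition instead of the tests_lock array mentioned in the docs, and the search for a runnable module wraps around to the start of the list.

```python
import re
import threading

# Hypothetical patterns mirroring the two detection rules in the docs.
STRING_RE = re.compile(r'"([^"\n]*)"|\'([^\'\n]*)\'')
EXT_RE = re.compile(r"\.(nxs|raw|RAW)")          # rule 1: known extension
DIGITS_EDGE_RE = re.compile(r"^\d{4}|\d{4}$")    # rule 2: 4 digits at an edge


def find_data_files(source):
    """Return string literals in `source` that look like data file names."""
    found = set()
    for match in STRING_RE.finditer(source):
        text = match.group(1) if match.group(1) is not None else match.group(2)
        if EXT_RE.search(text) or DIGITS_EDGE_RE.search(text):
            found.add(text)
    return found


def run_modules(required_files, run_module, n_threads=2):
    """Run each module once; modules sharing a data file never overlap.

    required_files maps module name -> set of data files it needs.
    run_module is a callable taking the module name (assumed interface).
    """
    modules = sorted(required_files)
    done = {name: False for name in modules}  # True once a thread claimed it
    locked = {f: False for files in required_files.values() for f in files}
    guard = threading.Condition()  # protects `done` and `locked`

    def claim(start):
        # Starting at `start`, look for an unclaimed module whose files
        # are all unlocked; lock its files and hand it to the caller.
        for offset in range(len(modules)):
            name = modules[(start + offset) % len(modules)]
            files = required_files[name]
            if not done[name] and not any(locked[f] for f in files):
                done[name] = True
                for f in files:
                    locked[f] = True
                return name
        return None

    def worker(thread_id):
        while True:
            with guard:
                name = claim(thread_id)
                while name is None and not all(done.values()):
                    guard.wait()  # another thread still holds needed files
                    name = claim(thread_id)
                if name is None:
                    return  # every module has been claimed
            run_module(name)
            with guard:
                for f in required_files[name]:
                    locked[f] = False
                guard.notify_all()

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

In this sketch a module counts as taken as soon as a thread claims it, so a thread with nothing runnable only waits while some unclaimed module is blocked by a locked file. As in the docs, the file detection deliberately over-counts: a 4-digit string that is not a file name only adds a lock that is unlikely to be shared by another module.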