@@ -28,5 +28,24 @@ Let's put together a list of things we'd like to have in the harness and maybe t
Possible additional tasks for harness - WJ
==========================================
* LSF integration
* convert to python 3 (crest has python3.4, titan has python3.4.3 loadable through module)
* get the threaded version running robustly
* code checking with pylint
* adopt a consistent coding style, e.g., the Google Python style guide - not absolutely necessary, but if we can agree on one, it would improve the quality of the product
* in-code documentation - docstrings etc.
* user manual, getting started guide, FAQ - to whatever level of detail we think is needed
* integration of a documentation generation tool
* assertions or other fine-grained error checking throughout code
* unit testing
* CI testing
* notification of failed jobs, e.g., email, IM, text message (possibly outside the harness proper)
* Nagios interface to notify operators of a critical failure condition (possibly outside the harness proper)
* reporting of the global status of an acceptance campaign, e.g., via Splunk
* separate class to handle the scheduler choice (LSF, PBS, etc.) to make it easier to extend to new schedulers (see the scheduler sketch after this list)
* way to define many tests with minimal or no redundancy in what the user specifies
* careful examination of robustness under instabilities, e.g., a flaky file system, to make the harness robust under any conditions, within reason
* give the harness a "trusted" file system location (not Lustre) that will be reliable with a very high degree of confidence, for storing records and logs safely even in the case of transient errors/failures
* some way to navigate the files relevant to a run more easily - e.g., in the work dir put symlinks to the matching build dir, status file, run archive file/dir, scripts dir, source dir, job id file, etc. - cross-links to make it easy to move around (see the symlink sketch after this list). The usability question: how can we do our failure diagnosis with the minimal number of clicks/keystrokes?
* agreed-upon policy for what constitutes "failure" of different kinds, as a basis for consistent reporting
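
For the scheduler item above, one shape the abstraction could take is a base class with one subclass per batch system and a small factory, so supporting a new scheduler means adding a single subclass. This is only a sketch under assumptions: the names (BaseScheduler, LSFScheduler, make_scheduler) are hypothetical, not from the existing harness, and the exact parsing of bsub/bjobs/qsub output is illustrative rather than definitive::

    import abc
    import subprocess


    class BaseScheduler(abc.ABC):
        """Interface the harness would code against, independent of batch system."""

        @abc.abstractmethod
        def submit(self, script_path):
            """Submit a batch script; return the job id as a string."""

        @abc.abstractmethod
        def status(self, job_id):
            """Return a coarse state such as 'PENDING', 'RUNNING', or 'DONE'."""


    class LSFScheduler(BaseScheduler):
        def submit(self, script_path):
            # bsub reads the script on stdin and prints e.g.
            # "Job <12345> is submitted to queue <batch>."
            with open(script_path) as script:
                out = subprocess.check_output(["bsub"], stdin=script).decode()
            return out.split("<")[1].split(">")[0]

        def status(self, job_id):
            out = subprocess.check_output(["bjobs", "-noheader", "-o", "stat", job_id])
            return out.decode().strip()


    class PBSScheduler(BaseScheduler):
        def submit(self, script_path):
            # qsub prints the job id on stdout.
            return subprocess.check_output(["qsub", script_path]).decode().strip()

        def status(self, job_id):
            out = subprocess.check_output(["qstat", "-f", job_id]).decode()
            # Extract the job_state field from the full record.
            for line in out.splitlines():
                if "job_state" in line:
                    return line.split("=")[1].strip()
            return "UNKNOWN"


    def make_scheduler(name):
        """Factory so a test only names its scheduler in configuration."""
        return {"lsf": LSFScheduler, "pbs": PBSScheduler}[name.lower()]()

The rest of the harness would only ever call make_scheduler() and the two methods, so adding another scheduler later would touch one file rather than the whole code base.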
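
For the cross-linking item, a minimal sketch (the helper name and the particular link names are hypothetical) of dropping symlinks into a run's work directory so the related locations are one step away::

    import os


    def link_related_paths(work_dir, related):
        """Create symlinks in work_dir pointing at the paths related to this run.

        `related` maps a link name to a target path, e.g.
        {"build_dir": ..., "status_file": ..., "scripts_dir": ..., "job_id_file": ...}.
        """
        for name, target in related.items():
            link = os.path.join(work_dir, name)
            if not os.path.lexists(link):
                os.symlink(os.path.abspath(target), link)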