Large progress toward live system simulation.

Commit breakdown:

  1. First parts of Frontier live.

    • Added live mode and frontier load_live_data dataloader (dummy)
  2. Added load from live system in telemetry.

    • changes to telemetry.py
  3. Large progress toward live system simulation.

    • Added actual timestep_start for system simulation
    • refactored wall_time to be expected_run_time
    • added current_runtime and clarifications on other job times (in jobs.py and the frontier dataloader)

    TODO: Systems other than frontier do not have this fix yet!

    • Added load_live_data, currently still with a dummy json dump obtained from pyslurm.

      TODO: Replace with an actual call and implement in standalone telemetry.py

    • refactored state and _state to current_state and end_state to show current and expected state (for future tests, and to accommodate differences between live, replay and reschedule setups).

    • Added TIMEOUT handling and killed job list.

      TODO: Add to stats.py

    • updated prepare_system_state to work with live simulation; where redundancy was detected, refactored so the information is tracked only in the engine state.

    • Refactored resmgr and schedulers to work with time_limit (and expected_run_time instead of end_time, where appropriate).

      This prevents schedulers from using time that is still allocated to a job before it has ended.

    • Handled potentially empty end_time (This is a valid state, since we do not generally know the end_time before the job has actually ended)

    • Added Y2K Bug to UI: any simulation time with unixtime > January 2000 is assumed to be an actual unixtime and not an attempt to simulate a system for 30 years, as system lifetimes are seldom that long. (Improved UI)

    • Updated workload to work with new times (wall_time to expected_run_time refactor)

      Note that we now have current_run_time, expected_run_time, and time_limit. These are different and needed for replay rescheduling and live-reschedule.

  4. Fixes to enable telemetry loading from live

    • added save snapshot when running from live
    • added default time to simulate when running in live mode
    • added replay flag to tick, using a bool instead of None | file
    • Some additional fixes for wall_time in plotting.py
    • Telemetry: fixed import of timestep_start and end (and args)
    • Enabled run_telemetry standalone for live mode.
  5. Fixed cooling for changed current_timestep name
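
For reviewers, the distinction between the three job times in item 3 and the TIMEOUT handling can be sketched as follows. This is only an illustration under assumptions: the JobTimes class and its method names are hypothetical and are not the actual code in jobs.py; only the field names current_run_time, expected_run_time, time_limit and end_time come from this merge request.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobTimes:
    """Hypothetical container; the real fields in jobs.py may differ."""
    start_time: int                    # simulation time at which the job started (s)
    time_limit: int                    # run time requested from the scheduler (s)
    expected_run_time: Optional[int]   # known in replay, may be unknown in live mode
    end_time: Optional[int] = None     # None until the job has actually ended

    def current_run_time(self, now: int) -> int:
        """Seconds the job has been running at simulation time `now`."""
        return max(0, now - self.start_time)

    def reserved_until(self) -> int:
        """Time up to which a scheduler should treat the job's nodes as busy."""
        return self.start_time + self.time_limit

    def timed_out(self, now: int) -> bool:
        """True once a job without an end_time has used up its time limit."""
        return self.end_time is None and self.current_run_time(now) >= self.time_limit


job = JobTimes(start_time=100, time_limit=3600, expected_run_time=1800)
```

In this sketch the scheduler plans with reserved_until() rather than end_time, so an empty end_time stays a valid state while the job is still running.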
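
The "Y2K" heuristic for the UI mentioned in item 3 could look roughly like the sketch below. The function name format_sim_time and the exact output formats are assumptions for illustration, not the code in this merge request; the cutoff is the unixtime of 2000-01-01 00:00:00 UTC.

```python
from datetime import datetime, timezone

# Unixtime of 2000-01-01T00:00:00Z, used as the assumed cutoff for the heuristic.
Y2K_UNIX = 946_684_800


def format_sim_time(seconds: int) -> str:
    """Format a simulation time for display (hypothetical helper).

    Values above the cutoff are treated as absolute unixtimes; smaller
    values are treated as elapsed simulation seconds.
    """
    if seconds > Y2K_UNIX:
        stamp = datetime.fromtimestamp(seconds, tz=timezone.utc)
        return stamp.strftime("%Y-%m-%d %H:%M:%S")
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"
```

The cutoff corresponds to roughly 30 years of elapsed seconds, which matches the reasoning that no simulated system runs that long.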
