Verified Commit 136c252e authored by Hines, Jesse

More consistent usage of entrypoint script in docs

parent e7a6d22e
+4 −3
@@ -75,7 +75,7 @@ For MIT Supercloud
    raps run-parts -x mit_supercloud -w multitenant

    # Reinforcement learning test case
-    python main.py train-rl --system mit_supercloud/part-cpu -f /opt/data/mit_supercloud/202201
+    raps train-rl --system mit_supercloud/part-cpu -f /opt/data/mit_supercloud/202201

For Lumi

@@ -135,11 +135,12 @@ This will dump a .npz file with a randomized name, e.g. ac23db.npz. Let's rename
There are three ways to modify the replay of telemetry data:

1. `--arrival`. Changing the arrival time distribution - replay cases will default to `--arrival prescribed`, where the jobs are submitted exactly as they were submitted on the physical machine. This can be changed to `--arrival poisson` to change when the jobs arrive, which is especially useful in cases where there may be gaps in time, e.g., when the system goes down for several days, or the system is underutilized.
-python main.py -f $DPATH/slurm/joblive/$DATEDIR,$DPATH/jobprofile/$DATEDIR --arrival poisson
+
+    raps run -f $DPATH/slurm/joblive/$DATEDIR,$DPATH/jobprofile/$DATEDIR --arrival poisson

2. `--policy`. Changing the way the jobs are scheduled. The `--policy` flag will be set by default to `replay` in cases where a telemetry file is provided, in which case the jobs will be scheduled according to the start times provided. Changing the `--policy` to `fcfs` or `backfill` will use the internal scheduler, e.g.:

-    python main.py -f $DPATH/slurm/joblive/$DATEDIR,$DPATH/jobprofile/$DATEDIR --policy fcfs --backfill firstfit -t 12h
+    raps run -f $DPATH/slurm/joblive/$DATEDIR,$DPATH/jobprofile/$DATEDIR --policy fcfs --backfill firstfit -t 12h

3. `--scale`. Changing the scale of each job in the telemetry data. The `--scale` flag specifies the maximum number of nodes for each job (generally set this to the max number of nodes of the smallest partition), and the number of nodes for each job is randomly selected from one to that maximum. This flag is useful when replaying telemetry from a larger system onto a smaller system.
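Following the pattern of items 1 and 2, a `--scale` run might look like the following (the value 64 here is an illustrative maximum node count, not one taken from the docs):

    raps run -f $DPATH/slurm/joblive/$DATEDIR,$DPATH/jobprofile/$DATEDIR --scale 64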

+1 −1
-# python main.py run-multi-part experiments/mit-replay-24hrs.yaml
+# raps run-multi-part experiments/mit-replay-24hrs.yaml
partitions: ["mit_supercloud/part-cpu", "mit_supercloud/part-gpu"]
replay:
  - /opt/data/mit_supercloud/202201
+1 −1
-# python main.py run-multi-part experiments/mit-synthetic.yaml
+# raps run-multi-part experiments/mit-synthetic.yaml
partitions: ["mit_supercloud/part-cpu", "mit_supercloud/part-gpu"]
workload: multitenant
+3 −3
@@ -6,13 +6,13 @@


    # to simulate the dataset
-    python main.py -f /path/to/AdastaJobsMI250_15days.parquet --system adastraMI250
+    raps run -f /path/to/AdastaJobsMI250_15days.parquet --system adastraMI250

    # to replay with different scheduling policy
-    python main.py -f /path/to/AdastaJobsMI250_15days.parquet --system adastraMI250 --policy priority --backfill easy
+    raps run -f /path/to/AdastaJobsMI250_15days.parquet --system adastraMI250 --policy priority --backfill easy

    # to run a specific time range
-    python main.py -f /path/to/AdastaJobsMI250_15days.parquet --system adastraMI250 \
+    raps run -f /path/to/AdastaJobsMI250_15days.parquet --system adastraMI250 \
        --start 2024-11-01T00:00:00Z --end 2024-11-02T00:00:00Z

    # to analyze dataset
+1 −1
@@ -3,7 +3,7 @@ Blue Waters dataloader

Example test case:

-    python main.py -f /opt/data/bluewaters --start 20170328 --system bluewaters -net
+    raps run -f /opt/data/bluewaters --start 20170328 --system bluewaters -net

To download the necessary datasets:
