Monitoring Codes, authored by Doak, Peter W.
## [Watch Server](https://code.ornl.gov/CNMS/CNMS_Computing_Resources/blob/master/utility/watch_server)
### Beta -- [Help](mailto:doakpw@ornl.gov), or even faster, @pdoak on Cades Condos slack.
This is a simple Python server that watches logs and writes ticks to an InfluxDB database. It finds the logs and names each one after the directory it is in. Each instance of the server can handle all your calculations with a particular code in a particular directory tree.
You can then watch your codes run on a [grafana dashboard](http://128.219.185.137:3000/dashboard/db/jobs-influx?from=1487230378812&to=1487359978813).
**Waste less cpu time and catch calculation problems quickly.**
### Basic Usage
Clone the repo. I'm assuming you've done this in your home directory.
```shell-session
[you@or-condo-login02 ~]$ export CCRWS=~/CNMS_Computing_Resources/utility/watch_server
[you@or-condo-login02 ~]$ mkdir ~/watch; cd ~/watch
[you@or-condo-login02 watch]$ cp $CCRWS/watch_whatever_code.yaml ./
[you@or-condo-login02 watch]$ vim watch_whatever_code.yaml   # or emacs, nano, ...
```
Edit `root_dir`, `user_name`, and `log_name` if needed.
The rest should be properly set already.
```shell-session
[you@or-condo-login02 watch]$ nohup python $CCRWS/watch_server.py ./watch_whatever_code.yaml > whatever_code.out 2>&1 &
```
Now when the server sees a log file appear anywhere in the `root_dir` tree, it will begin to read it and write real-time data to the InfluxDB server.
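To make the discovery step concrete, here is a minimal sketch of how logs might be found and named. It is not the code in `watch_server.py`: the helper `find_logs`, the use of file modification time for the `dropoff` window, and the underscore-joined job name are all assumptions made for illustration.

```python
import os
import time


def find_logs(root_dir, log_name, dropoff_minutes, now=None):
    """Return {job_name: log_path} for recently modified logs under root_dir.

    Hypothetical helper: the job name is built from the log's directory
    relative to root_dir; the real server may name jobs differently.
    """
    now = time.time() if now is None else now
    found = {}
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        if log_name not in filenames:
            continue
        path = os.path.join(dirpath, log_name)
        # Assumption: "dropoff" is measured against the file's mtime.
        age_minutes = (now - os.path.getmtime(path)) / 60.0
        if age_minutes > dropoff_minutes:
            continue  # log has been quiet too long, ignore it
        rel = os.path.relpath(dirpath, root_dir)
        found[rel.replace(os.sep, "_")] = path
    return found
```

With `root_dir: /your/calculation/root/dir` and `log_name: out`, a log at `/your/calculation/root/dir/si/relax/out` would be reported under the name `si_relax` in this sketch.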
### Grafana Dashboard
[Go here](http://128.219.185.137:3000/)
Eventually this will be integrated with UCAMS, but for the beta just sign up for an account.
You should be able to see a few sample dashboards on the dashboard list. They have a drop-down userid selector at the upper left. Once you've written some data from your calculations, your name should show up. Select it and you should see ticks from your jobs.
If you want to modify a dashboard, you should be able to click the gear icon and use "Save As" to keep your own copy.
### Beta Note!
If you can't get this all working smoothly, don't worry; you're one of the first to try this. It's been cobbled together in my (Peter Doak) spare time, but I think it's worth sharing. I will help you get set up, and we will improve the docs and the system.
### Yaml configuration file
```yaml
log_name: out #name of your outfile for this code
dropoff: 20 #minutes since last status change to no longer find log
host: 128.219.185.137 #influxdb host
port: 8086 #influxdb port
root_dir: /your/calculation/root/dir #top of your calculation tree
user_name: your id #your user id
job_prefix: 'espresso' #generally the code
job_suffix: 'condo' #generally the server
influx: True #we're using influxdb here
start: 'PWSCF.*starts' #what to match to know a calculation is starting
init: #measurements that want a first tick
  'tcpu': 0 #startup tick value for tcpu
header: #these are header matches, we expect them once per run
  'Parallel version (MPI), running on': [[6, 'nproc']]
  'K-points.*npool': [[5, 'kpar']]
  'number of k points': [[5, 'kpoints']]
  'number of atoms/cell': [[5, 'natoms']]
  'number of Kohn-Sham states': [[5, 'nstates']]
parse: #these are recurring measurements
  # regex: column, tick_name
  'total cpu time spent': [[9, 'tcpu']]
  'total energy': [[4, 'toten']]
  'estimated': [[5, 'eacc']]
  'Total force': [[4, 'tforce']]
finish: #this is matched when the job finishes neatly
  - 'JOB DONE'
idle_count: 30 #how many times a log can be idle before it is dropped
find_sleep: 60 #time between idle checks.
```
The comments explain everything except the actual parsing items in `header` and `parse`.
Each entry has this format:
```
'regex of line with data': [ [ column of data, 'measurement name'] , ... ]
```
So a bit in the spirit of awk.
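As an illustration of that format, the sketch below applies a couple of the `parse` rules to single log lines. It assumes columns are 1-based, whitespace-separated fields as in awk, and that values are numeric; `apply_rules` and `PARSE_RULES` are hypothetical names, not taken from `watch_server.py`, and the sample line is only an example of what a matching log line could look like.

```python
import re

# Two rules copied from the `parse:` section of the sample config.
PARSE_RULES = {
    "total cpu time spent": [[9, "tcpu"]],
    "total energy": [[4, "toten"]],
}


def apply_rules(line, rules):
    """Return {measurement_name: value} for every rule whose regex matches line."""
    fields = line.split()
    ticks = {}
    for pattern, extractions in rules.items():
        if re.search(pattern, line) is None:
            continue
        for column, name in extractions:
            # Assumption: columns count from 1, like awk's $1, $2, ...
            if not 1 <= column <= len(fields):
                continue
            try:
                ticks[name] = float(fields[column - 1])
            except ValueError:
                pass  # field was not a number, skip it
    return ticks


print(apply_rules("total cpu time spent up to now is 1.2 secs", PARSE_RULES))
# prints {'tcpu': 1.2}
```

Because each regex maps to a list of `[column, name]` pairs, one matching line can yield several measurements if you list more than one pair.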