quick_user_guide
The complete documentation can be found at the official EAR wiki. You can also find useful tutorials there.
- Using sbatch + srun: The job submission with EAR is totally automatic. There are some EAR options that can be requested at submission time (type srun --help). If multiple steps are submitted in the same job, different flags can be used for different steps. The following example executes two steps. The first one uses default flags and the second one asks EAR to report its metrics in a set of CSV files; ear_metrics/app_metrics is used as the root of the filenames generated.
#!/bin/bash
#SBATCH -N 10
#SBATCH -e test.%j.err -o test.%j.out
#SBATCH --tasks-per-node=24 --cpus-per-task=1
#SBATCH --ear=on
module load mpi
mkdir ear_metrics
# run application with ear's default flags.
srun -n $SLURM_NTASKS application
# run application and store ear metrics in ear_metrics/app_metrics.*.csv
srun --ear-user-db=ear_metrics/app_metrics application
- Using Intel's mpirun: When running EAR with mpirun rather than srun, we have to specify slurm as the bootstrap server. Intel MPI 2019 and newer provides two environment variables for specifying the bootstrap server and its arguments.
module load impi
export I_MPI_HYDRA_BOOTSTRAP=slurm
export I_MPI_HYDRA_BOOTSTRAP_EXEC_EXTRA_ARGS="--ear-user-db=ear_metrics/app_metrics"
mpiexec.hydra -n 64 application
- Using OpenMPI's mpirun: It is recommended to use srun for OpenMPI applications. If mpirun is used instead, EAR will report just basic accounting metrics (DC node power and execution time of the job). If you want to enable EAR monitoring and optimization features, you must use erun to launch your application binary. The tool accepts the same flags as the sbatch/srun commands, plus a --program flag to specify the application you want to run. See the following example:
#!/bin/bash
#------------------------------------------------------
# Example SLURM job script with SBATCH requesting GPUs
#------------------------------------------------------
#SBATCH --job-name=gromacs
#SBATCH --account=bsc19
#SBATCH --qos=acc_bsccs
#SBATCH -o slurm_output.%j
#SBATCH -e slurm_error.%j
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --cpus-per-task=8
#SBATCH --time=02:00:00
#SBATCH --exclusive
#SBATCH --gres=gpu:4
#SBATCH --constraint=perfparanoid
module load nvidia-hpc-sdk
module load gromacs/2023.3
module load ear
mpirun erun --ear-verbose=1 --program="gmx_mpi mdrun -ntomp 8 -nb gpu -pme gpu -npme 1 -update gpu -bonded gpu -nsteps 100000 -resetstep 90000 -noconfout -dlb no -nstlist 300 -pin on -v -gpu_id 0123"
In order to enable EAR monitoring and optimization features for non-MPI applications, it is required to run the application with the srun command.
For CUDA, OpenMP and MKL applications, the binary must have been linked with dynamic symbols (e.g., --cudart=shared).
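For instance, a CUDA code could be built against the shared CUDA runtime as sketched below (my_app.cu and my_app are placeholder names):
# Link against the shared CUDA runtime so the runtime symbols are dynamic and visible to EAR
nvcc --cudart=shared -o my_app my_app.cu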
Below is an example enabling EAR with an OpenMP application.
#!/bin/bash
#SBATCH -N 1 -n 1 --cpus-per-task=64
#SBATCH --ear=on --ear-verbose=1
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun -n $SLURM_NTASKS -c $OMP_NUM_THREADS ./bt.D.x
An example running a Python application:
#!/bin/bash
#SBATCH -N 1 -n 1 --cpus-per-task=64
#SBATCH --ear=on --ear-verbose=1
srun -n $SLURM_NTASKS -c $SLURM_CPUS_PER_TASK python script.py
EAR can't detect MPI symbols when Python is used, so an environment variable is needed to specify which MPI flavour is being used.
module load ompi
export EAR_LOAD_MPI_VERSION="open mpi" # Which value should be used for impi?
srun -n 64 --ear-user-db=ear_metrics/app_metrics python script.py
For other programming models or sequential apps not supported by default, EAR can be loaded by setting the SLURM_EAR_LOADER_APPLICATION environment variable:
export EAR_LOADER_APPLICATION=/full/path/to/my_app
srun --ear-user-db=ear_metrics/app_metrics my_app
The eacct command shows accounting information stored in the EAR DB for jobs (and step) IDs.
You must first load the ear module.
Here we list the most useful command flags (a combined example follows the list):
- -j <job_id>[.step_id]: Specify the job (and optionally, the step) you want to retrieve information for.
- -a <job_name>: Specify the application name that will be retrieved.
- -c <filename>: Store the output in CSV format in <filename>.
- -l: Request job data for each of the compute nodes used.
- -r: Request loop signatures instead of global application metrics. EAR loop reporting must be enabled through the EARL_REPORT_LOOPS environment variable; just set it to a non-zero value.
- -s <YYYY-MM-DD>: Specify the minimum start time of the jobs that will be retrieved.
- -e <YYYY-MM-DD>: Specify the maximum end time of the jobs that will be retrieved.
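For example, the flags can be combined as sketched below (the job ID, application name, date and filename are illustrative):
module load ear
# Per-node data (-l) for step 0 of job 175966, stored as CSV (-c)
eacct -j 175966.0 -l -c job_175966_step0.csv
# Jobs named gromacs started since the given date
eacct -a gromacs -s 2024-01-01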
The basic usage of eacct retrieves the last 20 applications (by default) of the user executing it.
The default behaviour shows data from each job-step, aggregating the values from each node in said job-step. If using SLURM as a job manager, a sb (sbatch)
job-step is created with the data from the entire execution.
A specific job may be specified with the -j option:
- [user@host EAR]$ eacct --> Shows last 20 jobs (maximum) executed by the user.
- [user@host EAR]$ eacct -j 175966 --> Shows data for jobid = 175966. Metrics are averaged per job.stepid.
- [user@host EAR]$ eacct -j 175966.0 --> Shows data for jobid = 175966 stepid=0. Metrics are averaged per job.stepid.
- [user@host EAR]$ eacct -j 175966,175967,175968 --> Shows data for jobid = 175966, 175967 and 175968. Metrics are averaged per job.stepid.
eacct shows a pre-selected set of columns. Some flags slightly modify the set of columns reported:
- JOB-STEP: JobID and Step ID. sb is shown for the sbatch.
- USER: Username who executed the job.
- APP=APPLICATION: Job's name or executable name if job name is not provided.
- POLICY: Energy optimization policy name (MO = Monitoring).
- NODES: Number of nodes which ran the job.
- AVG/DEF/IMC(GHz): Average CPU frequency, default frequency and average uncore (IMC) frequency. Includes all the nodes for the step. In GHz.
- TIME(s) : Step execution time, in seconds.
- POWER(W): Average node power including all the nodes, in Watts.
- GBS : CPU Main memory bandwidth (GB/second). Hint for CPU/Memory bound classification.
- CPI : CPU Cycles per Instruction. Hint for CPU/Memory bound classification.
- ENERGY(J) : Accumulated node energy. Includes all the nodes. In Joules.
- GFLOPS/WATT : CPU GFlops per Watt. Hint for energy efficiency.
- IO(MBs) : IO (read and write) Mega Bytes per second.
- MPI% : Percentage of MPI time over the total execution time. It's the average including all the processes and nodes.
- GPU metrics
- G-POW (T/U) : Average GPU power. Accumulated per node and average of all the nodes.
- T = Total (GPU power consumed even if the process is not using the GPUs).
- U = GPUs used by the job.
- G-FREQ : Average GPU frequency. Per node and average of all the nodes.
- G-UTIL(G/MEM) : GPU utilization and GPU memory utilization.
The following example shows how to submit a job with EAR monitoring enabled. It also shows how to enable loop signature reporting and, finally, how to request the data.
#!/bin/bash
#SBATCH -J test
#SBATCH -p gpp
#SBATCH --qos=gp_debug
#SBATCH -A bsc19
#SBATCH -N 1
#SBATCH --ntasks=112
#SBATCH --cpus-per-task=1
#SBATCH --constraint=perfparanoid
#SBATCH --ear=on
#SBATCH --ear-user-db=metrics
module purge
module load bsc/1.0 oneapi/2023.2.0
export EARL_REPORT_LOOPS=1
srun ./bt-mz.D.impi
Using eacct to retrieve loop signatures:
[bsc019620@glogin1 bin]$ module load ear
[bsc019620@glogin1 bin]$ eacct -j 3180887 -r
JOB-STEP NODE ID DATE POWER(W) GBS/TPI CPI GFLOPS/W TIME(s) AVG_F/F IMC_F IO(MBS) MPI% G-POWER(T/U) G-FREQ G-UTIL(G/MEM)
3180887-0 gs02r3b66 09:08:12 825.6 156/17 0.277 0.619 1.013 2.52/2.0 1.81 0.0 4.2 0.0 / 0.0 0.00 0%/0%
3180887-0 gs02r3b66 09:08:24 969.7 157/17 0.277 0.527 1.240 2.51/2.0 1.81 0.0 3.6 0.0 / 0.0 0.00 0%/0%
3180887-0 gs02r3b66 09:08:47 906.7 157/17 0.277 0.563 1.127 2.51/2.0 1.81 0.0 3.8 0.0 / 0.0 0.00 0%/0%
3180887-0 gs02r3b66 09:09:09 909.1 157/17 0.277 0.561 1.126 2.51/2.0 1.81 0.0 3.7 0.0 / 0.0 0.00 0%/0%
Using eacct to retrieve job signature:
[bsc019620@glogin1 bin]$ eacct -j 3180887
JOB-STEP USER APPLICATION POLICY NODES AVG/DEF/IMC(GHz) TIME(s) POWER(W) GBS CPI ENERGY(J) GFLOPS/W IO(MBs) MPI% G-POW (T/U) G-FREQ G-UTIL(G/MEM)
3180887-sb bsc019620 test NP 1 2.61/2.00/--- 120.00 874.49 --- --- 104939 --- --- --- --- --- ---
3180887-0 bsc019620 test MO 1 2.52/2.00/1.81 97.72 913.51 157.12 0.28 89268 0.5578 0.0 3.7 0.00/--- --- ---
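As a quick sanity check on these columns, the accumulated energy is roughly the average node power multiplied by the step time: 913.51 W × 97.72 s ≈ 89268 J, which matches the ENERGY(J) value reported for step 0.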
Since both the Intel MPI and OpenMPI implementations are compatible when launched through the srun command, below you can see a very similar example script which runs an OpenMPI application:
#!/bin/bash
#SBATCH -J test
#SBATCH -p gpp
#SBATCH --qos=gp_debug
#SBATCH -A bsc19
#SBATCH -N 1
#SBATCH --ntasks=112
#SBATCH --cpus-per-task=1
#SBATCH --constraint=perfparanoid
#SBATCH --ear=on
module purge
module load bsc/1.0 intel openmpi/4.1.5
srun ./bt-mz.D.ompi
ear-job-analytics is a tool which lets you generate either static images or Paraver trace files directly from EAR data. If it is installed on a system with a full EAR installation, the tool internally calls the eacct command to retrieve the data and build timelines for the requested job and step ID.
$> module load ear ear-job-analytics
$> cpu_metrics="cpi gflops avg_cpufreq avg_imcfreq gbs dc_power"
$> gpu_metrics="gpu_power gpu_freq gpu_memfreq gpu_util gpu_memutil"
$> ear-job-analytics --format runtime -j 6043213 -s 0 -r -t palabos_8_nodes -o palabos_8.png -m $cpu_metrics $gpu_metrics
After that, you will get the following image files:
$> ls *palabos_8*
runtime_cpi-palabos_8.png runtime_dc_power-palabos_8.png runtime_gbs-palabos_8.png runtime_gflops-palabos_8.png runtime_io_mbs-palabos_8.png runtime_pck_power-palabos_8.png runtime_perc_mpi-palabos_8.png
Graphs look like this:
(Example graphs: runtime_cpi-palabos_8.png and runtime_io_mbs-palabos_8.png.)
Below is an example of how to generate a Paraver trace for the same job and step:
$> ear-job-analytics --format ear2prv -j 6043213 -s 0 -o palabos_8
$> ls
palabos_8.pcf palabos_8.prv palabos_8.row
You can download the CPU metrics configuration file and the GPU metrics configuration file.
You can read on the wiki how to visualize EAR metrics in Grafana Dashboards.