A tool to automatically read and visualise runtime data provided by the EAR software. ear-job-visualizer is a CLI program written in Python that lets you plot EAR data given by some of its commands or obtained through one of the report plug-ins offered by the EAR Library (EARL). The main visualisation target is to show runtime metrics collected by the EAR Library in a timeline graph.
Currently this tool supports two kinds of output formats:
- Directly generate images showing job runtime information.
- Generate a trace file to be read by Paraver, a tool to visualise and manage trace data maintained by the Barcelona Supercomputing Center's Tools team.
For more information, read about eacct or this guide, which shows how to run jobs with EAR and how to obtain runtime data. You can find more information about how Paraver works here.
- Generate static images showing runtime metrics of your job monitored by EARL.
- Generate Paraver traces to visualise runtime metrics within the Paraver tool or any other tool from the BSC Tools team.
This is a non-exhaustive list of the package dependencies. There is a dedicated section with specific instructions for installing the tool.
- pandas[performance,plot,output-formatting]
- importlib_resources
- rich
- ear_analytics_core (Versions must match)
By default, the tool internally calls the EAR accounting command (i.e., eacct) with the proper options in order to get the corresponding data to be fed to the tool's functionalities.
Be sure you have the `eacct` command on your PATH, and also check whether the `EAR_ETC` environment variable is set properly. By loading the `ear` module you should have everything you need ready.
If you run into trouble, ask your system administrator whether there is a problem with the EAR Database.
You can also provide input files directly if the eacct command is not available; read below for details.
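As a quick sanity check before running the tool, you can verify that the environment is set up. A minimal sketch (the exact module name may differ on your system):

```
# Load the EAR module (the module name may differ on your system).
module load ear

# Check that eacct is on your PATH and that EAR_ETC is set.
which eacct
echo "$EAR_ETC"
```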
We are working on providing different methods for installing the tool in order to fit any use case. A subsection will be added for each one.
Note: You must install a version which matches the EAR version you are getting data from. For example, for any EAR v5.x.y you are using, you can use any ear-job-visualizer v5.x'.y'. If you decide to install the tool by cloning the repository, you may need to switch the branch: the `main` branch points to the latest release, and there is a `vX` branch for each EAR major version the tool is compatible with. These branches are stable.
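For example, a clone-and-checkout sketch (the repository URL is a placeholder, and `v5` stands for whichever branch matches your EAR major version):

```
# Clone the repository (URL is a placeholder) and switch to the branch
# matching your EAR major version, e.g. v5 for EAR v5.x.y.
git clone <repository-url> ear-job-visualizer
cd ear-job-visualizer
git checkout v5
```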
This repository contains all recipes to build and install the package.
You need the `build` and `setuptools` packages to properly build and install it.
You can also use the `PYTHONUSERBASE` environment variable to modify the target directory for the installation.
- Clone this repository or download the source code from the release which matches your EAR version.
- Create a virtual environment and activate it: `python -m venv my_env && source my_env/bin/activate`.
- Update pip and install the modules required for packaging: `pip install -U pip && pip install build setuptools wheel`.
- Build the package: `python -m build`.
- Install it: `pip install .`.
You can change the destination path by exporting the `PYTHONUSERBASE` variable. The tool's developers may want to use `pip install -e .` to install the package in editable mode, so there is no need to reinstall it every time they want to test a new feature.
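As an illustration (an assumption on our part, not the project's documented procedure): when installing outside a virtual environment, pip's user scheme honours `PYTHONUSERBASE`, so something like the following redirects the installation to a custom prefix (the path is just an example):

```
# Install into a custom prefix instead of the default user site (example path).
export PYTHONUSERBASE=/opt/apps/ear-job-visualizer
pip install --user .
```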
Then, you can type ear-job-visualizer and you should see the following:
usage: ear-job-visualizer [-h] [--version] [-c CONFIG_FILE]
(--format {runtime,ear2prv} | --print-config | --avail-metrics)
[--loops-file LOOPS_FILE] [--apps-file APPS_FILE]
[-j JOB_ID] [-s STEP_ID] [-o OUTPUT] [-k] [-t TITLE]
[-r] [-m metric [metric ...]]
ear-job-visualizer: error: one of the arguments --format --print-config --avail-metrics is required
You can install the tool so it is available to other users in multiple ways; you may know a better approach or one that fits your use case better, but here we explain a way we found useful on the systems where we put this tool in production.
- Export the `PYTHONUSERBASE` environment variable to modify the target directory for the installation.
- Prepend the path to the `site-packages` directory where you have installed the tool to `PYTHONPATH`.
- Prepend the path to the `bin` directory where you have installed the tool to `PATH`.
For example, if you have installed the tool in a virtual environment located in a directory where other users have read and execute permissions, you may want to provide users with a module file which prepends `<prefix>/lib/python<version>/site-packages` to the `PYTHONPATH` variable and `<prefix>/bin` to `PATH`[^1]. You can use the Python script create_module.py to generate the module file.
-- An example module file for Lmod
whatis("Enables the usage of ear-job-visualizer, a tool for visualizing performance metrics collected by EAR.")
-- Add here the required python module you used for building the package.
-- depends_on("")
prepend_path("PYTHONPATH", "virtualenv/install/dir/lib/python<version>/site-packages")
prepend_path("PATH", "virtualenv/install/dir/bin")Save this file as eas-tools.lua, typically in the EAR/installation/path/etc/module, and load it with the command module load eas-tools.
You must choose one of the three main required options.
The one you will use most of the time is --format, but the order followed in this document is useful for new users to understand how the tool works.
Pretty prints the configuration being used by the tool.
You can take the printed configuration as an example for making your own, and use it later through the --config-file option.
The usage of this flag is very simple:
ear-job-visualizer --print-config > my_config.json
The --avail-metrics option shows the metric names supported by the tool. These metrics are taken from the configuration file, so you can view the default supported metrics with:
ear-job-visualizer --avail-metrics
You can also check your own configuration file:
ear-job-visualizer --avail-metrics -c my_config.json
The --format option is the one actually used to request plotting (or converting) data.
Choices for this option are either runtime or ear2prv, and each one enables one of the tool's features.
Read the sections below for a detailed description of each one.
The runtime option is the one used to generate static images, while ear2prv refers to the tool's interface for outputting data following the Paraver Trace Format.
Both format options share a subset of arguments.
The --job-id flag is mandatory.
It is used by the tool to filter the input data in case it contains more than one Job ID, as the tool currently only supports single-job visualisation.
Moreover, you can set the --step-id flag to also filter by Step ID; it is mandatory for the --format runtime option and optional for --format ear2prv, since the latter supports data from multiple steps in the input.
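To illustrate the difference, two hedged example invocations (job ID, step ID, and metric are placeholders):

```
# runtime: both --job-id and --step-id are required.
ear-job-visualizer --format runtime --job-id <job-id> --step-id <step-id> -m <metric>

# ear2prv: --step-id is optional, since multiple steps can go into the same trace.
ear-job-visualizer --format ear2prv --job-id <job-id>
```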
By default, the tool will internally call the eacct command and will store the data into temporary files.
Those files are used by the tool and removed at the end.
If you want to prevent the removal of those files, you can add the --keep-csv flag.
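For instance, a sketch that keeps the intermediate files (IDs are placeholders; gflops is just one of the metric names used elsewhere in this document):

```
# The intermediate CSV files produced by the internal eacct calls are not
# removed when the tool finishes.
ear-job-visualizer --format runtime --job-id <job-id> --step-id <step-id> -m gflops --keep-csv
```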
If you know which eacct invocations are required to visualise the data, you can use the --loops-file and --apps-file options to specify where the tool can find the data to be filtered and used.
Both of them are required if you want the tool to skip the internal use of the eacct command.
The former is obtained through eacct -j <jobid>[.stepid] -r -c <loops_file> and the latter through eacct -j <jobid>[.stepid] -l -c <apps_file>.
You can alternatively obtain both files by using one of the EAR report plug-ins distributed with EAR. This option is useful when you already have data for multiple jobs and/or steps together and you want to work on it in several ways, because it is naturally faster to work directly on a file than to invoke a command which queries a database, store the output in a file, and then read that file. It is also useful because it lets you work on a host where you cannot access the EAR Database or where EAR is not installed.
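Putting it together, a sketch of generating the two files with eacct and then feeding them to the tool (the job ID, step ID, and file names are placeholders):

```
# Generate the loops and applications CSV files with eacct.
eacct -j 123456.0 -r -c 123456_loops.csv
eacct -j 123456.0 -l -c 123456_apps.csv

# Visualise them without any internal eacct call.
ear-job-visualizer --format runtime --job-id 123456 --step-id 0 \
    --loops-file 123456_loops.csv --apps-file 123456_apps.csv -m gflops
```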
The aforementioned --loops-file and --apps-file options also accept a path to a directory instead of a filename.
This is useful because when you request EAR to generate csv files through the --ear-user-db=<csv-filename> flag, one csv file for each compute node is created.
Therefore, for each compute node your application ran on, files <csv-filename>_<nodename>_loops.csv and <csv-filename>_<nodename>_apps.csv are created.
Consequently, if you want to visualise runtime metrics for your multi-node application, you may need to move all loops data into one directory and all apps data into another, and pass those directories to the tool's --loops-file and --apps-file options.
mkdir apps_dir && mv *_apps.csv apps_dir
mkdir loops_dir && mv *_loops.csv loops_dir
ear-job-visualizer --format <format-option> --job-id <job-id> --loops-file loops_dir --apps-file apps_dir <format-specific-options>
The runtime format generates a heatmap-based graph for each metric specified via the --metrics argument (i.e., a space-separated list of metric names).
Note that the accepted metrics are specified in the configuration file, and you can request the list through the --avail-metrics flag.
This option only supports plotting data for a single Job-Step ID, thus both the --job-id and --step-id flags are required.
The resulting figure (one per requested metric) is a timeline where, for each node your application used, a heatmap gives an intuitive visualisation of the metric's value during the application execution. All visualised nodes share the same timeline, which makes this command useful to check the application behaviour across all of them. Below is an example showing how to generate images for a two-node MPI application: the I/O rate, the GFLOPS, and the percentage of time spent in MPI calls along the execution time.
ear-job-visualizer --format runtime --job-id <> --step-id <> -m io_mbs gflops perc_mpi
Use the --avail-metrics flag to view the tool's supported metrics and the names you must use to request them.
The above command line generates the following figures:
If you request GPU metrics, the graph will show you per-GPU data. For each requested GPU metric, the tool filters out those GPUs which show a constant zero value along the execution time.
ear-job-visualizer --format runtime --job-id 69478 --step-id 0 --loops-file /examples/runtime_format/69478_loops.csv --apps-file /examples/runtime_format/69478_apps.csv -m gpu_util gpu_power -o 69478.0.png
The above command line generates the following figures:
You can use the EAR dcgmi.so report plug-in to generate CSV files containing extra GPU metrics taken from either[^2]:
- The NVIDIA® Data Center GPU Manager (DCGM).
- NVIDIA Management Library (NVML) GPM metrics.
You can later use those CSV files directly by invoking the tool with both the --loops-file and --apps-file flags, as described above.
By default, the colormap of the data is computed from the value range found in the source, i.e., a colormap is built taking the minimum and maximum values of the requested metric along the runtime across all involved nodes/GPUs.
However, you can change this behaviour by passing the --manual-range flag, in which case the tool will use the range for the requested metric specified in the configuration file.
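For example, a sketch using the range from the configuration file instead of the data range (IDs are placeholders; gflops is just an example metric):

```
# Use the metric range from the configuration file instead of the data range.
ear-job-visualizer --format runtime --job-id <job-id> --step-id <step-id> -m gflops --manual-range
```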
The ear2prv format converts job runtime data gathered by EARL to the Paraver Trace Format.
In this case, all metrics found in the input data are reported to the trace file.
Moreover, you can have all the steps and applications of your job (e.g., a workflow) in the same trace, so only the --job-id flag is required.
Keep in mind that the trace file generated by this tool has the following mapping between EAR data and the Paraver Trace Format:
- As EAR data is reported at the node level, EAR node data can be visualised at the Paraver task level (Thread 1 is used).
- The tool uses the thread level to place the GPU data.
You can find two examples of Paraver Config Files to easily start working with the output data generated by this option.
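A minimal sketch of generating traces with this format (the job ID and step ID are placeholders):

```
# Convert all steps of the job into a single Paraver trace.
ear-job-visualizer --format ear2prv --job-id <job-id>

# Or restrict the trace to a single step.
ear-job-visualizer --format ear2prv --job-id <job-id> --step-id <step-id>
```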
Check the config.json file.
For any questions or suggestions, contact support@eas4dc.com. You can also open an issue in this repository.
[^1]: `<prefix>` is the location where you have installed the tool, e.g., the virtual environment installation directory, or the value of the `$PYTHONUSERBASE` environment variable in case you use it.
[^2]: The source of these metrics is transparent from the user's point of view. In fact, it is EAR which takes the data from the available source. The metrics are the same regardless of the interface used.

