Admin guide
[[TOC]]
EAR is composed of five main components:
- Node Manager (EARD): A Linux service which provides basic node power monitoring and job accounting. It also offers an API that third parties (e.g., other EAR components) can use to perform privileged operations. It must have root access on the node where it runs (usually all compute nodes).
- Database Manager (EARDBD): A Linux service (it normally runs on a service node) which caches data to be stored in a database, reducing the number of queries. MariaDB and PostgreSQL are currently supported. This component does not need to be enabled if you don't use such database services to report EAR data.
- Global Manager (EARGM): A Linux service (it normally runs on a service node) which provides cluster-level support (e.g., powercap). It needs access to all nodes in the cluster where a Node Manager is running.
- EAR Library (EARL): A Job Manager (distributed as a shared object) which provides job/application-level monitoring and optimization.
- Scheduler plug-in: A SLURM SPANK plug-in and a PBS Pro Hook which provide support for using EAR job accounting and loading EARL transparently for users.
For more detailed information about EAR components, visit the Architecture page.
This section summarizes the steps needed to compile and install EAR. The complete guide is in [another section](Installation from source), but it is recommended to read this section first, since it contains useful information about how and what gets compiled and installed.
First check that your system satisfies all requirements, then check that you have Autoconf version 2.69 or later. You can then bootstrap the build system:
autoreconf -i
As mentioned in the overview, the EAR Library can be loaded along with MPI applications thanks to the EAR Loader library. The Loader detects the application symbols at runtime and loads the right Library. Therefore, you should compile at least two versions of the EAR Library:
- One for MPI applications (using one of the supported MPI implementations).
- One for non-MPI applications.
This is an example to configure EAR to be compiled for both versions:
# Configure EAR to compile the non-MPI version of the EAR Library
./configure --disable-mpi \
MPICC=mpicc MAKE_NAME=nompi
# Configure EAR to compile the MPI version of the Library
./configure MPICC=mpicc MPICC_FLAGS="-O2 -g" MAKE_NAME=impi
The above example assumes your MPI library is Intel MPI. If you want to compile EARL for another MPI flavour, check out this section.
EAR currently does not support GNU make parallel builds, so the above example must be run in the source code root directory.
For the same reason, the configure script supports a variable called MAKE_NAME, so that it generates a Makefile called Makefile.<MAKE_NAME variable value>.
You can then call make targeting each generated configuration Makefile.
The flag --disable-mpi is used for configuring the non-MPI version of the EAR Library.
Note that the MPICC variable is set even when configuring for this use case. This is because the EAR Loader still needs MPI headers to check whether the running application is an MPI application, and configure locates them through this variable.
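As a minimal sketch of the naming rule above, the helper below builds the make command line for a given MAKE_NAME value. The function name `ear_make_cmd` is hypothetical (it is not part of EAR), and the helper only prints the command instead of running make.

```shell
# Hypothetical helper, not part of EAR: print the make invocation that
# targets the Makefile generated for a given MAKE_NAME value.
ear_make_cmd() {
    local make_name="$1"
    shift
    # The generated Makefile is named Makefile.<MAKE_NAME>
    echo "make -f Makefile.${make_name} $*"
}

# Show the commands for the two configurations used in this guide.
for name in nompi impi; do
    ear_make_cmd "$name" full
done
```

Printing the command first makes it easy to review which Makefile each step will use before actually running it.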
After completing the previous steps, you can compile and install EAR by targeting each of the generated Makefiles. The following example uses the Makefile suffixes from the previous one:
# Compile and install EAR. The EAR Library version installed
# will be for supporting non-MPI applications.
make -f Makefile.nompi
make -f Makefile.nompi install
# sysconfdir installation needs another target
make -f Makefile.nompi etc.install
make -f Makefile.nompi doc.install
# Compile and install just the
# MPI version of the EAR Library
make -f Makefile.impi full
make -f Makefile.impi earl.install
In the above example, some non-standard targets are used.
The etc.install target is needed for installing all configuration, module and service files to be used later when configuring EAR.
The full target is equivalent to calling the clean target and then the all target.
Finally, earl.install installs only the EAR Library: the second build exists just to install another version of the Library alongside the previous one.
EAR Makefiles include a specific target for each component, supporting full or partial updates:
| Target | Description |
|---|---|
| install | Reinstalls all the files except etc and doc. |
| earl.install | Reinstalls only the EARL. |
| eard.install | Reinstalls only the EARD. |
| earplug.install | Reinstalls only the EAR SLURM plugin. |
| eardbd.install | Reinstalls only the EARDBD. |
| eargmd.install | Reinstalls only the EARGMD. |
| reports.install | Reinstalls only report plugins. |
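The table above can be wrapped in a small lookup so that scripts reject component names that have no install target. This is only a sketch: the function name `ear_install_target` and the `all` alias are assumptions made for illustration, while the target names come from the table.

```shell
# Hypothetical helper, not part of EAR: map a component name to the
# Makefile target listed in the table above.
ear_install_target() {
    case "$1" in
        earl|eard|earplug|eardbd|eargmd|reports)
            echo "$1.install" ;;
        all)
            # "all" is an alias chosen here for the plain install target
            echo "install" ;;
        *)
            echo "unknown component: $1" >&2
            return 1 ;;
    esac
}

# Example: print the target used to reinstall only the EARD.
ear_install_target eard
```

The result can then be passed to make, for example `make -f Makefile.nompi "$(ear_install_target eard)"`.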
Here is an example of a bash script summarizing the information provided so far. It compiles and installs EAR with two versions of the Library: one supporting Intel MPI applications and another supporting any non-MPI application:
#!/bin/bash
# This script bootstraps, configures, compiles and installs
# EAR with two versions of the Library: one supporting Intel MPI applications
# and other one supporting any non-MPI application
#
# Requirements:
# - GNU Autoconf
# - GNU make
# - A modern C compiler
# - Intel MPI compiler and Library
EAR_INSTALL_PATH= # Set the root location of your installation
EAR_TMP= # Set the location of temporary directories and files
EAR_ETC= # Set the location of configuration and services files.
my_CFLAGS="-O2 -g"
# Bootstrap the configure script
autoreconf -i
# Configure EAR to compile the non-MPI version of the EAR Library
./configure --disable-mpi --prefix=$EAR_INSTALL_PATH \
MPICC=mpicc CC=gcc CC_FLAGS="$my_CFLAGS" \
EAR_TMP=$EAR_TMP EAR_ETC=$EAR_ETC \
MAKE_NAME=nompi
# Compile and install EAR. The EAR Library version installed
# will be for supporting non-MPI applications.
make -f Makefile.nompi
make -f Makefile.nompi install
# sysconfdir installation needs another target
make -f Makefile.nompi etc.install
make -f Makefile.nompi doc.install
# Configure EAR to compile the MPI version of the Library.
./configure --prefix=$EAR_INSTALL_PATH \
MPICC=mpicc MPICC_FLAGS="$my_CFLAGS" \
CC=gcc CC_FLAGS="$my_CFLAGS" \
EAR_TMP=$EAR_TMP EAR_ETC=$EAR_ETC \
MAKE_NAME=impi
# Compile and install just the
# MPI version of the EAR Library
make -f Makefile.impi full
make -f Makefile.impi earl.install
After compiling and installing following the previous steps, you should have the following directories under the path passed to configure's --prefix flag:
- bin: Includes commands and tools.
- sbin: Includes EAR services binaries.
- etc: Includes templates and examples for EAR service files, the ear.conf file, the EAR module and so on.
- lib: Includes all libraries and plugins.
- include
- man: Man pages.
Inside the lib directory, apart from plug-ins, you should see at least three files:
- libearld.so: This is the EAR Loader.
- libear.so: This is the EAR Library compiled with Intel MPI symbols. See the next section if you need support for other MPI implementations.
- libear.gen.so: This is the EAR Library compiled without MPI symbols. The .gen extension is added automatically when the --disable-mpi flag is set.
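A quick way to confirm that the three files listed above were installed is a small check like the one below. It is a sketch under the assumption that the installation prefix is passed as the first argument; the function name `ear_check_libs` is hypothetical.

```shell
# Hypothetical helper, not part of EAR: report which of the expected
# library files are missing under <prefix>/lib.
# Returns 0 when all three files exist, 1 otherwise.
ear_check_libs() {
    local libdir="$1/lib"
    local missing=0
    local f
    for f in libearld.so libear.so libear.gen.so; do
        if [ ! -e "$libdir/$f" ]; then
            echo "missing: $libdir/$f"
            missing=1
        fi
    done
    return "$missing"
}

# Usage (with the prefix given to configure):
#   ear_check_libs "$EAR_INSTALL_PATH" && echo "EAR libraries found"
```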
Many systems have different MPI implementations installed, so users can choose the one that best fits their applications. Even though all of them provide the same interface, each one has some specific symbols not specified in the standard. Therefore, you need to install an EAR Library version for each MPI flavor you want to support.
To help the EAR Loader load the proper Library version, coexisting libraries must be named differently.
This is accomplished by providing the MPI_VERSION variable to configure.
This variable sets an extension on the compiled libear.so shared object, so when the EAR Loader detects the MPI version of the application, it can easily load the proper Library.
Set the variable to the value given in this table for the MPI implementation you are going to compile against:
| Implementation | MPI_VERSION value | EARL Name |
|---|---|---|
| Intel MPI | not required | libear.so (default) |
| MVAPICH | not required | libear.so (default) |
| OpenMPI | ompi | libear.ompi.so |
| Fujitsu MPI | fujitsu | libear.fujitsu.so |
| Cray MPI | cray | libear.cray.so |
Note that the examples so far did not use this variable. This is because for Intel MPI the EAR Loader does not look for an extension: this behaviour is inherited from the first EARL design and has not been changed.
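The naming rule in the table can be expressed as a short function. This is an illustrative sketch only: `earl_library_name` is a hypothetical name, and the function merely reproduces the table (an empty MPI_VERSION yields the default libear.so).

```shell
# Hypothetical helper, not part of EAR: compute the EARL file name that
# results from a given MPI_VERSION value, following the table above.
earl_library_name() {
    local mpi_version="$1"
    if [ -z "$mpi_version" ]; then
        # Intel MPI and MVAPICH: no extension is required
        echo "libear.so"
    else
        echo "libear.${mpi_version}.so"
    fi
}

# Print the names for the values listed in the table.
for v in "" ompi fujitsu cray; do
    earl_library_name "$v"
done
```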
So, if you would like to add to your previous EAR installation the support for, let's say, OpenMPI, you should type the following:
# Configure EAR to compile Library supporting OpenMPI applications
# Note: mpicc must point to an OpenMPI installation
./configure MPICC=mpicc MPICC_FLAGS="-O2 -g" MAKE_NAME=openmpi MPI_VERSION=ompi
make -f Makefile.openmpi full
# The below line assumes you already have installed all other components,
# i.e., `make -f Makefile.<extension> install`.
make -f Makefile.openmpi earl.install
This is an example of a bash script which summarizes the configuration, compilation and installation of EAR providing support for multiple MPI implementations:
#!/bin/bash
# This script bootstraps, configures, compiles and installs
# EAR with three versions of the Library: one supporting Intel MPI applications,
# one supporting OpenMPI applications and another one supporting any non-MPI application
#
# Requirements:
# - GNU Autoconf
# - GNU make
# - A modern C compiler
# - Intel MPI and OpenMPI compilers and libraries (loaded below as environment modules)
EAR_INSTALL_PATH= # Set the root location of your installation
EAR_TMP= # Set the location of temporary directories and files
EAR_ETC= # Set the location of configuration and services files.
my_CFLAGS="-O2 -g"
# Bootstrap the configure script
autoreconf -i
# Replace with an Intel MPI module
module load intel-mpi-module
# Configure EAR to compile the non-MPI version of the EAR Library
./configure --disable-mpi --prefix=$EAR_INSTALL_PATH \
MPICC=mpicc CC=gcc CC_FLAGS="$my_CFLAGS" \
EAR_TMP=$EAR_TMP EAR_ETC=$EAR_ETC \
MAKE_NAME=nompi
# Compile and install EAR. The EAR Library version installed
# will be for supporting non-MPI applications.
make -f Makefile.nompi
make -f Makefile.nompi install
# sysconfdir installation needs another target
make -f Makefile.nompi etc.install
make -f Makefile.nompi doc.install
# Configure EAR to compile the MPI version of the Library.
./configure --prefix=$EAR_INSTALL_PATH \
MPICC=mpicc MPICC_FLAGS="$my_CFLAGS" \
CC=gcc CC_FLAGS="$my_CFLAGS" \
EAR_TMP=$EAR_TMP EAR_ETC=$EAR_ETC \
MAKE_NAME=impi
# Compile and install just the
# MPI version of the EAR Library
make -f Makefile.impi full
make -f Makefile.impi earl.install
# Configure EAR to compile Library supporting OpenMPI applications
# Note: mpicc must point to an OpenMPI installation
module unload intel-mpi-module
module load openmpi-module
./configure --prefix=$EAR_INSTALL_PATH \
MPICC=mpicc MPICC_FLAGS="$my_CFLAGS" \
CC=gcc CC_FLAGS="$my_CFLAGS" \
EAR_TMP=$EAR_TMP EAR_ETC=$EAR_ETC \
MAKE_NAME=openmpi MPI_VERSION=ompi
make -f Makefile.openmpi full
make -f Makefile.openmpi earl.install
Prepare the configuration
Whether installed from sources or from the RPM, EAR installs two templates for the ear.conf file: $EAR_ETC/ear/ear.conf.template and $EAR_ETC/ear/ear.conf.full.template.
The full version includes all fields. Copy only one of them to $EAR_ETC/ear/ear.conf and update
it with the desired configuration. Go to the configuration section to see how to do it.
The ear.conf file is used by all the services. It is recommended to keep it in a shared folder to simplify configuration changes.
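The copy step described above could be scripted as follows. This is a sketch, not an official tool: the function name `ear_prepare_conf` is hypothetical, and it assumes the template file names given above. It refuses to overwrite an ear.conf that already exists.

```shell
# Hypothetical helper, not part of EAR: create <EAR_ETC>/ear/ear.conf from
# one of the installed templates, keeping any existing file untouched.
# $1: EAR_ETC path
# $2: template file name (defaults to ear.conf.template)
ear_prepare_conf() {
    local etc="$1"
    local template="${2:-ear.conf.template}"
    local target="$etc/ear/ear.conf"
    if [ -e "$target" ]; then
        echo "keeping existing $target"
        return 0
    fi
    cp "$etc/ear/$template" "$target" && echo "created $target"
}

# Usage:
#   ear_prepare_conf "$EAR_ETC"                          # short template
#   ear_prepare_conf "$EAR_ETC" ear.conf.full.template   # full template
```

After the copy, the file still has to be edited by hand as explained in the configuration section.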
EAR module
Install and load the EAR module to enable EAR commands. It can be found at $EAR_ETC/module.
If the module is not in a standard path, you can add it with module use $EAR_ETC/module and then load it with
module load ear.
EAR Database
Create the EAR database with edb_create, installed at $EAR_INSTALL_PATH/sbin.
The edb_create -p command will ask you for the DB root password.
If you run into any problem here, first check whether the node where you are running the
command can connect to the DB server. If problems persist, execute edb_create -o to report the specific SQL
queries generated. In case of trouble, contact ear-support@bsc.es or open an issue.
Energy models
EAR uses a power and performance model based on system signatures. These system signatures are stored in coefficient files.
Before starting EARD, and just for testing, you need to create a dummy coefficient file and copy it into the coefficients path, by default $EAR_ETC/coeffs. Use the coeffs_null application from the tools section.
EAR version 4.1 does not require null coefficients.
EAR services
Create soft links to the EAR service files, or copy them, in the system services folder so that services
can be started and stopped using system commands such as systemctl. EAR service files
are generated at $EAR_ETC/systemd and they can usually be placed in $(ETC)/systemd.
- EARD must be started on compute nodes.
- EARDBD must be started on service nodes (can be any node with DB access).
Enable and start EARDs and EARDBDs via services (e.g., sudo systemctl start eard, sudo systemctl start eardbd).
EARDBD and EARD outputs can be found at $EAR_TMP/eardbd.server.log and $EAR_TMP/eard.log, respectively, when the DBDaemonUseLog and NodeUseLog options are set to 1 in the ear.conf file.
Otherwise, their outputs are written to stderr and can be seen using the journalctl command (e.g., journalctl -u eard).
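The rule above for where each daemon writes its output can be summarized in a small lookup. This is a sketch: `ear_log_hint` is a hypothetical name, the caller is assumed to pass the value of the relevant option (NodeUseLog for eard, DBDaemonUseLog for eardbd) as read from ear.conf, and the unit names are the ones used with systemctl above.

```shell
# Hypothetical helper, not part of EAR: print where to look for the output
# of a daemon.
# $1: service name (eard or eardbd)
# $2: value of NodeUseLog / DBDaemonUseLog from ear.conf (1 or 0)
# $3: EAR_TMP path
ear_log_hint() {
    local service="$1"
    local use_log="$2"
    local ear_tmp="$3"
    case "$service" in
        eard|eardbd) ;;
        *) echo "unknown service: $service" >&2; return 1 ;;
    esac
    if [ "$use_log" = "1" ]; then
        case "$service" in
            eard)   echo "$ear_tmp/eard.log" ;;
            eardbd) echo "$ear_tmp/eardbd.server.log" ;;
        esac
    else
        # Output goes to stderr, which systemd captures in the journal
        echo "journalctl -u $service"
    fi
}

# Example with an illustrative EAR_TMP value:
ear_log_hint eard 1 /var/ear
ear_log_hint eardbd 0 /var/ear
```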
By default, a certain level of verbosity is set. It is not recommended to modify
it but you can change it by modifying the value of constants in file src/common/output/output_conf.h.
Quick validation
Check that EARDs are up and running correctly with econtrol --status
(note that daemons will take around a minute to correctly report energy and not show up as an error in econtrol).
EARDs create a per-node text file with values reported to the EARDBD (local to compute nodes).
In case there are problems when running econtrol, you can also find this file at
$EAR_TMP/nodename.pm_periodic_data.txt.
Check that EARDs are reporting metrics to the database with ereport. ereport -n all
should report the total energy sent by each daemon since setup.
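When econtrol reports problems, the per-node file mentioned above can be checked directly on a compute node. The sketch below assumes that the "nodename" part of the file name is the short host name, which may not hold on every system; both function names are hypothetical.

```shell
# Hypothetical helpers, not part of EAR.
# Build the path of the per-node periodic data file.
# $1: EAR_TMP path
# $2: node name (assumed to default to the short host name)
ear_periodic_file() {
    local ear_tmp="$1"
    local node="${2:-$(hostname -s)}"
    echo "$ear_tmp/$node.pm_periodic_data.txt"
}

# Succeed only if the file exists and is not empty.
ear_has_periodic_data() {
    local file
    file="$(ear_periodic_file "$@")"
    [ -s "$file" ]
}

# Usage on a compute node:
#   ear_has_periodic_data "$EAR_TMP" && tail "$(ear_periodic_file "$EAR_TMP")"
```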
- Set up EAR's SLURM plugin (see the configuration section for more information).
  It is recommended to create a soft link to the $EAR_ETC/slurm/ear.plugstack.conf file in the /etc/slurm/plugstack.conf.d directory to simplify the EAR plugin management.
  For a first test it is recommended to set default=off in ear.plugstack.conf to disable the automatic loading of the EAR library.
- Set up the EAR PBS Hook (see the configuration section for more information).
  For a first test it is recommended to set default=off in ear_hook_conf.ini to disable the automatic loading of the EAR library.
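The soft link recommended above for the SLURM plugin could be created with a helper along these lines. It is a sketch: `ear_link_plugstack` is a hypothetical name, and the destination directory defaults to the /etc/slurm/plugstack.conf.d path mentioned above, which may differ on your system.

```shell
# Hypothetical helper, not part of EAR: link the EAR plugstack file into the
# SLURM plugstack.conf.d directory.
# $1: EAR_ETC path
# $2: destination directory (defaults to /etc/slurm/plugstack.conf.d)
ear_link_plugstack() {
    local ear_etc="$1"
    local dest_dir="${2:-/etc/slurm/plugstack.conf.d}"
    local src="$ear_etc/slurm/ear.plugstack.conf"
    if [ ! -f "$src" ]; then
        echo "not found: $src" >&2
        return 1
    fi
    # -s: symbolic link, -f: replace a link left by a previous installation
    ln -sf "$src" "$dest_dir/ear.plugstack.conf"
}

# Usage (typically as root):
#   ear_link_plugstack "$EAR_ETC"
```

Linking instead of copying means later edits to the file under $EAR_ETC take effect without touching the SLURM directory again.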
At this point you should be able to see EAR options when running, for example, srun --help.
You should see something like the following as part of the output. The EAR plugin must be enabled on login and compute nodes.
[user@hostname ~]$ srun --help
Usage: srun [OPTIONS(0)... [executable(0) [args(0)...]]] [ : [OPTIONS(N)...]] executable(N) [args(N)...]
Parallel run options:
...
Constraint options:
...
Consumable resources related options:
...
Affinity/Multi-core options: (when the task/affinity plugin is enabled)
...
Options provided by plugins:
--ear=on|off Enables/disables Energy Aware Runtime Library
--ear-policy=type Selects an energy policy for EAR
{type=default,gpu_monitoring,monitoring,min_energ-
y,min_time,gpu_min_energy,gpu_min_time}
--ear-cpufreq=frequency Specifies the start frequency to be used by EAR
policy (in KHz)
--ear-policy-th=value Specifies the threshold to be used by EAR policy
(max 2 decimals) {value=[0..1]}
--ear-user-db=file Specifies the file to save the user applications
metrics summary 'file.nodename.csv' file will be
created per node. If not defined, these files
won't be generated.
--ear-verbose=value Specifies the level of the
verbosity{value=[0..1]}; default is 0
--ear-learning=value Enables the learning phase for a given P_STATE
{value=[1..n]}
--ear-tag=tag Sets an energy tag (max 32 chars)
...
Help options:
-h, --help show this help message
--usage display brief usage message
Other options:
-V, --version output version information and exit
In PBS, run ear-hook-help to see the EAR options.
The EAR module must be loaded first.
For PBS:
[user@hostname ~]$ module load ear
[user@hostname ~]$ ear-hook-help
- Submit one application via the scheduler and check that it is correctly reported to the database with the eacct command.
  Note that only privileged users can check other users’ applications.
- Submit one MPI application (corresponding to the version you have compiled) with sbatch --ear=on or qsub -v "EAR=on" and check that the output of eacct now includes the Library metrics.
- Set default=on in ear.plugstack.conf or in hook_config.ini to load the EAR Library by default.
At this point, you can use EAR for monitoring and accounting purposes but it cannot use the power policies offered by EARL. To enable them, first perform a learning phase and compute node coefficients. See the EAR learning phase wiki page. For the coefficients to be active, restart daemons.
Important: Reloading daemons will NOT make them load coefficients; restarting the service is the only way.
EAR includes the specification files to create an RPM from an already existing installation. Once created, it can be included in the compute node images. This is recommended only when no more changes are expected on the installation, or when your compute fleet has ephemeral storage and EAR is installed in a non-shared file system.
The spec file is placed at etc/rpms/specs/ear.spec and it is generated from etc/rpms/specs/ear.spec.in at configuration time.
The RPM can be part of the system image.
Visit the Requirements page for a quick overview of the requirements.
Execute the rpmbuild.sh script to create the EAR RPM file.
This script is located at etc/rpms and it is created from etc/rpms/rpmbuild.sh.in at configuration time.
Run it from its location.
The rpm file will be located at $HOME/rpmbuild/RPMS.
You can install it by typing:
rpm -ivh <ear_rpm_filename>.rpm
You can also use the --nodeps option if your dependency test fails. Type rpm -e <ear_rpm_filename> to uninstall.
The *.in configuration files are compiled into etc/ear/ear.conf.template
and etc/ear/ear.full.conf.template, etc/module/ear, etc/slurm/ear.plugstack.conf
and various etc/systemd/ear*.service. You can find more information in
the configuration page.
The table below describes the complete hierarchy of the EAR installation:
| Directory | Content / description |
|---|---|
| /usr/lib | Libraries and the scheduler plugin. |
| /usr/lib/plugins | EAR plugins. |
| /usr/bin | EAR commands. |
| /usr/bin/tools | EAR tools for coefficients computation. |
| /usr/sbin | Privileged components: EARD, EARDBD, EARGMD. |
| /etc/ear | Configuration files templates. |
| /etc/ear/coeffs | Folder to store coefficient files. |
| /etc/module | EAR module. |
| /etc/slurm | EAR SLURM plugin configuration file. |
| /etc/systemd | EAR service files. |
EAR uses some third-party libraries. The EAR RPM will not ask for them when installing, but they must be available in LD_LIBRARY_PATH when you run an application with EAR.
Depending on the RPM, different versions of these libraries may be required:
| Library | Minimum version | References |
|---|---|---|
| MPI | - | - |
| MySQL* | 15.1 | MySQL or MariaDB |
| PostgreSQL* | 9.2 | PostgreSQL |
| Autoconf | 2.69 | Website |
| GSL | 1.4 | Website |
* Just one of them required.
These libraries are not required, but can be used to get additional functionality or metrics:
| Library | Minimum version | References |
|---|---|---|
| SLURM | 17.02.6 | Website |
| PBS** | 2021 | PBSPro or OpenPBS |
| CUDA/NVML | 7.5 | CUDA |
| CUPTI** | 7.5 | CUDA |
| Likwid | 5.2.1 | Likwid |
| FreeIPMI | 1.6.8 | FreeIPMI |
| OneAPI/L0** | 1.7.9 | OneAPI |
| LibRedFish** | 1.3.6 | LibRedFish |
** These will be available in next release.
Also, some drivers have to be present and loaded in the system when starting EAR:
| Driver | File | Kernel version | References |
|---|---|---|---|
| CPUFreq | kernel/drivers/cpufreq/acpi-cpufreq.ko | 3.10 | Information |
| Open IPMI | kernel/drivers/char/ipmi/*.ko | 3.10 | Information |
The best way to run the EAR daemon components (EARD, EARDBD, EARGM) is as systemd unit services.
NOTE EAR uses a MariaDB/MySQL server. The server must be started before EAR services are executed.
The unit service files for the EAR Daemon, EAR Global Manager Daemon and EAR Database Daemon are generated and installed in $(EAR_ETC)/systemd. Copy those unit service files to your operating system's systemd folder and then use the systemctl command to run the daemons.
Check the EARD, EARDBD, EARGMD pages to find the precise execution commands.
When using systemctl commands, you can check messages reported to stderr using journalctl. For instance:
journalctl -u eard -f
Note that if NodeUseLog is set to 1 in ear.conf, the messages will not be printed to stderr but to $EAR_TMP/eard.log instead. The DBDaemonUseLog and GlobalmanagerUseLog options in ear.conf specify the output for EARDBD and EARGM, respectively.
Additionally, services can be started, stopped or reloaded in parallel using parallel shell tools such as pdsh. For example:
sudo pdsh -w nodelist systemctl start eard
In some cases, it might be a good idea to create a new install instead of updating your current one, like trying new configurations or when a big update is released.
The steps to do so are:
- Install EAR in the new folder.
- Replicate the old etc (including ear.conf and coefficients) in the new one and update ear.conf with the new ETC path and whatever other changes may be needed.
- Update the EAR services in the /etc/systemd/system folder (or equivalent, depending on your OS). Service files include the ETC path and the absolute path for binaries.
- Update /etc/slurm/plugstack.conf with the new paths.
- Create a new EAR module with the updated paths.
Once all that is done, you should have two complete EAR installs that can be switched by changing the binaries executed by the services and changing the path in plugstack.conf.
For a better overview of the installation process, return to the installation guide. To continue the installation, visit the configuration page to properly set up the EAR configuration file and the EAR SLURM plugin stack file.