EAR requirements
This section lists both the software and hardware requirements for compiling, running and using EAR. It also lists the system requirements for using all EAR components and features.
Intel CPU families
- Skylake.
- IceLake.
- Sapphire Rapids.
AMD CPUs
- EPYC Rome, Milan and Genoa families.
Other
- ARM and Cray architectures are not tested in production.
NVIDIA GPUs
- From Turing to Hopper.
Intel GPUs
- PVC.
EAR has been tested on CentOS (>=7), SUSE, Rocky and Red Hat Linux distributions.
If you plan to report data to a database through the EARDBD, service nodes (wherever EARDBDs run) must be reachable by compute nodes, so that EARDs can connect to them and report telemetry data. Moreover, EARDBDs and log-in nodes must be able to connect to the database servers for storing and retrieving data, respectively.
To use administration commands, log-in nodes must be able to reach compute nodes.
- 1 TCP port for EARD on each compute node.
- 3 TCP ports for each EARDBD on service nodes.
- 1 TCP port for each EARGMD.
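The connectivity requirements above can be checked with a small probe before installing. The following is a minimal sketch, not part of EAR: the function name is hypothetical, and the host names and port numbers you pass to it must come from your own EAR configuration, since none are assumed here.

```python
import socket


def tcp_port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    Hypothetical helper for illustration; it only proves that something
    accepts connections on that port, not that it is an EAR daemon.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False
```

For example, running `tcp_port_reachable(node, port)` from a log-in node against each compute node, with the EARD port taken from your configuration, gives a quick view of which nodes administration commands would be able to reach.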
You need at least a modern C compiler, Autoconf (>= 2.69) and GNU make. The remaining requirements are optional and depend on the features you want to enable.
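The Autoconf version requirement can be verified with a short script. This is a sketch under the assumption that `autoconf --version` prints the version number on its first line (as GNU Autoconf does, e.g. `autoconf (GNU Autoconf) 2.69`); the function names are illustrative and not part of EAR.

```python
import re
import subprocess
from typing import Optional, Tuple


def parse_version(text: str) -> Optional[Tuple[int, ...]]:
    """Extract the first dotted version number (e.g. 2.69) found in text."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(text: str, minimum: Tuple[int, ...]) -> bool:
    """Compare the version found in text against a minimum, component-wise."""
    version = parse_version(text)
    return version is not None and version >= minimum


def autoconf_is_recent_enough(minimum: Tuple[int, ...] = (2, 69)) -> bool:
    """Run `autoconf --version` and check its first line against minimum."""
    try:
        result = subprocess.run(
            ["autoconf", "--version"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # Autoconf is missing or failed to run.
        return False
    lines = result.stdout.splitlines()
    return bool(lines) and meets_minimum(lines[0], minimum)
```

The parsing is kept separate from the subprocess call so that the comparison logic can be reused for other tools that report a dotted version string.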
An MPI compiler and headers are needed for supporting MPI applications. Intel MPI, OpenMPI, MVAPICH, Fujitsu and Cray MPICH are the versions currently supported.
If you want to retrieve NVIDIA GPU metrics as well as modify the GPU clock frequency, you need CUDA. Check the minimum required version for your device. OneAPI (>= 1.7.9) is needed for supporting the same features on Intel PVC GPUs.
SLURM must also be present if you want to use the SLURM SPANK plug-in. Since the current EAR version only supports automatic execution of applications with the EAR Library through the SLURM plug-in, SLURM must be running whenever EARL is used (it is not needed for the most basic node monitoring service). In this case, EAR needs the slurm.h and spank.h header files.
EAR currently supports two relational databases for storing data. The headers and libraries of either MySQL (>= 15.1) or PostgreSQL are needed.
Your compute nodes should support one of these commands:
- ipmitool dcmi power reading.
- ipmi-oem intelnm get-node-manager-statistics mode=globalpower.
- Lenovo SD650 commands for energy readings.
- Energy readings for Intel Node Manager.
- freeipmi.
- libredfish.
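As a first step before testing the commands above on real hardware, you can check whether the command-line tools are installed on a node. The sketch below is illustrative only and uses a hypothetical helper name. Finding a binary on the PATH does not prove that the node's BMC actually answers the corresponding request, so the commands still have to be run by hand.

```python
import shutil
from typing import Dict, Iterable


def find_tools(names: Iterable[str]) -> Dict[str, bool]:
    """Map each tool name to True if an executable is found on the PATH."""
    return {name: shutil.which(name) is not None for name in names}
```

For instance, `find_tools(["ipmitool", "ipmi-oem"])` returns a dictionary telling which of the two binaries are present on the current node.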
Your system should have a Linux CPUFreq driver that supports the userspace governor.
- acpi_cpufreq (recommended).
- intel_cpufreq and intel_pstate are also tested and supported.
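The active driver and the governors it offers can be read from the standard Linux cpufreq sysfs files `scaling_driver` and `scaling_available_governors`. The following is a minimal sketch that inspects CPU 0 only, on the assumption that all CPUs in the node share the same driver; the function names are hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Union

CpufreqInfo = Dict[str, Union[str, List[str], bool]]


def summarize_cpufreq(driver_text: str, governors_text: str) -> CpufreqInfo:
    """Build a summary from the raw contents of the two sysfs files."""
    governors = governors_text.split()
    return {
        "driver": driver_text.strip(),
        "governors": governors,
        "userspace_supported": "userspace" in governors,
    }


def read_cpufreq_info(
    cpufreq_dir: str = "/sys/devices/system/cpu/cpu0/cpufreq",
) -> CpufreqInfo:
    """Read the cpufreq sysfs files; missing files are treated as empty."""
    base = Path(cpufreq_dir)

    def read(name: str) -> str:
        try:
            return (base / name).read_text()
        except OSError:
            return ""

    return summarize_cpufreq(
        read("scaling_driver"), read("scaling_available_governors")
    )
```

If `userspace_supported` is false, the node's current driver or driver mode does not expose the userspace governor.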
IPMI drivers must be installed on compute nodes. The MSR kernel module must be loaded on compute nodes (msr-safe is supported). Performance counters must be enabled (perf must be installed).
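One way to see whether an MSR interface is available is to look for the per-CPU device nodes. This sketch assumes the usual locations, `/dev/cpu/<N>/msr` for the stock msr module and `/dev/cpu/<N>/msr_safe` for msr-safe. Both paths are assumptions about your system's layout and the helper name is hypothetical.

```python
from pathlib import Path
from typing import Dict


def msr_devices_present(cpu: int = 0, dev_root: str = "/dev/cpu") -> Dict[str, bool]:
    """Report which MSR device nodes exist for the given CPU.

    Existence of the node does not imply the current user may open it;
    permissions have to be checked separately.
    """
    cpu_dir = Path(dev_root) / str(cpu)
    return {
        "msr": (cpu_dir / "msr").exists(),
        "msr_safe": (cpu_dir / "msr_safe").exists(),
    }
```

A result with both values false suggests that neither module is loaded on the node.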
The nvidia-ml library is the component used by EAR for reading NVIDIA GPU metrics. For devices prior to Hopper, NVIDIA DCGM's dcgmi command is also required if you want metrics beyond GPU power, frequency and utilization, and GPU memory frequency and utilization.
The AMD HSMP module is needed for supporting a set of system management features.
Counters such as cycles, instructions, cache misses or FLOPS should be supported at user level. The perf paranoid level (the kernel's perf_event_paranoid setting) should be set accordingly.
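The current paranoid level can be read from `/proc/sys/kernel/perf_event_paranoid`. The sketch below reads it and lists the restrictions that the Linux kernel documentation associates with each threshold for unprivileged users. It does not state which level EAR needs, and the function names are hypothetical.

```python
from typing import List, Optional

# Thresholds as described in the Linux kernel sysctl documentation:
# each restriction applies to unprivileged users when level >= threshold.
RESTRICTIONS = [
    (0, "ftrace function tracepoint and raw tracepoint access"),
    (1, "CPU event access"),
    (2, "kernel profiling"),
]


def parse_paranoid_level(text: str) -> int:
    """Parse the contents of the perf_event_paranoid file."""
    return int(text.strip())


def restrictions_for(level: int) -> List[str]:
    """List what is disallowed for unprivileged users at this level."""
    return [desc for threshold, desc in RESTRICTIONS if level >= threshold]


def read_paranoid_level(
    path: str = "/proc/sys/kernel/perf_event_paranoid",
) -> Optional[int]:
    """Return the current level, or None if the file cannot be read."""
    try:
        with open(path) as handle:
            return parse_paranoid_level(handle.read())
    except (OSError, ValueError):
        return None
```

Calling `restrictions_for(read_paranoid_level())` on a compute node, after checking that the level is not `None`, shows what an unprivileged process is currently prevented from doing with performance events.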
EAR comes with a set of tools for computing coefficients during the learning phase; these tools need GSL (>=1.4).