Classification strategies
As laid out in the Classification section of the Library, EAR incorporates three different classification strategies. This page explains how each of them works internally, and how and when they become available to the end user.
EAR's default classification model is based on the application of a fixed set of thresholds (in absolute units) to the CPI and MEM_GBS metrics. These thresholds, available from EAR's installation, allow identifying the different execution phases at a fundamental level.
This strategy takes 4 thresholds: 2 for characterizing CPU-bound applications, and 2 for MEMORY-bound ones. These are defined according to the architecture's specifications via the domain knowledge of administrators. For instance, the values proposed for Sapphire Rapids nodes are:
- CPU-bound: CPI = 0.4, MEM_GBS = 180
- MEMORY-bound: CPI = 0.4, MEM_GBS = 250
With these thresholds defined, the classification proceeds as follows:
Let S be the last registered signature
Let <CPU_CPI, CPU_BWD> be the tuple of CPU-bound thresholds
Let <MEM_CPI, MEM_BWD> be the tuple of MEMORY-bound thresholds
IF (S->CPI <= CPU_CPI && S->MEM_GBS <= CPU_BWD)
Mark app as CPU-bound
ELSE IF (S->CPI >= MEM_CPI && S->MEM_GBS >= MEM_BWD)
Mark app as MEMORY-bound
ELSE
Mark app as MIX
Let us go over the idiosyncrasies of this strategy:
- To begin with, the model checks if the CPI and MEM_GBS of the application are below the CPU-bound thresholds. The direction of this comparison comes from the fact that we expect a CPU-bound application to be executing lots of instructions (thus having a small CPI) while not being bounded by memory (thus registering a low memory bandwidth).
- If the app is not CPU-bound, the strategy checks whether the considered metrics are above the MEMORY-bound thresholds. The direction of this comparison comes from the fact that we expect a MEMORY-bound app to be using a considerable amount of memory bandwidth (thus having a big MEM_GBS) while not executing too many instructions (thus having a big CPI as well).
- If none of these conditions are met, the strategy labels the app as MIX.
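The thresholding steps above can be sketched in Python as follows. This is a minimal, hypothetical sketch: the `Signature` container mirrors the pseudocode, not EAR's actual data structures, and the threshold values are the Sapphire Rapids examples from this page.

```python
# Hypothetical sketch of EAR's default threshold-based classification.
# The Signature container is illustrative, not EAR's actual API; the
# thresholds are the Sapphire Rapids example values given above.
from dataclasses import dataclass

CPU_CPI, CPU_BWD = 0.4, 180.0  # CPU-bound thresholds
MEM_CPI, MEM_BWD = 0.4, 250.0  # MEMORY-bound thresholds

@dataclass
class Signature:
    cpi: float      # cycles per instruction
    mem_gbs: float  # memory bandwidth (GB/s)

def classify_default(s: Signature) -> str:
    # Low CPI and low bandwidth: compute-dominated phase
    if s.cpi <= CPU_CPI and s.mem_gbs <= CPU_BWD:
        return "CPU-bound"
    # High CPI and high bandwidth: memory-dominated phase
    if s.cpi >= MEM_CPI and s.mem_gbs >= MEM_BWD:
        return "MEMORY-bound"
    return "MIX"

print(classify_default(Signature(cpi=0.3, mem_gbs=100.0)))  # CPU-bound
```

Note how a signature with a low CPI but a high bandwidth (e.g., CPI = 0.3, MEM_GBS = 300) satisfies neither condition and falls through to MIX.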
The strength of this approach is its speed and effectiveness, given that the classification is based upon the performance typically expected from the execution phases considered. However, this strategy is limited by its rigidity: the fact that it works with (1) a small number of thresholds, and (2) in absolute values, leads to a mostly rigid (although not incorrect) classification of the applications' activity.
The roofline model combines floating point performance and memory traffic to characterize the activity of an application based on the performance limitations of the hardware [2]. For EAR, it allows identifying execution phase types in a simple and quick way at runtime via bottleneck analysis.
To use this strategy in EAR, we need the floating point performance and memory traffic peaks, which can be computed either theoretically or empirically. The EAR team offers its own tool, called ear-roofline, which computes the theoretical peaks automatically according to the architecture's specifications, using the following equations:

PEAK_GFLOPS = #cores × frequency × FLOPs_per_cycle
PEAK_GBS = #memory_channels × transfer_rate × bytes_per_transfer
To give an idea of how to apply these equations, consider a dual-socket EPYC 9654 node (2 × 96 cores at a 3.7 GHz boost frequency with 32 FLOPs per cycle, and 2 × 12 DDR5-4800 memory channels transferring 8 bytes each):

PEAK_GFLOPS = 192 × 3.7 × 32 = 22732.8 GFLOPS
PEAK_GBS = 24 × 4800 × 8 / 1000 = 921.6 GB/s
Once the peaks have been properly computed, these are stored in a file of the form roofline.{tag}.data, where tag corresponds to the tag of the targeted architecture as defined in the ear.conf file, which will depend on the environment the user is working on (check Tags for more information). The format of the stored peaks is
{peak memory bandwidth} {peak floating performance}

Following the EPYC 9654 example, we would create a file called roofline.epyc9654.data containing

921.6 22732.8

Moreover, this file is expected to be stored in the $EAR_ETC/ear/coeffs/island{X} directory; check the EARL configuration and the Island description sections for more details on this. There exists a way to circumvent this requirement, however: as indicated in the Paths modification section, the environment variable HACK_ROOFLINE_COEFF_PATH allows hacking the path to the coefficients file. Note that this variable only hacks the path, not the expected filename, whose format still has to be complied with.
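Putting the example together, a small script could generate the coefficients file in the expected format. The tag and output directory below are illustrative; adapt them to your ear.conf setup.

```python
# Illustrative generation of a roofline coefficients file in the
# "{peak memory bandwidth} {peak floating performance}" format described
# above. The tag and output directory are examples, not fixed values.
import os

peak_gbs = 921.6       # GB/s, EPYC 9654 example
peak_gflops = 22732.8  # GFLOPS, EPYC 9654 example
tag = "epyc9654"
out_dir = "."          # in production: $EAR_ETC/ear/coeffs/island{X}

path = os.path.join(out_dir, f"roofline.{tag}.data")
with open(path, "w") as f:
    f.write(f"{peak_gbs} {peak_gflops}\n")

print(open(path).read())  # 921.6 22732.8
```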
Once EAR has access to the roofline peaks, the classification proceeds as follows:
Let S be the last registered signature
Let <PEAK_FLOPS, PEAK_GBS> be the tuple of peaks of the architecture
IF (S->GFLOPS / S->MEM_GBS >= PEAK_FLOPS / PEAK_GBS)
Mark app as CPU-bound
ELSE IF (S->MEM_GBS >= PEAK_GBS * 0.75)
Mark app as MEMORY-bound
ELSE
Mark app as MIX
Let us go over it step by step:
- The strategy begins by checking if the operational intensity (i.e., the ratio between flops and bytes) is greater than or equal to the threshold defined by the peaks of the architecture. If so, the app is marked as CPU-bound.
- If the app is not CPU-bound, the strategy checks if the app has a memory bandwidth close enough (in absolute units) to the peak memory bandwidth of the architecture via a threshold (which, by default, is experimentally set to 0.75). If the application meets this condition, it is labelled as MEMORY-bound.
- If neither of these conditions is met, the app is labelled as MIX.
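These steps can be sketched in Python as below. The `Signature` container and the peak values are illustrative (the peaks reuse the EPYC 9654 example), not EAR's actual implementation.

```python
# Hypothetical sketch of the roofline-based classification above.
# Signature is illustrative; the peaks reuse the EPYC 9654 example.
from dataclasses import dataclass

PEAK_GBS = 921.6       # peak memory bandwidth (GB/s)
PEAK_GFLOPS = 22732.8  # peak floating point performance (GFLOPS)
MEM_THRESHOLD = 0.75   # default fraction of peak bandwidth for MEMORY-bound

@dataclass
class Signature:
    gflops: float
    mem_gbs: float

def classify_roofline(s: Signature) -> str:
    # Operational intensity of the app vs. the machine balance point
    if s.gflops / s.mem_gbs >= PEAK_GFLOPS / PEAK_GBS:
        return "CPU-bound"
    # Close enough to peak bandwidth: memory bottleneck
    if s.mem_gbs >= PEAK_GBS * MEM_THRESHOLD:
        return "MEMORY-bound"
    return "MIX"
```

With these peaks, the machine balance point is roughly 24.7 FLOPs per byte, so e.g. a phase delivering 1000 GFLOPS over 800 GB/s (intensity 1.25, but bandwidth above 0.75 × 921.6) lands in MEMORY-bound.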
This way, the original roofline model is adapted to include the MIX execution phase while maintaining its classification philosophy, allowing EAR to identify execution phases with a speed and effectiveness similar to those of the initial classification strategy. Although this strategy is more robust (given that it is based on bottleneck analysis), it is virtually just as rigid as the previous one.
K-medoids is a clustering method based on the k-means [6] algorithm, but instead of using centroids as representatives of the classes, it uses medoids (i.e., elements of the dataset) [7]. This strategy, which becomes available once EAR has enough user data in its database to train it, is flexible enough to be regenerated over time, thus accounting for changes in the type of applications executed by users as well as ensuring its robustness.
To achieve a balance between classification quality and runtime performance, this strategy works with a subset of signature metrics, composed of CPI, TPI, GFLOPS and MEM_GBS. This way, it combines metrics from both the architecture and the application's computational activity.
To use this strategy, two files are needed: one for the standardization process, and another one for the medoids. The first one stores the mean and standard deviation of each metric in the following format:

{CPI std} {CPI mean} {TPI std} {TPI mean} {GFLOPS std} {GFLOPS mean} {MEM_GBS std} {MEM_GBS mean}

This first file must follow the naming format extremes.{tag}.data, where tag corresponds to the tag of the node partition.
Now, regarding the second file, it stores the medoids in the following format:

{CPU-bound CPI} {CPU-bound TPI} {CPU-bound GFLOPS} {CPU-bound MEM_GBS} {MEMORY-bound CPI} {MEMORY-bound TPI} {MEMORY-bound GFLOPS} {MEMORY-bound MEM_GBS} {MIX CPI} {MIX TPI} {MIX GFLOPS} {MIX MEM_GBS}

This file must also follow a naming format, medoids.{tag}.data, where tag corresponds to the tag of the node partition.
NOTE: the tag will depend on the environment the user is working on. Check Tags for more information on this.
NOTE 2: as in the roofline case, both the medoids and extremes files are expected to be stored in the ${EAR_ETC}/ear/coeffs/island{X} directory. Check the EARL configuration and the Island description sections for more details on this. Additionally, like in the previous case, there exists an environment variable called HACK_MEDOIDS_COEFF_PATH which, as defined in the Paths modification section, allows hacking the path to the coefficients.
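Minimal parsers for these two flat formats might look like the sketch below. This is an illustration of the file layouts described above, not EAR's loading code; the example file name and numbers are made up.

```python
# Illustrative parsers for the extremes.{tag}.data and medoids.{tag}.data
# flat formats described above. File names and values are made-up examples.

def load_extremes(path):
    # Layout: {CPI std} {CPI mean} {TPI std} {TPI mean}
    #         {GFLOPS std} {GFLOPS mean} {MEM_GBS std} {MEM_GBS mean}
    vals = [float(x) for x in open(path).read().split()]
    stds, means = vals[0::2], vals[1::2]  # std/mean pairs alternate
    return means, stds

def load_medoids(path):
    # Layout: 12 values, i.e. the CPU-bound, MEMORY-bound and MIX medoids,
    # each holding CPI, TPI, GFLOPS and MEM_GBS (in that order).
    vals = [float(x) for x in open(path).read().split()]
    return [vals[i:i + 4] for i in range(0, 12, 4)]

# Example usage with made-up numbers:
with open("extremes.example.data", "w") as f:
    f.write("0.5 1.0 2.0 8.0 30.0 50.0 40.0 120.0")
means, stds = load_extremes("extremes.example.data")
print(means)  # [1.0, 8.0, 50.0, 120.0]
print(stds)   # [0.5, 2.0, 30.0, 40.0]
```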
Finally, just as with the roofline strategy, the EAR team offers its own tool, called ear-kmedoids (still in development), which computes and generates both files automatically in the format required by EAR, in a user-friendly manner.
Once EAR has access to the extremes and medoids files, the classification proceeds as follows:
Let S be the last registered signature
Let <CPU_MEDOID, MEM_MEDOID, MIX_MEDOID> be the tuple of medoids
Let V = [S->CPI, S->TPI, S->GFLOPS, S->MEM_GBS]
Standardize vector V
Let D1 = euclidean_distance(V, CPU_MEDOID)
Let D2 = euclidean_distance(V, MEM_MEDOID)
Let D3 = euclidean_distance(V, MIX_MEDOID)
IF (D1 < D2 and D1 < D3)
Mark app as CPU-bound
ELSE IF (D2 < D1 and D2 < D3)
Mark app as MEMORY-bound
ELSE
Mark app as MIX
Let us go over these steps:
- The strategy begins by selecting the targeted subset of signature metrics and standardizing them. This step implicitly involves EAR preloading the extremes (i.e., the mean and std of each metric) during the initialization of the Library, so this load happens just once per execution.
- Once the metrics have been standardized, we compute the Euclidean distance from this vector to the different medoids, which are also preloaded during the initialization of the Library (just like the extremes). We name these distances D1, D2 and D3.
- With these distances computed, we identify the current execution phase of the app by checking which medoid is closer to the signature.
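These steps can be sketched in Python as below. The medoids and extremes are illustrative values (already standardized for simplicity), not coefficients trained from a real EAR database.

```python
# Hypothetical sketch of the k-medoids classification steps above.
# The medoids and extremes below are illustrative, not trained values.
import math

LABELS = ["CPU-bound", "MEMORY-bound", "MIX"]

def standardize(v, means, stds):
    return [(x - m) / s for x, m, s in zip(v, means, stds)]

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify_kmedoids(signature, medoids, means, stds):
    # signature = [CPI, TPI, GFLOPS, MEM_GBS]
    v = standardize(signature, means, stds)
    distances = [euclidean_distance(v, m) for m in medoids]
    return LABELS[distances.index(min(distances))]

# Illustrative medoids in standardized space (means = 0, stds = 1):
means, stds = [0.0] * 4, [1.0] * 4
medoids = [
    [-1.0, -1.0, 1.0, -1.0],  # CPU-bound: low CPI/TPI/MEM_GBS, high GFLOPS
    [1.0, 1.0, -1.0, 1.0],    # MEMORY-bound: the opposite profile
    [0.0, 0.0, 0.0, 0.0],     # MIX: in between
]
print(classify_kmedoids([-0.9, -1.1, 0.8, -0.7], medoids, means, stds))  # CPU-bound
```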
With this strategy, EAR's classification becomes more flexible and adaptive while maintaining its robustness, which helps attain the maximum performance expected out of this Library's module.
As a final note, the Library selects the classification strategy by prioritizing K-medoids over Roofline, and the latter over the Default strategy. It can be thought of in the following way:
IF K-medoids strategy is available:
Load K-medoids strategy
ELSE IF Roofline is available:
Load roofline strategy
ELSE
Load default strategy
Thus, the user must be aware of the files available and accessible to EAR, which will determine how EAR approaches signature classification.
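Since availability is driven by which coefficient files EAR can reach, the priority chain can be sketched as a file-existence check. The directory layout and tag below are illustrative; this is not EAR's actual loading code.

```python
# Sketch of the strategy-selection priority described above: K-medoids
# first, then Roofline, then the default thresholds. The availability
# check (looking for coefficient files by name) is illustrative.
import os

def select_strategy(coeffs_dir, tag):
    def has(name):
        return os.path.exists(os.path.join(coeffs_dir, f"{name}.{tag}.data"))
    if has("medoids") and has("extremes"):
        return "kmedoids"
    if has("roofline"):
        return "roofline"
    return "default"
```

With an empty coefficients directory this falls back to the default thresholds, which is why they are always available.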
[1] McCalpin, John. “STREAM: Sustainable memory bandwidth in high performance computers.” http://www.cs.virginia.edu/stream/ (2006).
[2] Williams, S., Waterman, A., & Patterson, D. (2009). Roofline: an insightful visual performance model for multicore architectures. Communications of the ACM, 52(4), 65-76.
[3] Andreolli, C., Thierry, P., Borges, L., Skinner, G., Yount, C., Jeffers, J., & Reinders, J. (2015). Characterization and optimization methodology applied to stencil computations. High Performance Parallelism Pearls, 377-396.
[4] Bakos, J. D. (2016). Multicore and data-level optimization. In Embedded Systems (pp. 49–103). Elsevier. https://doi.org/10.1016/b978-0-12-800342-8.00002-x
[5] Petitet, Antoine. “HPL-a portable implementation of the high-performance Linpack benchmark for distributed-memory computers.” http://www.netlib.org/benchmark/hpl/ (2004).
[6] Blömer, J., Lammersen, C., Schmidt, M., & Sohler, C. (2016). Theoretical analysis of the k-means algorithm–a survey. Algorithm Engineering: Selected Results and Surveys, 81-116.
[7] Park, H. S., & Jun, C. H. (2009). A simple and fast algorithm for K-medoids clustering. Expert systems with applications, 36(2), 3336-3341.