This repository contains the setup and obtained results of the federated learning (FL) experiments.
The experiments were carried out on two different infrastructures:
- Single Local Virtual Machine (Centralized setup):
  - CPU: Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz (2 cores, 4 processors)
  - RAM: 16 GB
  - Deployment method: Services deployed as Docker containers (AMD64 Linux images) on the local machine
- aerOS Continuum (Decentralized setup):
  - CPU: x64 architecture, 4 cores
  - RAM: 15.6 GB
  - Deployment method: Containers deployed on the Linux-based aerOS Continuum
The training dataset was generated semi-synthetically using the Extended Green Cloud Simulator (EGCS), a simulation model of the CloudFerro green edge infrastructure. It represents resource utilization during computational task execution over an entire year and combines:
- Real weather condition data monitored by Electrum throughout the year 2024
- Resource utilization simulated using EGCS
- Simulated resource demands of computational tasks, generated based on real tasks processed in the CloudFerro infrastructure
The dataset was composed of 10156 observations in total and was divided into 4 different subsets, each corresponding to an individual season of the year:
- (Winter) 2539 observations
- (Spring) 2750 observations
- (Summer) 2727 observations
- (Autumn) 2143 observations
Each subset was then randomly split in half, with each half serving as one client's local training data.
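For illustration only, the sketch below shows how such a seasonal split into two client halves could be reproduced, assuming the observations sit in a single CSV file with a `season` column (the file name and column names here are hypothetical, not taken from the repository):

```python
import pandas as pd

# Hypothetical input file and column name; the real dataset layout may differ.
data = pd.read_csv("egcs_observations.csv")

for season in ["winter", "spring", "summer", "autumn"]:
    subset = data[data["season"] == season]

    # Shuffle the seasonal subset with a fixed seed and split it in half,
    # one half per FL client (two clients per experiment).
    shuffled = subset.sample(frac=1.0, random_state=42)
    half = len(shuffled) // 2
    client_1, client_2 = shuffled.iloc[:half], shuffled.iloc[half:]

    client_1.to_csv(f"{season}_client_1.csv", index=False)
    client_2.to_csv(f"{season}_client_2.csv", index=False)
```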
The training objective was to learn an effective task migration strategy — specifically, to identify when factors such as CPU usage and weather conditions should trigger task migration. The ultimate aim was to maximize the utilization of servers powered by green energy.
Training was performed simultaneously on two clients (FL Local Operations), with the Federated Averaging (FedAvg) algorithm used to aggregate model updates across the clients. The trained model was a simple neural network with a single hidden layer. The experiments were run separately for each data subset, resulting in 4 training experiments per infrastructure.
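The aggregation step can be illustrated with a minimal FedAvg sketch: a per-layer average of client parameters weighted by local dataset size. This is a generic illustration using NumPy arrays and made-up layer shapes, not the code used by the FL components in this repository:

```python
from typing import List
import numpy as np

def fedavg(client_weights: List[List[np.ndarray]],
           client_sizes: List[int]) -> List[np.ndarray]:
    """Average each layer's parameters across clients, weighted by local dataset size."""
    total = sum(client_sizes)
    num_layers = len(client_weights[0])
    return [
        sum((size / total) * weights[layer]
            for weights, size in zip(client_weights, client_sizes))
        for layer in range(num_layers)
    ]

# Illustrative single-hidden-layer network: weights and biases of two dense layers.
# The shapes are placeholders; the real model depends on the input features used.
def random_model(n_in: int = 8, n_hidden: int = 16, n_out: int = 1) -> List[np.ndarray]:
    rng = np.random.default_rng()
    return [rng.normal(size=(n_in, n_hidden)), np.zeros(n_hidden),
            rng.normal(size=(n_hidden, n_out)), np.zeros(n_out)]

# Two clients with roughly equal local dataset sizes, as in the seasonal splits.
global_weights = fedavg([random_model(), random_model()], client_sizes=[1270, 1269])
```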
The repository is organized into the following directories and files:
Contains energy efficiency diagnostics and power consumption data collected during the FL training experiments:
- `energy-report-training-Q1.html` to `energy-report-training-Q4.html`: Power efficiency diagnostic reports generated using the Windows `powercfg /energy` command for each quarterly training session
- `results_aeros.md`: Summary of energy utilization measurements across aerOS Continuum nodes during training, showing the average energy consumption per quarter and the node on which each local operations client was deployed
- `transfer/data_transfer.csv`: Time-series data capturing data transfer metrics for sending data between VMs (used to justify the missing energy consumption savings)
Utility scripts:
- `estimate_power_from_report.py`: Python script that parses the HTML energy reports and estimates power consumption based on CPU utilization, device states, and configurable power coefficients
- `upload_model.py`: Script for uploading trained FL models to the FL Repository component via REST API
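The actual estimation logic lives in `estimate_power_from_report.py`; the snippet below is only a simplified sketch of the general idea, assuming the average CPU utilization has already been extracted from a report and using a common linear power model with placeholder coefficients:

```python
def estimate_average_power(cpu_utilization: float,
                           idle_power_w: float = 10.0,
                           max_power_w: float = 45.0) -> float:
    """Estimate average power draw in watts from CPU utilization in [0, 1].

    Linear interpolation between idle and full-load power; the coefficients are
    placeholders standing in for the script's configurable power coefficients.
    """
    cpu_utilization = min(max(cpu_utilization, 0.0), 1.0)
    return idle_power_w + (max_power_w - idle_power_w) * cpu_utilization

# Example: estimated energy for a 2-hour training session at ~60% average CPU load.
energy_wh = estimate_average_power(0.6) * 2  # watts x hours = watt-hours
print(f"~{energy_wh:.1f} Wh")
```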
Configuration files and resources required for FL experiment setup:
- `local-operations/`: Client-side configuration files
  - `setup.json`: Data loader and client library configuration
  - `model.json`, `format.json`: Model architecture and data format specifications
  - `transformation_pipeline_train.json`, `transformation_pipeline_test.json`: Data preprocessing pipeline definitions
- `orchestrator/`: Server-side configuration
  - `training_config.json`: FL training parameters
- `application.tests.migration_loader.pkl` and `.zip`: Serialized data loader implementations necessary to run the training
- `migration_predictor_v1.zip`: Packaged FL model used in training
Quantitative results from FL training experiments on both infrastructures:
Results from aerOS Continuum (decentralized) infrastructure:
- `training_results_Q1.json` to `training_results_Q4.json`: Training metrics including final loss, accuracy, number of federated rounds, and model metadata for each dataset
Results from local VM (centralized) infrastructure:
- `training_results_Q1.json` to `training_results_Q4.json`: Corresponding training metrics from the local Docker-based setup for performance comparison