Skip to content

VeloC plugin#661

Open
iole-bolognesi wants to merge 49 commits intopdidev:mainfrom
iole-bolognesi:cp-plugin
Open

VeloC plugin#661
iole-bolognesi wants to merge 49 commits intopdidev:mainfrom
iole-bolognesi:cp-plugin

Conversation

@iole-bolognesi
Copy link
Copy Markdown

@iole-bolognesi iole-bolognesi commented Apr 2, 2026

List of things to check before making a PR

Before merging your code, please check the following:

  • you have added a line describing your changes to the Changelog;
  • you have added unit tests for any new or improved feature;
  • in case you updated dependencies, you have checked pdi/docs/CheckList.md;
  • in case of a change in pdi.h, this same change must be reflected in no-pdi/include/pdi.h;
  • you have checked your code format:
    • you have checked that you respect all conventions specified in CONTRIBUTING.md;
    • you have checked that the indentation and formatting conforms to the .clang-format;
    • you have documented with doxygen any new or changed function / class;
  • you have correctly updated the copyright headers:
    • your institution is in the copyright header of every file you (substantially) modified;
    • you have checked that the end-year of the copyright there is the current one;
  • you have updated the AUTHORS file:
    • you have added yourself to the AUTHORS file;
    • if this is a new contribution, you have added it to the AUTHORS file;
  • you have added everything to the user documentation:
    • any new CMake configuration option;
    • any change in the yaml config;
    • any change to the public or plugin API;
    • any other new or changed user-facing feature;
    • any change to the dependencies;
  • you have correctly linked your MR to one or more issues:
    • your MR solves an identified issue;
    • your commit contain the Fix #issue keyword to autoclose the issue when merged.

…rrectly read and in case of non-failure checkpointing is done but entirely using the decl_hdf5 plugin..
…ence and check the definition of a recovery file by storing it as a class data member (and querying its value) rather than using the length of the node.
…nality: If the datasets' paths depend on simulation parameters, the user can define the last dataset paths. If not, the same dataset paths as defined by the "datasets" key are assumed
…per around the VeloC library. Implemented functionalities of writing checkpoints and restoring the latest checkpoint. Added the Cmake file to build PDI with new plugin. Next step: testing for correctness.
… than on data expose because otherwise multiple checkpoints files were being written for the same iteration. Added tests to: 1) check correct writing of checkpoints 2) check correct restoration after a failure. Tests pass with basic requirements. These need to be expanded.
…ons and fixed naming mismatch in veloc_wrapper.cxx
…ype divided by the number of elements in the case of an array datatype. Added additional check to ensure an error is thrown if a checkpoint event is called before all data to be included in checkpoints has been exposed to PDI.
…nd tests to assert that the expected number of checkpoint files has been written. Added cmake changes to compile tests with library but it still does not work.
…has been made to store the configuration of the veloc plugin. The plugin creates an object of the configuration class and calls its getters method to function correctly. Events of types RECOVER and SYNC_STATE have been added ; recovering at the moment is not automatic on expose. The pdi example ran successfully but only tetsing checkpointing so far.
…ts of logging to debug the recover_var bug which is still not working.
… variable names of the VeloC plugins nd changed minor logic: returning 1 rather than version for subsequent check of return value to be valid
…t . Removed the use of "assert" from existing tests.
@iole-bolognesi iole-bolognesi self-assigned this Apr 2, 2026
@jbigot jbigot force-pushed the main branch 3 times, most recently from c257c27 to ce93f25 Compare April 22, 2026 08:17
…an older version. Added VeloC plugin to the plugin related parts of the CMakeLists.txt and Source_installation.md
@iole-bolognesi iole-bolognesi marked this pull request as ready for review April 29, 2026 08:48
cmake --build tests_api_mockfind ${MAKEFLAGS}
ctest --output-on-failure --timeout 90 ${CTEST_FLAGS} ${CTEST_DIR:+--output-junit "${CTEST_DIR}/tests_api_mockfind.xml"} ${EXCLUDED_PDI_TESTS:+-E $EXCLUDED_PDI_TESTS} --test-dir tests_api_mockfind
fi
fi No newline at end of file
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Synchronisation issue with main, can be removed


#include <mpi.h>
#include <iostream>
#include <assert.h>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unify the includes of all new tests, can remove this include to unused assert.h


void write_checkpoint(PDI::Context& ctx, std::string label, int version);

int read_checkpoint(PDI::Context& ctx, std::string label, int cp_id); // is this needed? can't remember anymore
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


Dependencies of **the VeloC plugin**:
* the PDI library,
* the [VeloC](https://veloc.readthedocs.io/en/latest/userguide.html) library version 1.8 or above (not provided)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* the [VeloC](https://veloc.readthedocs.io/en/latest/userguide.html) library version 1.8 or above (not provided)
* the [VeloC](https://veloc.readthedocs.io/en/latest/userguide.html) library version 1.8 or above (not provided),

static void error_handler(PDI_status_t status, const char* message, void* ctx)
{
if (status) {
std::cerr << "[PDI error] " << message << "\n";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May remove std::cerr and cout of all tests, to use fprintf of stderr instead (cout is not needed with ctest). See HDF5 tests for reference. May use EXPECT_EQ if moving to gtest

Comment on lines +25 to +44
#include <pdi/pdi_fwd.h>
#include <pdi/context.h>
#include <pdi/context_proxy.h>
#include <pdi/data_descriptor.h>
#include <pdi/datatype.h>
#include <pdi/error.h>
#include <pdi/expression.h>
#include <pdi/plugin.h>
#include <pdi/ref_any.h>


#include <iostream>
#include <optional>

#include "veloc_wrapper.h"

using PDI::Context;
using PDI::Ref_r;

using std::string;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All unused, except for #include "veloc_wrapper.h" and using std::string;
(also, if including std::string, no need to use std for the calls of this file)
Most std::string could be const std::string in this file (only read)

}
}

int read_checkpoint(PDI::Context& ctx, std::string label, int version)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incompatibility of read_checkpoint() and init_restart() for iteration index 0 from VELOC_Restart_test()

void protect_data(PDI::Context& ctx, int id, void* ptr, size_t n, size_t sub_bytes)
{
if (VELOC_Mem_protect(id, ptr, n, sub_bytes) != VELOC_SUCCESS) {
ctx.logger().error("Memory protect failed for id {} with ptr = {} and size = {}", id, ptr, (n * sub_bytes));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Log error but doesn't stops here, could it corrupt memory further ? Should throw an error directly ? May use more exit(2); in this file

void unprotect_data(PDI::Context& ctx, int id)
{
if (VELOC_Mem_unprotect(id) != VELOC_SUCCESS) {
ctx.logger().error("Memory unprotect failed for id {}", id);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as protect_data()


void end_checkpoint(PDI::Context& ctx)
{
if (VELOC_Checkpoint_end(1) != VELOC_SUCCESS) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might need to propagate the error if the user made a mistake between START_CHECKPOINT and END_CHECKPOINT, to signify success to END_CHECKPOINT so that a wrong simulation checkpoint does not happen

Suggested change
if (VELOC_Checkpoint_end(1) != VELOC_SUCCESS) {
if (VELOC_Checkpoint_end(success ? 1 : 0) != VELOC_SUCCESS) {

(needs a change in Event_type::END_CHECKPOINT)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants