Add external python predictor #1898
danielwolff1 wants to merge 14 commits into 4C-multiphysics:main
Conversation
mayrmt
left a comment
@danielwolff1 I left some comments and raised some questions that I could not answer for myself right away. To me, it looks like this is going in the right direction.
src/structure_new/src/predict/4C_structure_new_predict_python_surrogate.cpp (two resolved, outdated comment threads)
```cpp
// Initialize interpreter once per process.
static pybind11::scoped_interpreter py_guard;
```
You mentioned that only rank 0 should interact with Python. So far, this is not checked here. Just an observation; I am not sure yet whether an actual check is required.
Thanks for raising this point! It took me a while to investigate and find a solution.

In the first draft of the implementation, communication was indeed restricted to rank 0, i.e., only rank 0 called the python interpreter. However, pybind11 data types were still used on the other processes, and to allow this, python had to be initialized on all processes. That in itself was fine, as it did not cause weird/undefined behavior at runtime, but as you rightfully pointed out, the implementation did not strictly follow the documentation: a python interpreter was initialized on every process, creating overhead, since ultimately only rank 0 meaningfully used it.

With commit 4ba3d3c I revised the implementation so that it now strictly follows the self-imposed rule "only rank 0 communicates with python". That is, the python interpreter is now only initialized on rank 0. All processes participate in the communication, but only rank 0 interacts with python, and only rank 0 touches pybind11 data types, meaning all conversions between 4C-internal and python-compatible data types happen on rank 0.

Overall, the implementation should be much cleaner now, and some of the bigger functions (like build_mesh_snapshot()) have been split in two: gathering data from the other processes in a first step, and converting the data from C++ to python-compatible data types in a second step.
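Not part of the PR, but the gather-then-convert split can be illustrated with a plain-Python sketch. Here the MPI gather is faked with a list of per-rank dicts, and the function names `gather_node_coords`/`convert_to_numpy` are made up for illustration; they are not the actual 4C API:

```python
import numpy as np

# Step 1: gather per-rank data on rank 0 (faked here with a list of
# per-rank {node_id: coords} dicts instead of an MPI gather).
def gather_node_coords(per_rank_maps):
    gathered = {}
    for rank_map in per_rank_maps:
        gathered.update(rank_map)
    return gathered

# Step 2: only rank 0 converts the gathered data to python/numpy types.
def convert_to_numpy(gathered):
    node_ids = sorted(gathered)
    coords = np.array([gathered[nid] for nid in node_ids])
    return node_ids, coords

per_rank = [{0: [0.0, 0.0, 0.0], 1: [1.0, 0.0, 0.0]},
            {2: [0.0, 1.0, 0.0]}]
node_ids, coords = convert_to_numpy(gather_node_coords(per_rank))
print(node_ids)      # [0, 1, 2]
print(coords.shape)  # (3, 3)
```

Splitting the steps this way keeps all pybind11/numpy conversions out of the communication code, which is what makes the rank-0-only rule easy to enforce.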
src/structure_new/src/predict/4C_structure_new_predict_python_surrogate.cpp (three resolved, outdated comment threads)
```yaml
STRUCTURAL DYNAMIC:
  INT_STRATEGY: "Standard"
  DYNAMICTYPE: "Statics"
  RESTARTEVERY: 2
  TIMESTEP: 0.5
  NUMSTEP: 2
  MAXTIME: 1
  PREDICT: "PythonSurrogate"
  PYTHON_PREDICTOR_FILE: "./structure_new_python_surrogate_predictor_constdis.py"
  TOLDISP: 1e-12
  TOLRES: 1e-12
  LINEAR_SOLVER: 1
  NLNSOL: "fullnewton"
  MAXITER: 20
```
This performs a full Newton solve, so the final displacement values in the result tests will be independent of the outcome of the predictor. Hence, this test might not test the outcome of the predictor, just that the code path is not broken.
Yeah, I thought that's what the integration tests are for, and I wanted to provide a unit test for the actual functionality of the predictor. However, this is not as straightforward as I initially thought: in my opinion it would make more sense to test solid::predict::PythonWrapper::Implementation, because this class provides the actual functionality, rather than solid::predict::PythonWrapper itself, which is only a wrapper. However, the Implementation class is currently private within PythonWrapper, which makes it very inconvenient to test.
So I guess I will adapt this test like we discussed and modify the Newton iteration such that I can actually "see" the influence of my predictor, unless you have a better idea? 🤔
```yaml
STRUCTURAL DYNAMIC:
  INT_STRATEGY: "Standard"
  DYNAMICTYPE: "Statics"
  RESTARTEVERY: 2
  TIMESTEP: 0.5
  NUMSTEP: 2
  MAXTIME: 1
  PREDICT: "PythonSurrogate"
  PYTHON_PREDICTOR_FILE: "./structure_new_python_surrogate_predictor_constdis_with_setup.py"
  TOLDISP: 1e-12
  TOLRES: 1e-12
  LINEAR_SOLVER: 1
  NLNSOL: "fullnewton"
  MAXITER: 20
```
This performs a full Newton solve, so the final displacement values in the result tests will be independent of the outcome of the predictor. Hence, this test might not test the outcome of the predictor, just that the code path is not broken.
Co-authored-by: Matthias Mayr <mayrmt@users.noreply.github.com>
Signed-off-by: Daniel Wolff <39346676+danielwolff1@users.noreply.github.com>
danielwolff1
left a comment
Thanks for the fast review, I appreciate the input! I still have to look into some of the points you mentioned and will probably do so next week.
```cpp
if (is_rank_zero(global_state_ptr->get_dis_n()->get_comm()))
{
  // make it safe to touch python objects for the rest of this function
  pybind11::gil_scoped_acquire gil;
  // invoke the python predictor step in the python script
  pybind11::object result = compute_(state);
  // update 4C's internal state based on the result from the python script
  apply_prediction(result, gathered, global_state_ptr);
}
else
{
  // all other processes enter apply_prediction and wait for python to be finished.
  // Once python is done, the results will be communicated back to each processor.
  apply_prediction(pybind11::none(), gathered, global_state_ptr);
}
```
This is a good point. I've thought about it over the last couple of days; here is my take.
I think it would also work if I rewrote it like this:

```cpp
// create an empty result object (necessary for processes with non-zero rank)
pybind11::object result = pybind11::none();
// only the process with rank 0 communicates with python
if (is_rank_zero(global_state_ptr->get_dis_n()->get_comm()))
{
  // make it safe to touch python objects for the rest of this function
  pybind11::gil_scoped_acquire gil;
  // invoke the python predictor step in the python script
  result = compute_(state);
}
// explicitly synchronise all processes, i.e., wait for process 0 to be done with python
Core::Communication::barrier(global_state_ptr->get_discret()->get_comm());
// once process 0 is done with python, the results are communicated back to each process
apply_prediction(result, gathered, global_state_ptr);
```

Right now, however, the communication happening in apply_prediction, i.e., the scattering of the results that process 0 received from python, already involves an implicit barrier: the current implementation lets all processes enter the receiving state right away, but of course they will only receive the data once process 0 starts scattering it, i.e., after the python communication is finished. So the explicit barrier would be redundant.

Does that make sense or do you disagree?
If you prefer, I can adapt the implementation to make the synchronisation more explicit.
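To illustrate why a blocking receive already synchronises the processes, here is a small stand-alone Python sketch that emulates the scatter with threads and queues. No MPI is involved; the thread for "rank 0" and the queues are purely illustrative stand-ins:

```python
import queue
import threading
import time

# "Rank 0" produces the prediction; the other "ranks" block on a receive.
# The blocking get() acts as the implicit barrier described above: no rank
# can proceed until rank 0 has finished and scattered the data.
def run(n_ranks=3):
    channels = [queue.Queue() for _ in range(n_ranks - 1)]
    received = {}

    def rank0():
        time.sleep(0.05)          # stand-in for the python predictor call
        result = [1.0, 2.0, 3.0]  # stand-in prediction
        for ch in channels:       # "scatter" to the other ranks
            ch.put(result)
        received[0] = result

    def other(rank, ch):
        received[rank] = ch.get()  # blocks until rank 0 scatters

    threads = [threading.Thread(target=rank0)]
    threads += [threading.Thread(target=other, args=(r + 1, ch))
                for r, ch in enumerate(channels)]
    for t in threads: t.start()
    for t in threads: t.join()
    return received

print(run())
```

Every non-zero rank ends up with the result even though no explicit barrier was issued, because the receive itself cannot complete before rank 0 sends.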
```cpp
const int node_id = node->id();

// copy the node's coordinates into a simple STL container
std::vector<double> coords(3, 0.0);
```
True, but then I would also need to adapt the type of the container local_node_coords, which is currently declared as std::map<int, std::vector<double>>, because Core::LinAlg::gather() is already overloaded for containers of type std::map<int, std::vector<double>> (see here), which simplifies the communication. But I can check whether gather would also work for a container of type std::map<int, std::array<double, 3>>; at least there seems to be a more general templated version available here.
src/structure_new/src/predict/4C_structure_new_predict_python_surrogate.cpp (resolved, outdated comment thread)
```cpp
pybind11::array_t<double> Solid::Predict::PythonWrapper::Implementation::copy_vector_to_numpy(
    const Core::LinAlg::Vector<double>& source) const
{
  // create a new NumPy array
  pybind11::array_t<double> target(source.local_length());
  auto target_view = target.mutable_unchecked<1>();
  const double* source_view = source.get_values();
  // copy data from vector into the NumPy array
  for (int i = 0; i < source.local_length(); ++i) target_view(i) = source_view[i];

  return target;
}
```
Can we avoid copying the data? If I'm not mistaken, we could just return a view of the data:
Suggested change:

```diff
-pybind11::array_t<double> Solid::Predict::PythonWrapper::Implementation::copy_vector_to_numpy(
-    const Core::LinAlg::Vector<double>& source) const
-{
-  // create a new NumPy array
-  pybind11::array_t<double> target(source.local_length());
-  auto target_view = target.mutable_unchecked<1>();
-  const double* source_view = source.get_values();
-  // copy data from vector into the NumPy array
-  for (int i = 0; i < source.local_length(); ++i) target_view(i) = source_view[i];
-  return target;
-}
+pybind11::array_t<const double> Solid::Predict::PythonWrapper::Implementation::copy_vector_to_numpy(
+    const Core::LinAlg::Vector<double>& source) const
+{
+  return pybind11::array_t<const double>(
+      {source.local_length()}, {sizeof(double)}, source.get_values());
+}
```
This might even be general enough so that we can move this into a helper function close to our Core::LinAlg::Vector.
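The copy-vs-view distinction the suggestion is aiming at can be demonstrated with plain numpy (conceptual only; this is not the pybind11 code path):

```python
import numpy as np

source = np.arange(5, dtype=np.float64)

copy = source.copy()  # analogous to the loop-based copy_vector_to_numpy
view = source[:]      # zero-copy view of the same buffer

source[0] = 42.0
print(copy[0])   # 0.0  -> the copy is unaffected
print(view[0])   # 42.0 -> the view reflects the change
print(np.shares_memory(source, view))  # True
print(np.shares_memory(source, copy))  # False
```

One caveat for the pybind11 side: if I remember the pybind11 docs correctly, an `array_t` constructed from a raw pointer only avoids the copy when a `base` handle (an owner object keeping the C++ data alive) is also passed; without one, pybind11 makes its own copy of the buffer. That would be worth double-checking before relying on view semantics here.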
```cpp
void Solid::Predict::PythonWrapper::Implementation::copy_numpy_to_vector(
    const pybind11::array_t<double>& source, Core::LinAlg::Vector<double>& target) const
```
You could also do it the other way around, i.e., directly writing into the data of our Core::LinAlg::Vector (same as above, just non-const)
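The reverse direction can likewise be sketched in plain numpy: wrap the existing buffer and let python write into it directly. Again purely conceptual; `np.frombuffer` here stands in for wrapping the vector's `get_values()` pointer:

```python
import numpy as np

target = np.zeros(4)  # stand-in for the Core::LinAlg::Vector's data
# wrap the SAME memory in a second array (no copy is made)
view = np.frombuffer(target.data, dtype=np.float64)

view[:] = [1.0, 2.0, 3.0, 4.0]  # "python" writes directly into the buffer
print(target)  # [1. 2. 3. 4.]
```

The same lifetime caveat as above applies: the wrapping array must not outlive the C++ vector that owns the memory.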
Description and Context
With this PR, I'm proposing a new predictor class which allows deferring the actual implementation of the predictor to python. We need this feature as we want to study the behavior of neural-network-based structural predictors.
Implementation details
I implemented this feature similarly to what has been done in the context of a python surrogate model for rough surface contact laws as merged in #1735. The general ideas of the implementation are:

- 4C execution might happen in parallel, yet the computations in python only happen in a single interpreter, i.e., in a single process.
- pybind11 functionality is completely encapsulated in an Implementation class so that it does not interfere with the actual predictor.
- The mesh is only communicated to python if the user's python script provides a setup() method. That way the user can easily control whether to have this once-per-simulation overhead of communicating the mesh, or whether they only need the "regular" information that also the C++ predictors see.

Testing
I have implemented two integration tests where I test the two major control flows, i.e., once where the python script has a setup method and the mesh needs to be communicated, and once where it does not. In both cases, the python script implements the simple logic of a ConstDis predictor, and the corresponding values in the RESULTS DESCRIPTION section have been generated by running the same input scripts, however with PREDICT = ConstDis.
I'm still planning to look into unit tests for this feature, but the general implementation already works and won't change anymore, so I already wanted to put it out for review.
I appreciate any comments and insights for potential improvements 😊