Add per element/rank timing#1
Add per element/rank timing#1LaSi5002 wants to merge 3 commits intodragos-ana:add-predictor-adaptation-transvisotrop-material-squashedfrom
Conversation
|
fyi: @c-p-schmidt , @dragos-ana |
There was a problem hiding this comment.
Pull request overview
This PR adds optional runtime timing outputs to help profile FE element evaluation costs, exposing per-element evaluation wall time in VTK output and per-rank (MPI) evaluation time in a CSV file (currently wired for structure simulations).
Changes:
- Add per-element evaluation wall-time accumulation on
Core::Elements::Elementand expose it through the VTK runtime structure writer. - Add per-rank evaluation-time CSV output (gathered across ranks) behind a new
IO/PER_RANK_EVAL_TIMEparameter. - Add new IO parameters and wiring to enable/disable timing at discretization creation time.
Reviewed changes
Copilot reviewed 16 out of 17 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/input_files/mat_transviso_viscoplast_refJC_log_timint.4C.yaml | Minor YAML formatting tweak. |
| src/structure_new/src/output/4C_structure_new_discretization_runtime_output_params.hpp | Add getter/flag for element-eval-time output. |
| src/structure_new/src/output/4C_structure_new_discretization_runtime_output_params.cpp | Read ELEMENT_EVAL_TIME from input params. |
| src/structure_new/src/model_evaluator/4C_structure_new_model_evaluator_structure.hpp | Add rank-eval-time CSV writer member + output method declaration. |
| src/structure_new/src/model_evaluator/4C_structure_new_model_evaluator_structure.cpp | Create CSV writer, append element eval times to VTK, write rank CSV, reset timers. |
| src/mat/vplast/4C_mat_vplast_anand.cpp | Remove stray blank lines. |
| src/inpar/4C_inpar_io.cpp | Add IO/PER_RANK_EVAL_TIME parameter. |
| src/inpar/4C_inpar_IO_runtime_vtk_output_structure.cpp | Add RUNTIME VTK OUTPUT/STRUCTURE/ELEMENT_EVAL_TIME parameter. |
| src/global_data/4C_global_data_read.cpp | Enable discretization timing based on ELEMENT_EVAL_TIME and guard by problem type. |
| src/core/io/src/4C_io_discretization_visualization_writer_mesh.hpp | Add append_element_eval_time() API. |
| src/core/io/src/4C_io_discretization_visualization_writer_mesh.cpp | Implement writing per-element eval time as VTK cell data. |
| src/core/fem/src/general/element/4C_fem_general_element.hpp | Add eval_time_ storage + accessors + evaluate_with_timing() declarations. |
| src/core/fem/src/general/element/4C_fem_general_element.cpp | Implement evaluate_with_timing() and initialize/copy eval_time_. |
| src/core/fem/src/discretization/4C_fem_discretization_utils_evaluate.cpp | Switch evaluation helper to evaluate_with_timing(). |
| src/core/fem/src/discretization/4C_fem_discretization_evaluate.cpp | Add timing around element_action in discretization evaluate loops. |
| src/core/fem/src/discretization/4C_fem_discretization.hpp | Add timing flag + reset + per-rank getter declarations. |
| src/core/fem/src/discretization/4C_fem_discretization.cpp | Implement reset and per-rank gather of eval times. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| std::vector<double> Core::FE::Discretization::get_rank_eval_times() const | ||
| { | ||
| double my_eval_time_sum = 0.0; | ||
| for (auto* ele : elerowptr_) | ||
| { | ||
| FOUR_C_ASSERT(ele != nullptr, | ||
| "Encountered null owned row element while getting evaluation timers in discretization {}!", | ||
| name_); | ||
| my_eval_time_sum += ele->eval_time(); | ||
| } | ||
|
|
||
| std::vector<double> eval_times(Core::Communication::num_mpi_ranks(get_comm())); | ||
| Core::Communication::gather_all(&my_eval_time_sum, eval_times.data(), 1, get_comm()); | ||
| return eval_times; | ||
| } |
There was a problem hiding this comment.
get_rank_eval_times() sums only elerowptr_ (owned row elements), but element evaluation in Discretization::evaluate() iterates over my_col_element_range() (owned + ghost). If ghost elements are evaluated on a rank, the reported per-rank time will be systematically too small. Either sum over all locally evaluated elements (column range) or measure per-rank wall time directly around the evaluation loop instead of via per-element sums.
| // time the function wrapper separately if timing is enabled, to get per element evaluation | ||
| // times to avoid double counting evalate_with_timing should not be passed as element_action | ||
| const auto start_time = time_eval_time_ ? std::chrono::steady_clock::now() | ||
| : std::chrono::steady_clock::time_point{}; | ||
|
|
||
| // call the element evaluate method | ||
| element_action(*actele, la, strategy.elematrix1(), strategy.elematrix2(), strategy.elevector1(), | ||
| strategy.elevector2(), strategy.elevector3()); | ||
|
|
||
| // per element timer stop and record if timing is enabled | ||
| if (time_eval_time_) | ||
| actele->add_eval_time( | ||
| std::chrono::duration<double>(std::chrono::steady_clock::now() - start_time).count()); | ||
|
|
There was a problem hiding this comment.
Timing is currently applied to the entire element_action callback, but this overload is also used for non-Element::evaluate() work (e.g., lambdas that assemble custom matrices). With time_eval_time_ enabled, this will pollute Element::eval_time_ with unrelated costs and distort the “evaluate wall time” output. Consider restricting timing to the actual Element::evaluate* paths (e.g., only in the default evaluate(params, strategy) implementation) or providing a dedicated API for timing element evaluation.
| runtime_csvwriter_rank_eval_times_ = | ||
| new Core::IO::RuntimeCsvWriter(Core::Communication::my_mpi_rank(discret().get_comm()), | ||
| discret_ptr()->writer()->output(), "rank_eval_time"); | ||
| runtime_csvwriter_rank_eval_times_->register_data_vector("RankEvalTime", 1, 16); |
There was a problem hiding this comment.
RuntimeCsvWriter::register_data_vector() must be called with the number of components equal to the size of the vector you later write. Here it’s registered with numcomponents=1, but output_rank_eval_times() writes discret().get_rank_eval_times() which has length num_mpi_ranks. This will hit the writer’s size assertion on proc 0 for parallel runs. Please register with Core::Communication::num_mpi_ranks(discret().get_comm()) (or otherwise ensure the written vector size matches the registered component count).
| runtime_csvwriter_rank_eval_times_->register_data_vector("RankEvalTime", 1, 16); | |
| runtime_csvwriter_rank_eval_times_->register_data_vector( | |
| "RankEvalTime", Core::Communication::num_mpi_ranks(discret().get_comm()), 16); |
| double* dt_ele_ptr_; | ||
|
|
||
| //! csv writer for per rank per timestep evaluation times | ||
| Core::IO::RuntimeCsvWriter* runtime_csvwriter_rank_eval_times_; |
There was a problem hiding this comment.
runtime_csvwriter_rank_eval_times_ is managed as a raw pointer with manual new/delete in setup(), but there’s no visible ownership cleanup on object destruction and it’s easy to leak or double-delete in future changes. Prefer std::unique_ptr<Core::IO::RuntimeCsvWriter> (or std::optional<RuntimeCsvWriter>, as used elsewhere in the repo) to make ownership explicit and exception-safe.
| Core::IO::RuntimeCsvWriter* runtime_csvwriter_rank_eval_times_; | |
| std::unique_ptr<Core::IO::RuntimeCsvWriter> runtime_csvwriter_rank_eval_times_; |
| eval_data().set_action_type(Core::Elements::struct_calc_predict); | ||
| eval_data().set_predictor_type(pred_type); | ||
|
|
||
| discret().reset_element_eval_timers(); |
There was a problem hiding this comment.
reset_element_eval_timers() is called unconditionally here, which adds an O(#owned-elements) loop every timestep even when element timing/output is disabled. Consider guarding this with if (discret().time_eval_time()) (or the relevant IO flag) to avoid unnecessary overhead in normal runs.
| discret().reset_element_eval_timers(); | |
| if (discret().time_eval_time()) | |
| discret().reset_element_eval_timers(); |
| void Core::FE::Discretization::reset_element_eval_timers() | ||
| { | ||
| for (auto* ele : elerowptr_) | ||
| { | ||
| FOUR_C_ASSERT(ele != nullptr, | ||
| "Encountered null owned row element while resetting evaluation timers in discretization " | ||
| "{}!", | ||
| name_); | ||
| ele->set_eval_time(0.0); | ||
| } | ||
| } |
There was a problem hiding this comment.
These timers are reset only for elerowptr_ (owned row elements), but evaluation timing is accumulated on all local column elements (my_col_element_range()), including ghosts. This leaves ghost element timers stale and also makes later rank totals easy to undercount. Consider iterating over all_local_element_ids_/my_col_element_range() for reset, or ensure timing is only ever recorded on owned row elements.
| int Core::Elements::Element::evaluate_with_timing(Teuchos::ParameterList& params, | ||
| Core::FE::Discretization& discretization, LocationArray& la, | ||
| Core::LinAlg::SerialDenseMatrix& elemat1, Core::LinAlg::SerialDenseMatrix& elemat2, | ||
| Core::LinAlg::SerialDenseVector& elevec1, Core::LinAlg::SerialDenseVector& elevec2, | ||
| Core::LinAlg::SerialDenseVector& elevec3) | ||
| { | ||
| const auto start_time = discretization.time_eval_time() ? std::chrono::steady_clock::now() | ||
| : std::chrono::steady_clock::time_point{}; | ||
| const int err = evaluate(params, discretization, la, elemat1, elemat2, elevec1, elevec2, elevec3); | ||
| if (discretization.time_eval_time()) | ||
| eval_time_ += | ||
| std::chrono::duration<double>(std::chrono::steady_clock::now() - start_time).count(); | ||
| return err; |
There was a problem hiding this comment.
evaluate_with_timing() uses std::chrono but this translation unit doesn’t include <chrono>, which will cause a compile error unless it happens to be pulled in indirectly. Please add an explicit #include <chrono> in this .cpp (or the header if you prefer).
| bool havedof_; | ||
|
|
||
| //! Flag: time each element evaluation and store the result in the element | ||
| bool time_eval_time_; |
There was a problem hiding this comment.
time_eval_time_ is a new data member but it’s not default-initialized here. If a Discretization is used before set_time_eval_time() is called, this can lead to undefined behavior (randomly timing or not timing). Please initialize it to false (e.g., in-class initializer bool time_eval_time_{false}; or in the constructor initializer list).
| bool time_eval_time_; | |
| bool time_eval_time_{false}; |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: LaSi5002 <84587043+LaSi5002@users.noreply.github.com>
Per element and rank evaluation time can be added with:
Their locations (IO or IO/RUN...) will be adjusted depending on what problem types they will work in the end.
Screencast.from.2026-03-29.21-33-07.webm
Per rank evaluation times are written to a csv file if enabled.
There are still things that are unclear to me. Mainly the following:
An important thing to consider is, that with the current setup, only structure simulations can export per element and per rank evaluation times.
The reason is that I did not find (or there does not exist) a main loop or function that gets excecuted on every timestep regardless of the problem type.
Which is unintuitive to me, because it seems to me like there are some things that should get excecuted on every timestep for every problem, like writing element owners or material ids.
Note on Element::eval_time_:
its not an optional to decrease overhead (which is negligable anyways)
Note on bool Discretization::time_eval_time_:
the name means whether or not to time the evaluation time. If you think that could be misunderstood I can change it.