Add per element/rank timing by LaSi5002 · Pull Request #1 · dragos-ana/4C

LaSi5002 · 2026-03-29T19:36:14Z

Per element and rank evaluation time can be added with:

IO:
  PER_RANK_EVAL_TIME: false
IO/RUNTIME VTK OUTPUT/STRUCTURE:
  ELEMENT_EVAL_TIME: true

Their locations (IO or IO/RUN...) will be adjusted depending on what problem types they will work in the end.

Screencast.from.2026-03-29.21-33-07.webm

(Per Element evaluation times for ReformulatedJohnsonCook for stress test probe)

Per rank evaluation times are written to a csv file if enabled.

There are still things that are unclear to me. Mainly the following:

An important thing to consider is, that with the current setup, only structure simulations can export per element and per rank evaluation times.
The reason is that I did not find (or there does not exist) a main loop or function that gets excecuted on every timestep regardless of the problem type.
Which is unintuitive to me, because it seems to me like there are some things that should get excecuted on every timestep for every problem, like writing element owners or material ids.

Note on Element::eval_time_:
its not an optional to decrease overhead (which is negligable anyways)

Note on bool Discretization::time_eval_time_:
the name means whether or not to time the evaluation time. If you think that could be misunderstood I can change it.

LaSi5002 · 2026-03-30T11:30:37Z

fyi: @c-p-schmidt , @dragos-ana

Copilot

Pull request overview

This PR adds optional runtime timing outputs to help profile FE element evaluation costs, exposing per-element evaluation wall time in VTK output and per-rank (MPI) evaluation time in a CSV file (currently wired for structure simulations).

Changes:

Add per-element evaluation wall-time accumulation on Core::Elements::Element and expose it through the VTK runtime structure writer.
Add per-rank evaluation-time CSV output (gathered across ranks) behind a new IO/PER_RANK_EVAL_TIME parameter.
Add new IO parameters and wiring to enable/disable timing at discretization creation time.

Reviewed changes

Copilot reviewed 16 out of 17 changed files in this pull request and generated 9 comments.

Show a summary per file

File	Description
tests/input_files/mat_transviso_viscoplast_refJC_log_timint.4C.yaml	Minor YAML formatting tweak.
src/structure_new/src/output/4C_structure_new_discretization_runtime_output_params.hpp	Add getter/flag for element-eval-time output.
src/structure_new/src/output/4C_structure_new_discretization_runtime_output_params.cpp	Read `ELEMENT_EVAL_TIME` from input params.
src/structure_new/src/model_evaluator/4C_structure_new_model_evaluator_structure.hpp	Add rank-eval-time CSV writer member + output method declaration.
src/structure_new/src/model_evaluator/4C_structure_new_model_evaluator_structure.cpp	Create CSV writer, append element eval times to VTK, write rank CSV, reset timers.
src/mat/vplast/4C_mat_vplast_anand.cpp	Remove stray blank lines.
src/inpar/4C_inpar_io.cpp	Add `IO/PER_RANK_EVAL_TIME` parameter.
src/inpar/4C_inpar_IO_runtime_vtk_output_structure.cpp	Add `RUNTIME VTK OUTPUT/STRUCTURE/ELEMENT_EVAL_TIME` parameter.
src/global_data/4C_global_data_read.cpp	Enable discretization timing based on `ELEMENT_EVAL_TIME` and guard by problem type.
src/core/io/src/4C_io_discretization_visualization_writer_mesh.hpp	Add `append_element_eval_time()` API.
src/core/io/src/4C_io_discretization_visualization_writer_mesh.cpp	Implement writing per-element eval time as VTK cell data.
src/core/fem/src/general/element/4C_fem_general_element.hpp	Add `eval_time_` storage + accessors + `evaluate_with_timing()` declarations.
src/core/fem/src/general/element/4C_fem_general_element.cpp	Implement `evaluate_with_timing()` and initialize/copy `eval_time_`.
src/core/fem/src/discretization/4C_fem_discretization_utils_evaluate.cpp	Switch evaluation helper to `evaluate_with_timing()`.
src/core/fem/src/discretization/4C_fem_discretization_evaluate.cpp	Add timing around `element_action` in discretization evaluate loops.
src/core/fem/src/discretization/4C_fem_discretization.hpp	Add timing flag + reset + per-rank getter declarations.
src/core/fem/src/discretization/4C_fem_discretization.cpp	Implement reset and per-rank gather of eval times.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-30T14:03:06Z

src/core/fem/src/discretization/4C_fem_discretization.cpp

+std::vector<double> Core::FE::Discretization::get_rank_eval_times() const
+{
+  double my_eval_time_sum = 0.0;
+  for (auto* ele : elerowptr_)
+  {
+    FOUR_C_ASSERT(ele != nullptr,
+        "Encountered null owned row element while getting evaluation timers in discretization {}!",
+        name_);
+    my_eval_time_sum += ele->eval_time();
+  }
+
+  std::vector<double> eval_times(Core::Communication::num_mpi_ranks(get_comm()));
+  Core::Communication::gather_all(&my_eval_time_sum, eval_times.data(), 1, get_comm());
+  return eval_times;
+}


get_rank_eval_times() sums only elerowptr_ (owned row elements), but element evaluation in Discretization::evaluate() iterates over my_col_element_range() (owned + ghost). If ghost elements are evaluated on a rank, the reported per-rank time will be systematically too small. Either sum over all locally evaluated elements (column range) or measure per-rank wall time directly around the evaluation loop instead of via per-element sums.

Copilot · 2026-03-30T14:03:06Z

src/core/fem/src/discretization/4C_fem_discretization_evaluate.cpp

+    // time the function wrapper separately if timing is enabled, to get per element evaluation
+    // times to avoid double counting evalate_with_timing should not be passed as element_action
+    const auto start_time = time_eval_time_ ? std::chrono::steady_clock::now()
+                                            : std::chrono::steady_clock::time_point{};
+
    // call the element evaluate method
    element_action(*actele, la, strategy.elematrix1(), strategy.elematrix2(), strategy.elevector1(),
        strategy.elevector2(), strategy.elevector3());

+    // per element timer stop and record if timing is enabled
+    if (time_eval_time_)
+      actele->add_eval_time(
+          std::chrono::duration<double>(std::chrono::steady_clock::now() - start_time).count());
+


Timing is currently applied to the entire element_action callback, but this overload is also used for non-Element::evaluate() work (e.g., lambdas that assemble custom matrices). With time_eval_time_ enabled, this will pollute Element::eval_time_ with unrelated costs and distort the “evaluate wall time” output. Consider restricting timing to the actual Element::evaluate* paths (e.g., only in the default evaluate(params, strategy) implementation) or providing a dedicated API for timing element evaluation.

Copilot · 2026-03-30T14:03:06Z

src/structure_new/src/model_evaluator/4C_structure_new_model_evaluator_structure.cpp

+    runtime_csvwriter_rank_eval_times_ =
+        new Core::IO::RuntimeCsvWriter(Core::Communication::my_mpi_rank(discret().get_comm()),
+            discret_ptr()->writer()->output(), "rank_eval_time");
+    runtime_csvwriter_rank_eval_times_->register_data_vector("RankEvalTime", 1, 16);


RuntimeCsvWriter::register_data_vector() must be called with the number of components equal to the size of the vector you later write. Here it’s registered with numcomponents=1, but output_rank_eval_times() writes discret().get_rank_eval_times() which has length num_mpi_ranks. This will hit the writer’s size assertion on proc 0 for parallel runs. Please register with Core::Communication::num_mpi_ranks(discret().get_comm()) (or otherwise ensure the written vector size matches the registered component count).

Suggested change

runtime_csvwriter_rank_eval_times_->register_data_vector("RankEvalTime", 1, 16);

runtime_csvwriter_rank_eval_times_->register_data_vector(

"RankEvalTime", Core::Communication::num_mpi_ranks(discret().get_comm()), 16);

Copilot · 2026-03-30T14:03:07Z

src/structure_new/src/model_evaluator/4C_structure_new_model_evaluator_structure.hpp

      double* dt_ele_ptr_;

+      //! csv writer for per rank per timestep evaluation times
+      Core::IO::RuntimeCsvWriter* runtime_csvwriter_rank_eval_times_;


runtime_csvwriter_rank_eval_times_ is managed as a raw pointer with manual new/delete in setup(), but there’s no visible ownership cleanup on object destruction and it’s easy to leak or double-delete in future changes. Prefer std::unique_ptr<Core::IO::RuntimeCsvWriter> (or std::optional<RuntimeCsvWriter>, as used elsewhere in the repo) to make ownership explicit and exception-safe.

Suggested change

Core::IO::RuntimeCsvWriter* runtime_csvwriter_rank_eval_times_;

std::unique_ptr<Core::IO::RuntimeCsvWriter> runtime_csvwriter_rank_eval_times_;

Copilot · 2026-03-30T14:03:07Z

src/structure_new/src/model_evaluator/4C_structure_new_model_evaluator_structure.cpp

  eval_data().set_action_type(Core::Elements::struct_calc_predict);
  eval_data().set_predictor_type(pred_type);

+  discret().reset_element_eval_timers();


reset_element_eval_timers() is called unconditionally here, which adds an O(#owned-elements) loop every timestep even when element timing/output is disabled. Consider guarding this with if (discret().time_eval_time()) (or the relevant IO flag) to avoid unnecessary overhead in normal runs.

Suggested change

discret().reset_element_eval_timers();

if (discret().time_eval_time())

discret().reset_element_eval_timers();

src/core/fem/src/general/element/4C_fem_general_element.hpp

Copilot · 2026-03-30T14:03:08Z

src/core/fem/src/discretization/4C_fem_discretization.cpp

+void Core::FE::Discretization::reset_element_eval_timers()
+{
+  for (auto* ele : elerowptr_)
+  {
+    FOUR_C_ASSERT(ele != nullptr,
+        "Encountered null owned row element while resetting evaluation timers in discretization "
+        "{}!",
+        name_);
+    ele->set_eval_time(0.0);
+  }
+}


These timers are reset only for elerowptr_ (owned row elements), but evaluation timing is accumulated on all local column elements (my_col_element_range()), including ghosts. This leaves ghost element timers stale and also makes later rank totals easy to undercount. Consider iterating over all_local_element_ids_/my_col_element_range() for reset, or ensure timing is only ever recorded on owned row elements.

Copilot · 2026-03-30T14:03:08Z

src/core/fem/src/general/element/4C_fem_general_element.cpp

+int Core::Elements::Element::evaluate_with_timing(Teuchos::ParameterList& params,
+    Core::FE::Discretization& discretization, LocationArray& la,
+    Core::LinAlg::SerialDenseMatrix& elemat1, Core::LinAlg::SerialDenseMatrix& elemat2,
+    Core::LinAlg::SerialDenseVector& elevec1, Core::LinAlg::SerialDenseVector& elevec2,
+    Core::LinAlg::SerialDenseVector& elevec3)
+{
+  const auto start_time = discretization.time_eval_time() ? std::chrono::steady_clock::now()
+                                                          : std::chrono::steady_clock::time_point{};
+  const int err = evaluate(params, discretization, la, elemat1, elemat2, elevec1, elevec2, elevec3);
+  if (discretization.time_eval_time())
+    eval_time_ +=
+        std::chrono::duration<double>(std::chrono::steady_clock::now() - start_time).count();
+  return err;


evaluate_with_timing() uses std::chrono but this translation unit doesn’t include <chrono>, which will cause a compile error unless it happens to be pulled in indirectly. Please add an explicit #include <chrono> in this .cpp (or the header if you prefer).

Copilot · 2026-03-30T14:03:08Z

src/core/fem/src/discretization/4C_fem_discretization.hpp

    bool havedof_;

+    //! Flag: time each element evaluation and store the result in the element
+    bool time_eval_time_;


time_eval_time_ is a new data member but it’s not default-initialized here. If a Discretization is used before set_time_eval_time() is called, this can lead to undefined behavior (randomly timing or not timing). Please initialize it to false (e.g., in-class initializer bool time_eval_time_{false}; or in the constructor initializer list).

Suggested change

bool time_eval_time_;

bool time_eval_time_{false};

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: LaSi5002 <84587043+LaSi5002@users.noreply.github.com>

Lasse Simon (Schmidt) added 2 commits March 29, 2026 20:32

Add element specific timing with vtk output

4e74afd

Add per rank output to csv

46b4c3a

c-p-schmidt requested a review from Copilot March 30, 2026 13:56

Copilot started reviewing on behalf of c-p-schmidt March 30, 2026 13:57 View session

Copilot AI reviewed Mar 30, 2026

View reviewed changes

4C_fem_general_element.hpp aktualisieren

7742943

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: LaSi5002 <84587043+LaSi5002@users.noreply.github.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add per element/rank timing#1

Add per element/rank timing#1
LaSi5002 wants to merge 3 commits intodragos-ana:add-predictor-adaptation-transvisotrop-material-squashedfrom
LaSi5002:add-per-element-timing

LaSi5002 commented Mar 29, 2026

Uh oh!

LaSi5002 commented Mar 30, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 30, 2026

Uh oh!

Copilot AI Mar 30, 2026

Uh oh!

Copilot AI Mar 30, 2026

Uh oh!

Copilot AI Mar 30, 2026

Uh oh!

Copilot AI Mar 30, 2026

Uh oh!

Uh oh!

Copilot AI Mar 30, 2026

Uh oh!

Copilot AI Mar 30, 2026

Uh oh!

Copilot AI Mar 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	runtime_csvwriter_rank_eval_times_->register_data_vector("RankEvalTime", 1, 16);
	runtime_csvwriter_rank_eval_times_->register_data_vector(
	"RankEvalTime", Core::Communication::num_mpi_ranks(discret().get_comm()), 16);

	Core::IO::RuntimeCsvWriter* runtime_csvwriter_rank_eval_times_;
	std::unique_ptr<Core::IO::RuntimeCsvWriter> runtime_csvwriter_rank_eval_times_;

	discret().reset_element_eval_timers();
	if (discret().time_eval_time())
	discret().reset_element_eval_timers();

Conversation

LaSi5002 commented Mar 29, 2026

Uh oh!

LaSi5002 commented Mar 30, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants