Skip to content

Add per element/rank timing#1

Open
LaSi5002 wants to merge 3 commits intodragos-ana:add-predictor-adaptation-transvisotrop-material-squashedfrom
LaSi5002:add-per-element-timing
Open

Add per element/rank timing#1
LaSi5002 wants to merge 3 commits intodragos-ana:add-predictor-adaptation-transvisotrop-material-squashedfrom
LaSi5002:add-per-element-timing

Conversation

@LaSi5002
Copy link
Copy Markdown

Per element and rank evaluation time can be added with:

IO:
  PER_RANK_EVAL_TIME: false
IO/RUNTIME VTK OUTPUT/STRUCTURE:
  ELEMENT_EVAL_TIME: true

Their locations (IO or IO/RUN...) will be adjusted depending on what problem types they will work in the end.

Screencast.from.2026-03-29.21-33-07.webm
image (Per Element evaluation times for ReformulatedJohnsonCook for stress test probe)

Per rank evaluation times are written to a csv file if enabled.

There are still things that are unclear to me. Mainly the following:

An important thing to consider is, that with the current setup, only structure simulations can export per element and per rank evaluation times.
The reason is that I did not find (or there does not exist) a main loop or function that gets excecuted on every timestep regardless of the problem type.
Which is unintuitive to me, because it seems to me like there are some things that should get excecuted on every timestep for every problem, like writing element owners or material ids.

Note on Element::eval_time_:
its not an optional to decrease overhead (which is negligable anyways)

Note on bool Discretization::time_eval_time_:
the name means whether or not to time the evaluation time. If you think that could be misunderstood I can change it.

@LaSi5002
Copy link
Copy Markdown
Author

fyi: @c-p-schmidt , @dragos-ana

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds optional runtime timing outputs to help profile FE element evaluation costs, exposing per-element evaluation wall time in VTK output and per-rank (MPI) evaluation time in a CSV file (currently wired for structure simulations).

Changes:

  • Add per-element evaluation wall-time accumulation on Core::Elements::Element and expose it through the VTK runtime structure writer.
  • Add per-rank evaluation-time CSV output (gathered across ranks) behind a new IO/PER_RANK_EVAL_TIME parameter.
  • Add new IO parameters and wiring to enable/disable timing at discretization creation time.

Reviewed changes

Copilot reviewed 16 out of 17 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
tests/input_files/mat_transviso_viscoplast_refJC_log_timint.4C.yaml Minor YAML formatting tweak.
src/structure_new/src/output/4C_structure_new_discretization_runtime_output_params.hpp Add getter/flag for element-eval-time output.
src/structure_new/src/output/4C_structure_new_discretization_runtime_output_params.cpp Read ELEMENT_EVAL_TIME from input params.
src/structure_new/src/model_evaluator/4C_structure_new_model_evaluator_structure.hpp Add rank-eval-time CSV writer member + output method declaration.
src/structure_new/src/model_evaluator/4C_structure_new_model_evaluator_structure.cpp Create CSV writer, append element eval times to VTK, write rank CSV, reset timers.
src/mat/vplast/4C_mat_vplast_anand.cpp Remove stray blank lines.
src/inpar/4C_inpar_io.cpp Add IO/PER_RANK_EVAL_TIME parameter.
src/inpar/4C_inpar_IO_runtime_vtk_output_structure.cpp Add RUNTIME VTK OUTPUT/STRUCTURE/ELEMENT_EVAL_TIME parameter.
src/global_data/4C_global_data_read.cpp Enable discretization timing based on ELEMENT_EVAL_TIME and guard by problem type.
src/core/io/src/4C_io_discretization_visualization_writer_mesh.hpp Add append_element_eval_time() API.
src/core/io/src/4C_io_discretization_visualization_writer_mesh.cpp Implement writing per-element eval time as VTK cell data.
src/core/fem/src/general/element/4C_fem_general_element.hpp Add eval_time_ storage + accessors + evaluate_with_timing() declarations.
src/core/fem/src/general/element/4C_fem_general_element.cpp Implement evaluate_with_timing() and initialize/copy eval_time_.
src/core/fem/src/discretization/4C_fem_discretization_utils_evaluate.cpp Switch evaluation helper to evaluate_with_timing().
src/core/fem/src/discretization/4C_fem_discretization_evaluate.cpp Add timing around element_action in discretization evaluate loops.
src/core/fem/src/discretization/4C_fem_discretization.hpp Add timing flag + reset + per-rank getter declarations.
src/core/fem/src/discretization/4C_fem_discretization.cpp Implement reset and per-rank gather of eval times.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +683 to +697
std::vector<double> Core::FE::Discretization::get_rank_eval_times() const
{
double my_eval_time_sum = 0.0;
for (auto* ele : elerowptr_)
{
FOUR_C_ASSERT(ele != nullptr,
"Encountered null owned row element while getting evaluation timers in discretization {}!",
name_);
my_eval_time_sum += ele->eval_time();
}

std::vector<double> eval_times(Core::Communication::num_mpi_ranks(get_comm()));
Core::Communication::gather_all(&my_eval_time_sum, eval_times.data(), 1, get_comm());
return eval_times;
}
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_rank_eval_times() sums only elerowptr_ (owned row elements), but element evaluation in Discretization::evaluate() iterates over my_col_element_range() (owned + ghost). If ghost elements are evaluated on a rank, the reported per-rank time will be systematically too small. Either sum over all locally evaluated elements (column range) or measure per-rank wall time directly around the evaluation loop instead of via per-element sums.

Copilot uses AI. Check for mistakes.
Comment on lines +95 to +108
// time the function wrapper separately if timing is enabled, to get per element evaluation
// times to avoid double counting evalate_with_timing should not be passed as element_action
const auto start_time = time_eval_time_ ? std::chrono::steady_clock::now()
: std::chrono::steady_clock::time_point{};

// call the element evaluate method
element_action(*actele, la, strategy.elematrix1(), strategy.elematrix2(), strategy.elevector1(),
strategy.elevector2(), strategy.elevector3());

// per element timer stop and record if timing is enabled
if (time_eval_time_)
actele->add_eval_time(
std::chrono::duration<double>(std::chrono::steady_clock::now() - start_time).count());

Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Timing is currently applied to the entire element_action callback, but this overload is also used for non-Element::evaluate() work (e.g., lambdas that assemble custom matrices). With time_eval_time_ enabled, this will pollute Element::eval_time_ with unrelated costs and distort the “evaluate wall time” output. Consider restricting timing to the actual Element::evaluate* paths (e.g., only in the default evaluate(params, strategy) implementation) or providing a dedicated API for timing element evaluation.

Copilot uses AI. Check for mistakes.
runtime_csvwriter_rank_eval_times_ =
new Core::IO::RuntimeCsvWriter(Core::Communication::my_mpi_rank(discret().get_comm()),
discret_ptr()->writer()->output(), "rank_eval_time");
runtime_csvwriter_rank_eval_times_->register_data_vector("RankEvalTime", 1, 16);
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RuntimeCsvWriter::register_data_vector() must be called with the number of components equal to the size of the vector you later write. Here it’s registered with numcomponents=1, but output_rank_eval_times() writes discret().get_rank_eval_times() which has length num_mpi_ranks. This will hit the writer’s size assertion on proc 0 for parallel runs. Please register with Core::Communication::num_mpi_ranks(discret().get_comm()) (or otherwise ensure the written vector size matches the registered component count).

Suggested change
runtime_csvwriter_rank_eval_times_->register_data_vector("RankEvalTime", 1, 16);
runtime_csvwriter_rank_eval_times_->register_data_vector(
"RankEvalTime", Core::Communication::num_mpi_ranks(discret().get_comm()), 16);

Copilot uses AI. Check for mistakes.
double* dt_ele_ptr_;

//! csv writer for per rank per timestep evaluation times
Core::IO::RuntimeCsvWriter* runtime_csvwriter_rank_eval_times_;
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

runtime_csvwriter_rank_eval_times_ is managed as a raw pointer with manual new/delete in setup(), but there’s no visible ownership cleanup on object destruction and it’s easy to leak or double-delete in future changes. Prefer std::unique_ptr<Core::IO::RuntimeCsvWriter> (or std::optional<RuntimeCsvWriter>, as used elsewhere in the repo) to make ownership explicit and exception-safe.

Suggested change
Core::IO::RuntimeCsvWriter* runtime_csvwriter_rank_eval_times_;
std::unique_ptr<Core::IO::RuntimeCsvWriter> runtime_csvwriter_rank_eval_times_;

Copilot uses AI. Check for mistakes.
eval_data().set_action_type(Core::Elements::struct_calc_predict);
eval_data().set_predictor_type(pred_type);

discret().reset_element_eval_timers();
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reset_element_eval_timers() is called unconditionally here, which adds an O(#owned-elements) loop every timestep even when element timing/output is disabled. Consider guarding this with if (discret().time_eval_time()) (or the relevant IO flag) to avoid unnecessary overhead in normal runs.

Suggested change
discret().reset_element_eval_timers();
if (discret().time_eval_time())
discret().reset_element_eval_timers();

Copilot uses AI. Check for mistakes.
Comment on lines +669 to +679
void Core::FE::Discretization::reset_element_eval_timers()
{
for (auto* ele : elerowptr_)
{
FOUR_C_ASSERT(ele != nullptr,
"Encountered null owned row element while resetting evaluation timers in discretization "
"{}!",
name_);
ele->set_eval_time(0.0);
}
}
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These timers are reset only for elerowptr_ (owned row elements), but evaluation timing is accumulated on all local column elements (my_col_element_range()), including ghosts. This leaves ghost element timers stale and also makes later rank totals easy to undercount. Consider iterating over all_local_element_ids_/my_col_element_range() for reset, or ensure timing is only ever recorded on owned row elements.

Copilot uses AI. Check for mistakes.
Comment on lines +654 to +666
int Core::Elements::Element::evaluate_with_timing(Teuchos::ParameterList& params,
Core::FE::Discretization& discretization, LocationArray& la,
Core::LinAlg::SerialDenseMatrix& elemat1, Core::LinAlg::SerialDenseMatrix& elemat2,
Core::LinAlg::SerialDenseVector& elevec1, Core::LinAlg::SerialDenseVector& elevec2,
Core::LinAlg::SerialDenseVector& elevec3)
{
const auto start_time = discretization.time_eval_time() ? std::chrono::steady_clock::now()
: std::chrono::steady_clock::time_point{};
const int err = evaluate(params, discretization, la, elemat1, elemat2, elevec1, elevec2, elevec3);
if (discretization.time_eval_time())
eval_time_ +=
std::chrono::duration<double>(std::chrono::steady_clock::now() - start_time).count();
return err;
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

evaluate_with_timing() uses std::chrono but this translation unit doesn’t include <chrono>, which will cause a compile error unless it happens to be pulled in indirectly. Please add an explicit #include <chrono> in this .cpp (or the header if you prefer).

Copilot uses AI. Check for mistakes.
bool havedof_;

//! Flag: time each element evaluation and store the result in the element
bool time_eval_time_;
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

time_eval_time_ is a new data member but it’s not default-initialized here. If a Discretization is used before set_time_eval_time() is called, this can lead to undefined behavior (randomly timing or not timing). Please initialize it to false (e.g., in-class initializer bool time_eval_time_{false}; or in the constructor initializer list).

Suggested change
bool time_eval_time_;
bool time_eval_time_{false};

Copilot uses AI. Check for mistakes.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: LaSi5002 <84587043+LaSi5002@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants