[BUG] Possible memory leak in mtt eval #1014

@ppegolo

Summary

When evaluating a PET-OMAT-S-like model on a large-ish dataset (MATPES), I run out of memory (OOM) on an H100 after around 100k structures have been evaluated, irrespective of the batch size (even when it is smaller than the one used during training).

Expected behavior

With an appropriate batch size, the evaluation should run to completion.

Actual behavior

The evaluation fails with an OOM error after around 100k structures, whatever the batch size.

Version

2026.1

Steps to reproduce

No response

Further information, files, and links

The eval.yaml file used:

systems:
  read_from: ../../dataset/pbe/training.xyz
  length_unit: angstrom
targets:
  energy:
    key: energy
    unit: eV
    forces:
      key: forces
    stress:
      key: stress
  non_conservative_forces:
    key: forces
    quantity: force
    unit: eV/A
    per_atom: true
    type:
      cartesian:
        rank: 1
  non_conservative_stress:
    key: stress
    quantity: pressure
    unit: eV/A^3
    type:
      cartesian:
        rank: 2

Run with mtt eval model.pt eval.yaml -b 16 -o matpes_training_eval.xyz
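
A failure point that depends on the number of structures evaluated rather than on the batch size suggests memory accumulating across batches. One way this could be checked, assuming per-batch memory readings are available (for example from torch.cuda.memory_allocated() on the GPU, which is not part of the original report), is to fit a linear trend to the readings. The function name and the sample numbers below are hypothetical and only illustrate the idea; they are not measurements from this run.

```python
def memory_growth_per_batch(readings):
    """Return the least-squares slope of memory readings against batch index.

    `readings` is a sequence of memory values (e.g. bytes), one per batch.
    A slope near zero means usage has plateaued; a steadily positive slope
    means memory keeps accumulating as more batches are evaluated.
    """
    n = len(readings)
    if n < 2:
        raise ValueError("need at least two readings")
    mean_x = (n - 1) / 2
    mean_y = sum(readings) / n
    cov = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(readings))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


# Hypothetical readings (arbitrary units), one value per evaluated batch.
flat = [100, 102, 99, 101, 100, 101]     # noisy but stable usage
leaky = [100, 110, 120, 130, 140, 150]   # grows by 10 units every batch

flat_slope = memory_growth_per_batch(flat)
leaky_slope = memory_growth_per_batch(leaky)
```

If the slope stays clearly positive over many batches at a fixed batch size, the growth comes from state retained between batches rather than from the size of any single batch.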
