GitHub - HaoBytes/MIRA: Medical Time Series Foundation Model

⚠️ This repository is no longer maintained.
Please visit the latest official implementation here:
👉 https://github.com/microsoft/MIRA

(NeurIPS '25) MIRA: Medical Time Series Foundation Model for Real-World Health Data

[Paper Page]

Overview

MIRA is a foundation model for medical time-series, designed to learn a unified representation space across heterogeneous clinical datasets and support zero-shot forecasting in real-world healthcare settings. Unlike conventional time-series models that operate on fixed sampling rates or task-specific feature spaces, MIRA is built to handle irregular and clinically diverse signals natively. By combining continuous-time encoding, frequency-aware specialization, and neural dynamics modeling, MIRA generalizes robustly across conditions.

MIRA is pretrained on 454B time points collected from large-scale clinical corpora spanning both ICU physiological signals and hospital EHR time-series, covering a rich range of sampling frequencies (minute-level vitals, hourly labs, waveform segments, and multi-day clinical indicators). This large and heterogeneous training distribution allows MIRA to serve as a unified backbone capable of strong out-of-distribution generalization. In extensive evaluations, MIRA achieves state-of-the-art zero-shot forecasting performance across diverse clinical benchmarks. Compared with existing foundation models, MIRA obtains SOTA results on 4 of 5 out-of-distribution evaluation settings on standard baselines—demonstrating strong robustness under dataset shift, irregular sampling, and multimodal temporal variations.

Key features

Continuous-Time Rotary Positional Encoding (CT-RoPE) Provides a principled way to embed irregular timestamps while preserving temporal geometry, enabling robust reasoning across arbitrary sampling patterns.
Frequency-specialized Mixture-of-Experts Allows different experts to specialize on physiological signs, improving transfer across diverse clinical signals.
Neural ODE Extrapolation Models latent dynamics continuously over time, enabling forecasting at arbitrary future timestamps.

Installation

Install Python 3.10+, and then install the dependencies:

pip install -r requirements.txt
pip install torchdiffeq

Note: MIRA requires torchdiffeq for ODE.

Data Preparation

Data format example

Each line represents one sample and must contain at least sequence and time fields:

{"sequence": [1.0, 1.2, 0.8, ...], "time": [0.12, 0.22, 0.41, ...], "mask": [1,1,1,...]}
{"sequence": [5.1, 5.0, 5.3, ...], "time": [1, 2.1, 3.1, ...], "mask": [1,1,1,...]}

Training

MIRA can be trained on either CPU or GPU environments. The training script automatically handles model initialization, dataset loading, and checkpointing. Below we provide example commands for common setups. For training on irregular medical data:

python torch_dist_run.py main.py \
  --from_scratch \
  -d ./yourdata.jsonl \
  --output_path ./saveyoucheckpoints \
  --save_steps 10000 \
  --save_strategy steps \
  --save_total_limit 10 \
  --save_only_model \
  --precision bf16 \
  --time_aware_dataset \
  --time_aware_rotary

CPU

If you prefer to train on CPU, simply point the script to your dataset directory:

python main.py -d <data_path>

GPU

The project includes a lightweight launcher that wraps PyTorch distributed training. On a machine with one or multiple GPUs, launch training via:

python torch_dist_run.py main.py -d <data_path>

For multi-node setups, standard PyTorch elastic variables must be configured.

export MASTER_ADDR=<master_addr>
export MASTER_PORT=<master_port>
export WORLD_SIZE=<world_size>
export RANK=<rank>
python torch_dist_run.py main.py -d <data_path>

To training from scratch, please include the --from_scratch argument in your command.

python torch_dist_run.py main.py -d <data_path> --from_scratch

For full argument list:

python main.py --help

Inference

Below is an exmaple how to doing inference.

import torch
from MIRA.mira.models.modeling_mira import MIRAForPrediction
from MIRA.mira.models.utils_time_normalization import normalize_time_for_ctrope

seq  = torch.tensor([[...]], dtype=torch.float32)      
time = torch.tensor([[...]], dtype=torch.float32)     

C = 12   # history length
P = 6    # forecast horizon
T = seq.shape[1]

attn = torch.ones_like(time)

full_scaled_times, t_min, t_max = normalize_time_for_ctrope(
    time_values=time,
    attention_mask=attn,
    seq_length=T,
    alpha=1.0,
)

hist_times   = full_scaled_times[:, :C]
future_times = full_scaled_times[:, C:C+P]

mean = seq.mean(dim=1, keepdim=True)
std  = seq.std(dim=1, keepdim=True) + 1e-6

seq_norm  = (seq - mean) / std
hist_vals = seq_norm[:, :C]

ckpt_path = "/checkpoint"
model = MIRAForPrediction.from_pretrained(ckpt_path).cuda()
model.eval()

device = next(model.parameters()).device
hist_vals    = hist_vals.to(device)
hist_times   = hist_times.to(device)
future_times = future_times.to(device)

cur_vals  = hist_vals.clone()
cur_times = hist_times.clone()

preds_norm = []

for i in range(P):

    # model input
    inp_vals  = cur_vals.unsqueeze(-1)   # [1, L, 1]
    inp_times = cur_times                # [1, L]

    with torch.no_grad():
        out = model(
            input_ids=inp_vals,
            time_values=inp_times,
            next_target_time_values=None,  # no ODE for 1-step
            return_dict=True,
        )

    next_norm = out.logits[:, -1, :]     # [1, 1]
    preds_norm.append(next_norm.squeeze(0))

    next_t = future_times[:, i:i+1]

    cur_vals  = torch.cat([cur_vals, next_norm], dim=1)
    cur_times = torch.cat([cur_times, next_t], dim=1)


preds_norm = torch.stack(preds_norm, dim=1)   # [1, P]

preds = preds_norm * std[:, :, :] + mean[:, :, :]
preds = preds.squeeze(0)
print(preds)

You can also refer to

python model_eval.py

Datasets

Note: All datasets used in this project are clinical or physiological time-series datasets. Because these datasets contain sensitive human subject information, they are governed by strict data-use agreements (DUA) and protected access policies. Therefore, the raw datasets cannot be redistributed in this repository. You must apply for access through the official data providers listed below.

MIMIC — Access link: https://physionet.org/content/mimiciv/
WAVES Pediatric Waveform Database — Access link: https://redivis.com/WAVES/datasets
PTB-XL — Access link: https://physionet.org/content/ptb-xl/1.0.3/
Sleep-EDF — Access link: https://physionet.org/content/sleep-edfx/1.0.0/

Performance

Out-of-Distribution Generalization

Generalization is essential for real-world medical AI systems. Unlike domain-specific time-series models that require retraining, MIRA provides zero-shot forecasting capabilities on new hospitals, new patient cohorts, and new physiological variables—without any fine-tuning.

This makes MIRA particularly suitable for:

health systems with limited labeled data
rapid deployment on unseen clinical tasks

To evaluate its robustness, we test MIRA on unseen downstream clinical datasets that do not overlap with the 454B time points used during pre-training (covering ICU physiological waveforms and hospital EHR time-series). The figure reports the average RMSE across a diverse collection of medical forecasting tasks, comparing MIRA against recent foundation models. MIRA achieves the best overall OOD performance, outperforming all baselines on out-of-distribution settings.

Frequency-Specialized Mixture-of-Experts (MoE)

To understand how MIRA allocates computation across different temporal resolutions, we visualize the expert gating patterns on two datasets with distinct time frequencies:

CDC-IHA — weekly epidemiological signals (low-frequency)
MIT-BIH Arrhythmia — ~250Hz ECG waveforms (high-frequency)

CDC-IHA (Weekly)

MIT-BIH (High-Frequency)

The MoE module in MIRA exhibits clear selective activation, showing that different experts specialize in different temporal regimes. The gating heatmap shows that CDC-IHA predominantly activates a small, consistent subset of experts across layers. This reflects low-frequency specialization, where long-horizon patterns dominate and only a few experts handle the smoother temporal dynamics. In contrast, MIT-BIH exhibits much more distributed expert routing, with activations spread across many experts and layers. This corresponds to modeling fine-grained, high-resolution physiological waveforms.

Citation

Please let us know if you find out a mistake or have any suggestions!

If you find the MIRA models helpful in your research, please consider to star this repository and cite the corresponding paper:

@article{li2025mira,
  title={MIRA: Medical Time Series Foundation Model for Real-World Health Data},
  author={Li, Hao and Deng, Bowen and Xu, Chang and Feng, Zhiyuan and Schlegel, Viktor and Huang, Yu-Hao and Sun, Yizheng and Sun, Jingyuan and Yang, Kailai and Yu, Yiyao and others},
  journal={arXiv preprint arXiv:2506.07584},
  year={2025}
}

Acknowledgement

We appreciate the following GitHub repos a lot for their valuable code and efforts.

Time-MoE [repo]
Time-LLM [repo]
TimeMixer [repo]
Time-Series-Library [repo]
Large (Language) Models and Foundation Models (LLM, LM, FM) for Time Series and Spatio-Temporal Data [repo]

License

This project is licensed under the MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 167 Commits
images		images
mira		mira
NOTICE.md		NOTICE.md
README.md		README.md
main.py		main.py
model_eval.py		model_eval.py
requirements.txt		requirements.txt
torch_dist_run.py		torch_dist_run.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

(NeurIPS '25) MIRA: Medical Time Series Foundation Model for Real-World Health Data

Overview

Installation

Data Preparation

Data format example

Training

CPU

GPU

Inference

Datasets

Performance

Out-of-Distribution Generalization

Frequency-Specialized Mixture-of-Experts (MoE)

Citation

Acknowledgement

License

About

Uh oh!

Releases

Packages

Contributors 2

Uh oh!

Languages

HaoBytes/MIRA

Folders and files

Latest commit

History

Repository files navigation

(NeurIPS '25) MIRA: Medical Time Series Foundation Model for Real-World Health Data

Overview

Installation

Data Preparation

Data format example

Training

CPU

GPU

Inference

Datasets

Performance

Out-of-Distribution Generalization

Frequency-Specialized Mixture-of-Experts (MoE)

Citation

Acknowledgement

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages