
How to perform image-text understanding inference directly with OpenUni? #9

@showstarpro

First of all, I would like to sincerely thank the OpenUni team for fully open-sourcing such a unified multimodal model for both understanding and generation. This is truly valuable for the community.

I have tested the internvl3_2b_sana_1_6b_512_hf_blip3o60k checkpoint and achieved generation results that are close to those reported in the paper.

However, I am not sure how to perform image-text understanding inference directly with OpenUni, for example on benchmarks such as MMBench, MMMU, and MMStar.