LiqiangJing

Pinned

  1. DSBench (Public)

    [ICLR 2025] DSBench: How Far are Data Science Agents from Becoming Data Science Experts?

    Jupyter Notebook · 116 stars · 10 forks

  2. bcdnlp/FAITHSCORE (Public)

    FaithScore: Fine-grained Evaluations of Hallucinations in Large Vision-Language Models

    Python · 33 stars · 7 forks

  3. benchflow-ai/skillsbench (Public)

    SkillsBench evaluates how well skills work and how effective agents are at using them

    PDDL · 1.1k stars · 263 forks

  4. TIGER-AI-Lab/Pixel-Reasoner (Public)

    [NeurIPS 2025] Pixel-Level Reasoning Model trained with RL

    Python · 293 stars · 11 forks

  5. harbor-framework/terminal-bench-3 (Public)

    🚧 Accepting Task Submissions 🚧

    Python · 134 stars · 142 forks

  6. du-nlp-lab/FIFA (Public)

    FIFA: Unified Faithfulness Evaluation Framework for Text-to-Video and Video-to-Text Generation

    Python 1