xzf-thu


gpt-omni/mini-omni Public

An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

Python 3.5k 307
gpt-omni/mini-omni2 Public

Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.

Python 1.9k 204
Audio-Reasoner Public

The first Large Audio Language Model to enable native in-depth thinking, trained on large-scale audio Chain-of-Thought data.

Python 285 24
Mini-Omni-Reasoner Public

Mini-Omni-Reasoner: a real-time speech reasoning framework that interleaves silent reasoning tokens with spoken response tokens (“thinking-in-speaking”), exploiting the LLM–audio throughput gap to …

163 19