Popular repositories
-
fastllm-glm47-dev (Public, forked from ztxz16/fastllm)
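fastllm is a high-performance LLM inference library with no backend dependencies. It supports both tensor-parallel inference for dense models and hybrid-mode inference for MoE models, and any GPU with 10 GB or more of memory can run the full-size DeepSeek model. On a dual-socket 9004/9005 server with a single GPU, the original full-size, full-precision DeepSeek model reaches 20 tps at single concurrency; the INT4-quantized model reaches 30 tps at single concurrency and 60+ tps with multiple concurrent requests.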
C++
-
ktransformers (Public, forked from kvcache-ai/ktransformers)
A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations
Python
-
reap-step3p5 (Public, forked from CerebrasResearch/reap)
REAP: Router-weighted Expert Activation Pruning for SMoE compression
Python