date: 2026-01-13
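A very good RAG project for beginners; even as a complete novice I could follow all of it. Here is my environment setup (drafted with help from GPT-5.2) for getting it running quickly. Note that the model settings in the project's config also need to be changed: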
# requirements.txt
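# Python version: 3.10 / 3.11 recommended (on 3.12, make sure torch / torchvision / modelscope / faiss and the other dependencies have wheels available for your platform)
#
# Notes:
# - On Windows, installing `faiss` with conda is recommended (pip often has no usable wheel), e.g.: conda install -c conda-forge faiss-cpu
# Basic utilities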
numpy>=1.23,<2
tqdm>=4.66
loguru>=0.7
# Retrieval / tokenization / classical features
jieba>=0.42.1
nltk>=3.8
scikit-learn>=1.3,<2
scipy<2
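# Vector search (faiss)
#faiss-cpu>=1.7.4; platform_system != "Windows" # On Windows, compatibility is a concern; conda install is recommended instead
# Embedding / Rerank / LLM (local models)
# torch>=2.0 # pick the matching build on the official PyTorch site and install it from there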
torchvision>=0.15
transformers==4.48.3
huggingface-hub>=0.25,<0.26
accelerate>=0.26
sentence-transformers>=2.6
# Sentence splitting (ModelScope)
#modelscope>=1.15 # Note: the project does not use ModelScope by default, so it can be skipped. I tried installing it but kept hitting missing dependencies, so I simply changed ./tinyrag/sentence_splitter.py as follows:
#     def __init__(self,
#                  use_model: bool = False,
#                  sentence_size = 256,
#                  model_path: str = "damo/nlp_bert_document-segmentation_chinese-base",
#                  device="cpu"
#                  ):
#         self.sentence_size = sentence_size
#         self.use_model = use_model
#         if self.use_model:
#             try:
#                 from modelscope.pipelines import pipeline
#             except ModuleNotFoundError as e:
#                 raise ModuleNotFoundError(
#                     "The sentence-splitting model is enabled (use_model=True), but modelscope or one of its dependencies is missing from the current environment. "
#                     "Please install modelscope and its dependencies."
#                 ) from e
#             # assert model_path == "" "model path is empty"
#             self.sent_split_pp = pipeline(
#                 task="document-segmentation",
#                 model=model_path,
#                 device=device
#             )
# Document parsing
PyMuPDF>=1.23
python-docx>=1.1
python-pptx>=0.6.23
markdown>=3.6
beautifulsoup4>=4.12
Pillow>=10.0
# Online APIs (optional)
openai>=1.0
zhipuai>=2.0
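
After installing from the list above, a quick sanity check can show which packages are still missing before running the project. The script below is only a rough sketch, not part of the project: the helper names `is_recommended_python` and `missing_modules` are made up for illustration, and the import names (for example `fitz` for PyMuPDF, `sklearn` for scikit-learn, `bs4` for beautifulsoup4) are my assumption of each package's usual import name. It checks only whether a module can be found, not whether its version satisfies the pins.

```python
import sys
from importlib.util import find_spec

# Import names (not pip names) for the packages in the requirements list.
# Assumption: each package uses its usual import name.
REQUIRED = [
    "numpy", "tqdm", "loguru", "jieba", "nltk", "sklearn", "scipy",
    "torch", "torchvision", "transformers", "huggingface_hub",
    "accelerate", "sentence_transformers",
    "fitz", "docx", "pptx", "markdown", "bs4", "PIL",
]
# Installed separately (faiss via conda) or only needed for optional features.
OPTIONAL = ["faiss", "modelscope", "openai", "zhipuai"]


def is_recommended_python(version=None):
    """Return True for Python 3.10 / 3.11, the versions recommended above."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) in {(3, 10), (3, 11)}


def missing_modules(names):
    """Return the names that cannot be located in the current environment."""
    missing = []
    for name in names:
        try:
            found = find_spec(name) is not None
        except (ImportError, ValueError):
            # Treat a lookup failure the same as a missing module.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    if not is_recommended_python():
        print("Note: Python 3.10 / 3.11 is recommended; check wheel availability.")
    print("Missing required modules:", missing_modules(REQUIRED) or "none")
    print("Missing optional modules:", missing_modules(OPTIONAL) or "none")
```

Saving this as, say, `check_env.py` and running `python check_env.py` prints the modules that could not be found; anything under "optional" matters only if the corresponding feature (faiss retrieval, the ModelScope splitter, or an online API) is actually used.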