Environment reference

date: 2026-01-13
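A very good RAG project for beginners; even as a complete novice I could follow all of it. Here is my environment setup (put together with reference to gpt5.2) so others can get it running quickly. Note that the model settings in the project's config also need to be changed: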
```
# requirements.txt
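# Python version: 3.10 / 3.11 recommended (on 3.12, make sure torch / torchvision / modelscope / faiss and the other dependencies have wheels available for your platform)
#
# Notes:
# - On Windows, installing `faiss` with conda is recommended (pip often has no usable wheel), e.g.: conda install -c conda-forge faiss-cpu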


# Basic utilities
numpy>=1.23,<2
tqdm>=4.66
loguru>=0.7

# Retrieval / tokenization / classical features
jieba>=0.42.1
nltk>=3.8
scikit-learn>=1.3,<2
scipy<2

# Vector search (faiss)
#faiss-cpu>=1.7.4; platform_system != "Windows" # Compatibility is a concern on Windows; conda install is recommended there

# Embedding / Rerank / LLM (local models)
# torch>=2.0 # pick the build for your platform on the official PyTorch site
torchvision>=0.15
transformers==4.48.3
huggingface-hub>=0.25,<0.26
accelerate>=0.26
sentence-transformers>=2.6

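# Sentence splitting (ModelScope)
#modelscope>=1.15 # Note: the project does not use ModelScope by default, so it can be skipped. When I tried installing it, dependencies kept going missing, so I simply modified ./tinyrag/sentence_splitter.py so that modelscope is imported only when use_model=True: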
# def __init__(self, 
#                  use_model: bool = False, 
#                  sentence_size = 256,
#                  model_path: str = "damo/nlp_bert_document-segmentation_chinese-base", 
#                  device="cpu"
#         ):
#         self.sentence_size = sentence_size
#         self.use_model = use_model
#         if self.use_model:
#             try:
#                 from modelscope.pipelines import pipeline
#             except ModuleNotFoundError as e:
#                 raise ModuleNotFoundError(
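#                     "The sentence-splitting model is enabled (use_model=True), but modelscope or its dependencies are missing from this environment. "
#                     "Please install modelscope and its dependencies."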
#                 ) from e
#             # assert model_path == "" "model path is empty"
#             self.sent_split_pp = pipeline(
#                 task="document-segmentation",
#                 model=model_path,
#                 device=device
#             )

# Document parsing
PyMuPDF>=1.23
python-docx>=1.1
python-pptx>=0.6.23
markdown>=3.6
beautifulsoup4>=4.12
Pillow>=10.0

# Online APIs (optional)
openai>=1.0
zhipuai>=2.0
```
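
After installing, it can help to check which of the packages listed above actually made it into the environment. The following is a minimal sketch, not part of the project: the helper names (`installed_version`, `check_environment`, `python_version_recommended`) are my own, and it assumes the distribution names match the ones in the requirements list. It only reports what is installed and does not verify the version ranges.

```python
import sys
from importlib import metadata

# Distribution names copied from the requirements list above (uncommented entries only).
REQUIRED = [
    "numpy", "tqdm", "loguru",
    "jieba", "nltk", "scikit-learn", "scipy",
    "torchvision", "transformers", "huggingface-hub",
    "accelerate", "sentence-transformers",
    "PyMuPDF", "python-docx", "python-pptx",
    "markdown", "beautifulsoup4", "Pillow",
    "openai", "zhipuai",
]


def installed_version(dist_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_environment(dist_names):
    """Map each distribution name to its installed version (None if missing)."""
    return {name: installed_version(name) for name in dist_names}


def python_version_recommended(version_info=sys.version_info):
    """True for Python 3.10 or 3.11, the versions recommended above."""
    return (3, 10) <= tuple(version_info[:2]) <= (3, 11)


if __name__ == "__main__":
    if not python_version_recommended():
        print("Note: Python 3.10 / 3.11 is recommended; found", sys.version.split()[0])
    for name, version in check_environment(REQUIRED).items():
        print(f"{name:25s} {version if version else 'MISSING'}")
```

torch and faiss are left out of the list on purpose, since the requirements above install them separately (official PyTorch site, conda); add them to `REQUIRED` if you want them reported too.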

