Runtime error: Novelty check requires embeddings, but embedding is unavailable #27

@zaku1521

The error output is as follows:
✅ Final Story draft generated from the Reflection suggestions

🔄 Iteration: 2/3

================================================================================
🔍 Phase 3: Multi-Agent Critic (multi-agent review - Anchored)

📝 Reviewer A (Methodology) reviewing...
Score: 7.1/10
Feedback: [actual output omitted].

📝 Reviewer B (Novelty) reviewing...
Score: 7.1/10
Feedback: [actual output omitted].

📝 Reviewer C (Storyteller) reviewing...
Score: 6.9/10
Feedback: [actual output omitted].

📊 Diagnostics:
Score distribution: [7.099999999999892, 7.109999999999892, 6.909999999999896]
Lowest-scoring reviewer: Reviewer C (Storyteller), score: 6.909999999999896

📊 Review result: average score 7.04/10 - ✅ PASS

🏆 Updated global best version: score 7.04 (iteration 2)

✅ Review passed, entering the duplicate-check verification phase

❌ Error: Novelty check requires embeddings, but embedding is unavailable
Traceback (most recent call last):
  File "F:\software\ChatGPT\Idea2Paper-main\Paper-KG-Pipeline\scripts\idea2story_pipeline.py", line 304, in main
    result = pipeline.run()
  File "F:\software\ChatGPT\Idea2Paper-main\Paper-KG-Pipeline\src\idea2paper\application\pipeline\manager.py", line 512, in run
    raise RuntimeError("Novelty check requires embeddings, but embedding is unavailable")
RuntimeError: Novelty check requires embeddings, but embedding is unavailable


Environment setup: I have already placed the two folders from paper-embedding on Hugging Face into paper-KG-Pipeline/output.


The contents of the .env.example file are as follows:

LLM_API_URL=https://api.siliconflow.cn/v1/chat/completions
LLM_MODEL=Pro/zai-org/GLM-4.7

-----------------------------

Embedding (optional overrides)

-----------------------------

If not set, Embedding uses:

- EMBEDDING_API_URL=https://api.siliconflow.cn/v1/embeddings

- EMBEDDING_MODEL=Qwen/Qwen3-Embedding-8B

- EMBEDDING_API_KEY falls back to SILICONFLOW_API_KEY

Tip: For frequent switching, set I2P_INDEX_DIR_MODE=auto_profile to auto-select

per-embedding index dirs (no manual profile scripts needed). You can still override

I2P_NOVELTY_INDEX_DIR / I2P_RECALL_INDEX_DIR if you prefer.

EMBEDDING_API_URL=https://api.siliconflow.cn/v1/embeddings
EMBEDDING_MODEL=Qwen/Qwen3-Embedding-8B
EMBEDDING_API_KEY=your_embedding_key_here

Optional: auto profile index directories

I2P_INDEX_DIR_MODE=auto_profile
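
The fallback rules in the comments above can be sketched as follows. This is only an illustration of the documented defaults, not the project's actual loader: the function name `resolve_embedding_config`, the `usable` flag, and the idea of treating the `your_embedding_key_here` placeholder as "no key" are all assumptions made for the example.

```python
import os

# Placeholder value shown in .env.example; treating it as "no key" is an assumption.
PLACEHOLDER_KEY = "your_embedding_key_here"

DEFAULT_URL = "https://api.siliconflow.cn/v1/embeddings"
DEFAULT_MODEL = "Qwen/Qwen3-Embedding-8B"


def resolve_embedding_config(env=None):
    """Resolve embedding settings with the fallbacks described in .env.example.

    Hypothetical helper: it only mirrors the documented defaults.
    """
    env = os.environ if env is None else env
    url = env.get("EMBEDDING_API_URL") or DEFAULT_URL
    model = env.get("EMBEDDING_MODEL") or DEFAULT_MODEL
    # EMBEDDING_API_KEY falls back to SILICONFLOW_API_KEY when unset.
    key = env.get("EMBEDDING_API_KEY") or env.get("SILICONFLOW_API_KEY") or ""
    usable = bool(key) and key != PLACEHOLDER_KEY
    return {"url": url, "model": model, "api_key": key, "usable": usable}


if __name__ == "__main__":
    cfg = resolve_embedding_config()
    print("embedding usable:", cfg["usable"], "| model:", cfg["model"])
```

Running this in the same shell that launches the pipeline gives a quick indication of whether a real key is visible to the process. Values that exist only in `.env.example` and not in the environment would not show up here.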

-----------------------------

Run logging (repo root log/)

-----------------------------

1 = enable structured run logs under log/run_.../

0 = disable run logs (pipeline still runs)

I2P_ENABLE_LOGGING=1

Optional: override log output directory (absolute path recommended)

I2P_LOG_DIR=/abs/path/to/log

Optional: max chars saved for prompt/response per call (avoid huge JSONL)

I2P_LOG_MAX_TEXT_CHARS=20000

-----------------------------

Results bundling (repo root results/)

-----------------------------

1 = enable bundling final artifacts under results/run_.../

0 = disable bundling (pipeline still runs)

I2P_RESULTS_ENABLE=1

Bundling mode: link (preferred) or copy

- link: create symlink if possible, fallback to copy

- copy: always duplicate files

I2P_RESULTS_MODE=link
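
The "link, fallback to copy" behaviour described above might look roughly like this. It is a minimal sketch under assumptions: `bundle_file` is a hypothetical name, and the project's real bundling code may differ. The fallback matters on Windows (as in the `F:\` paths in the traceback), where creating symlinks can fail without the required privilege.

```python
import os
import shutil


def bundle_file(src, dst, mode="link"):
    """Place src at dst and return "link" or "copy" depending on what was done.

    Hypothetical helper illustrating I2P_RESULTS_MODE=link|copy.
    """
    if mode not in ("link", "copy"):
        raise ValueError(f"unknown mode: {mode!r}")
    os.makedirs(os.path.dirname(os.path.abspath(dst)), exist_ok=True)
    if os.path.lexists(dst):
        os.remove(dst)  # replace a stale file or link from an earlier run
    if mode == "link":
        try:
            os.symlink(os.path.abspath(src), dst)
            return "link"
        except (OSError, NotImplementedError):
            pass  # e.g. missing symlink privilege: fall through to copy
    shutil.copy2(src, dst)
    return "copy"
```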

-----------------------------

Critic strictness (quality)

-----------------------------

1 = strict JSON mode (quality-first): critic JSON invalid -> retry -> still invalid => fail the run

0 = allow non-strict behavior (useful for offline smoke tests when no API key)

I2P_CRITIC_STRICT_JSON=1

How many retries after the first failure (default 2)

I2P_CRITIC_JSON_RETRIES=2
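
The strict-JSON rule above (invalid JSON, retry, still invalid, fail the run) can be illustrated with a small sketch. The names `critic_json` and `call_critic` are hypothetical, and returning `None` in non-strict mode is an assumption; the comments only say that non-strict behaviour is allowed.

```python
import json


def critic_json(call_critic, strict=True, retries=2):
    """Call the critic up to 1 + retries times until its output parses as JSON.

    call_critic: zero-argument callable returning the raw critic response (str).
    """
    last_error = None
    for _ in range(1 + retries):
        raw = call_critic()
        try:
            return json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = exc  # remember the failure and try again
    if strict:
        raise RuntimeError(
            f"critic JSON still invalid after {retries} retries"
        ) from last_error
    return None  # non-strict: let the caller decide how to degrade
```

With `I2P_CRITIC_JSON_RETRIES=2` this makes at most three calls per reviewer before failing in strict mode.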

-----------------------------

Pass rule (pattern-aware)

-----------------------------

Default is the objective "Scheme B":

- at least 2 of 3 role scores >= pattern q75

- and avg_score >= pattern q50

If pattern has too few papers (see I2P_PASS_MIN_PATTERN_PAPERS), fallback is controlled by I2P_PASS_FALLBACK.

I2P_PASS_MODE=two_of_three_q75_and_avg_ge_q50

I2P_PASS_MIN_PATTERN_PAPERS=20

I2P_PASS_FALLBACK=global # global|fixed

I2P_PASS_SCORE=7.0 # only used when fallback=fixed or distribution unavailable
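
As a worked illustration of "Scheme B" as described above, the check could be written like this. The function name `passes_scheme_b` is hypothetical, and the quantile values in the usage lines are made up for the example; the real `q50`/`q75` come from the pattern's score distribution, which is not shown in the log.

```python
def passes_scheme_b(role_scores, q50, q75):
    """Pass if at least 2 of 3 role scores >= q75 and the average >= q50."""
    if len(role_scores) != 3:
        raise ValueError("expected exactly three role scores")
    high = sum(1 for s in role_scores if s >= q75)
    avg = sum(role_scores) / len(role_scores)
    return high >= 2 and avg >= q50


# Scores rounded from the log above; q50 and q75 are illustrative values only.
scores = [7.10, 7.11, 6.91]
print(passes_scheme_b(scores, q50=7.0, q75=7.05))
```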

-----------------------------

Advanced: anchors & scoring

-----------------------------

Quantiles for the 5 fixed anchors (comma-separated floats)

I2P_ANCHOR_QUANTILES=0.1,0.25,0.5,0.75,0.9

I2P_ANCHOR_MAX_INITIAL=7

I2P_ANCHOR_MAX_TOTAL=9

I2P_ANCHOR_MAX_EXEMPLARS=2

I2P_DENSIFY_OFFSETS=-0.5,0.5,-0.25,0.25

I2P_SIGMOID_K=1.2

I2P_GRID_STEP=0.01

I2P_DENSIFY_LOSS_THRESHOLD=0.03

I2P_DENSIFY_MIN_AVG_CONF=0.45

I2P_ANCHOR_DENSIFY_ENABLE=0 # disable adaptive densify to reduce latency

-----------------------------

Local novelty check (Scheme A)

-----------------------------

Enable local novelty check against nodes_paper.json

I2P_NOVELTY_ENABLE=1

Do NOT auto-build novelty index during run (quality-first + predictable)

I2P_NOVELTY_AUTO_BUILD_INDEX=1

Offline build batch size

I2P_NOVELTY_INDEX_BUILD_BATCH_SIZE=32

Action on high similarity: report_only | pivot | fail

I2P_NOVELTY_ACTION=pivot

Max pivot attempts when similarity is high

I2P_NOVELTY_MAX_PIVOTS=2

-----------------------------

Index auto-prepare (one-command run)

-----------------------------

1 = auto-preflight and build missing indexes; 0 = skip preflight

I2P_INDEX_AUTO_PREPARE=1

1 = allow auto-build when missing; 0 = fail and ask for manual build

I2P_INDEX_ALLOW_BUILD=1

-----------------------------

Final collision threshold (Phase 4)

-----------------------------

1 = enable final verification (Phase 4), 0 = skip

I2P_VERIFICATION_ENABLE=1

Recommendation: set between novelty.medium_th and novelty.high_th (e.g. 0.82~0.88)

I2P_COLLISION_THRESHOLD=0.88
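
To make the role of the threshold concrete, here is a minimal sketch of a collision check against stored paper embeddings. It assumes cosine similarity and a simple "maximum similarity >= threshold" rule; the source does not say which similarity measure or aggregation the pipeline actually uses, and `cosine` and `collides` are hypothetical names.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def collides(story_vec, paper_vecs, threshold=0.88):
    """Return (is_collision, best_similarity) against all paper vectors."""
    best = max((cosine(story_vec, p) for p in paper_vecs), default=0.0)
    return best >= threshold, best
```

A check of this kind needs an embedding of the story itself, which is consistent with the run failing once embeddings are unavailable.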

-----------------------------

Recall audit (persist recall candidates)

-----------------------------

1 = enable recall audit, 0 = disable

I2P_RECALL_AUDIT_ENABLE=1

Top-N pattern scores per path to persist

I2P_RECALL_AUDIT_TOPN=50

Recall embedding batch params

I2P_RECALL_EMBED_BATCH_SIZE=32
I2P_RECALL_EMBED_MAX_RETRIES=3
I2P_RECALL_EMBED_SLEEP_SEC=0.5

Recall offline index (optional)

I2P_RECALL_USE_OFFLINE_INDEX=1
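
The three recall embedding batch parameters above suggest a batch-and-retry loop along these lines. This is a sketch, not the project's code: `embed_in_batches` and `embed_batch` are hypothetical names, and interpreting `max_retries` as "retries after the first attempt" is an assumption.

```python
import time


def embed_in_batches(texts, embed_batch, batch_size=32, max_retries=3, sleep_sec=0.5):
    """Embed texts in batches, retrying each failed batch before giving up.

    embed_batch: callable taking a list of strings and returning one vector per string.
    """
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        for attempt in range(max_retries + 1):
            try:
                vectors.extend(embed_batch(batch))
                break
            except Exception:
                if attempt == max_retries:
                    raise  # out of retries: surface the original error
                time.sleep(sleep_sec)
    return vectors
```

Letting the last exception propagate keeps the underlying cause (for example an authentication or network error) visible instead of hiding it behind a generic "embedding is unavailable" message.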


The i2p_config.json file has not been modified in any way.