Retrieval-Augmented Generation (RAG) systems commonly suffer from knowledge conflicts, where retrieved external knowledge contradicts the inherent, parametric knowledge of large language models (LLMs). These conflicts degrade performance on downstream tasks such as question answering (QA). Existing approaches often attempt to mitigate conflicts by directly comparing the two knowledge sources side by side, but this can overwhelm LLMs with extraneous or lengthy contexts, ultimately hindering their ability to identify and resolve inconsistencies.
To address this issue, we propose Micro-Act, a framework with a hierarchical action space that automatically perceives context complexity and adaptively decomposes each knowledge source into a sequence of fine-grained comparisons. These comparisons are represented as actionable steps, enabling reasoning beyond the superficial context. Through extensive experiments on five benchmark datasets, Micro-Act consistently achieves a significant increase in QA accuracy over state-of-the-art baselines across all five datasets and three conflict types, especially the temporal and semantic types, where all baselines fail significantly. More importantly, Micro-Act simultaneously exhibits robust performance on non-conflict questions, highlighting its practical value in real-world RAG applications.
Micro-Act introduces a novel hierarchical action space framework that addresses knowledge conflicts in RAG systems through:
- Adaptive Context Decomposition: Automatically breaks down complex knowledge sources into manageable, fine-grained comparisons
- Hierarchical Action Space: Implements a structured approach with actions such as `decompose`, `knowledge_gen`, `reason`, and `terminate`
- Multi-Step Reasoning: Enables iterative reasoning through multiple action steps to resolve conflicts
- Conflict Type Handling: Specifically designed to handle temporal, semantic, and misinformation conflicts
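The features above can be sketched as a toy action loop. This is an illustrative assumption of the control flow only, not the actual implementation: all helper names and the string-matching "reasoning" are stand-ins for LLM calls.

```python
# Toy sketch of a hierarchical action loop in the spirit of Micro-Act:
# decompose the retrieved context into fine-grained claims, elicit (mocked)
# parametric knowledge, compare claim by claim, then terminate with a verdict.

def decompose(context: str) -> list[str]:
    # Split the retrieved context into fine-grained claims (toy: by sentence).
    return [s.strip() for s in context.split(".") if s.strip()]

def knowledge_gen(question: str) -> str:
    # Stand-in for eliciting the model's parametric knowledge via an LLM call.
    return "Paris is the capital of France"

def reason(claim: str, parametric: str) -> bool:
    # Toy comparison: flag a conflict when a claim about the same topic
    # disagrees with the parametric knowledge (real reasoning is an LLM step).
    return "capital" in claim and claim != parametric

def micro_act(question: str, context: str) -> dict:
    parametric = knowledge_gen(question)
    conflicts = [c for c in decompose(context) if reason(c, parametric)]
    # terminate: return the verdict once every claim has been checked
    return {"conflict": bool(conflicts), "conflicting_claims": conflicts}

result = micro_act(
    "What is the capital of France?",
    "Lyon is the capital of France. France is in Europe",
)
```

Comparing claim by claim, rather than the whole context at once, is what keeps each individual comparison small enough to resolve reliably.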
- Python 3.10+ (recommended)
- CUDA-compatible GPU (for GPU inference)
- OpenAI API access (for API-based inference)
- Conda or Miniconda (recommended for environment management)
- Clone the repository:
```bash
git clone <repository-url>
cd <repo-path>
```
- Create and activate the conda environment using the provided `requirements.yml`:
```bash
conda env create -f requirements.yml
conda activate QA_conflict
```
- Clone the repository:
```bash
git clone <repository-url>
cd <repo-path>
```
- Create a new conda environment:
```bash
conda create -n micro_act python=3.10
conda activate micro_act
```
- Install core dependencies:
```bash
pip install torch==2.4.0 transformers==4.43.2 vllm==0.6.1.post2 openai==1.47.0 tqdm==4.66.2 pandas==2.2.0 tabulate==0.9.0
```
- Install additional dependencies for full functionality:
```bash
pip install accelerate==0.27.2 bitsandbytes==0.43.1 flash-attn==2.5.8 sentencepiece==0.2.0 tokenizers==0.19.1
```
Set up your OpenAI API credentials:
```bash
export OPENAI_API_KEY="your-api-key"
export OPENAI_API_BASE="your-api-base-url"  # Optional, for Azure OpenAI
```
Ensure you have CUDA installed and compatible with PyTorch 2.4.0. The framework supports:
- CUDA 12.1+ (recommended)
- Multiple GPU inference with tensor parallelism
- vLLM for efficient large model inference
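Before launching GPU inference, a quick sanity check can confirm the core dependencies are importable. This is a stdlib-only convenience sketch, not a script from the repository:

```python
import importlib.util

def missing_packages(names):
    """Return the subset of packages that cannot be found by the importer."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Core packages used by the GPU and API inference scripts.
required = ["torch", "transformers", "vllm", "openai"]
missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
```

Using `find_spec` avoids actually importing heavy packages like `torch` just to check their presence.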
```text
<repo-path>/
├── code/                       # Core implementation
│   ├── infer_gpu_action.py     # GPU-based Micro-Act inference
│   ├── infer_gpu_baseline.py   # GPU-based baseline inference
│   ├── infer_gpu_react.py      # GPU-based ReAct inference
│   ├── infer_api_action.py     # API-based Micro-Act inference
│   ├── infer_api_baseline.py   # API-based baseline inference
│   ├── infer_api_react.py      # API-based ReAct inference
│   └── eval.py                 # Evaluation script
├── datasets/                   # Benchmark datasets
│   ├── conflictBank/           # ConflictBank dataset
│   ├── KRE_musique/            # KRE-Musique dataset
│   ├── KRE_squad/              # KRE-SQuAD dataset
│   └── data_sampler.py         # Data sampling utilities
├── prompts/                    # Prompt templates
│   └── prompts_se.py           # Structured prompts for actions
├── run/                        # Execution scripts
│   ├── run_conflictBank/       # ConflictBank experiments
│   └── run_KRE/                # KRE experiments
├── mid_results/                # Intermediate results
├── results/                    # Final evaluation results
└── requirements.yml            # Conda environment specification
```
- API-based inference (using GPT-4o):
```bash
cd run/run_conflictBank
bash run_gpt_4o_react.sh
```
- GPU-based inference (using local models):
```bash
cd run/run_conflictBank
bash run_llama_31_70B_react.sh
```
- Evaluation:
```bash
bash eval.sh
```
ConflictBank:
- Description: Multi-choice QA dataset with knowledge conflicts
- Conflict Types: Temporal, Semantic, Misinformation
- Format: JSONL with questions, options, and evidence
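As a rough illustration of the JSONL layout, each line holds one self-contained question record. The field names below are assumptions for illustration; check the actual files under `datasets/` for the exact schema:

```python
import json

# Hypothetical ConflictBank-style record; real field names may differ.
line = ('{"question": "Who wrote Hamlet?", '
        '"options": ["Shakespeare", "Marlowe"], '
        '"evidence": "Marlowe wrote Hamlet."}')
record = json.loads(line)          # one JSON object per line in a .jsonl file
answer_options = record["options"]
```

In practice a loader would iterate over the file with one `json.loads` call per line.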
KRE-Musique:
- Description: QA dataset with knowledge conflicts
- Format: JSONL with questions, choices, and contexts
KRE-SQuAD:
- Description: Reading comprehension dataset with knowledge conflicts
- Format: JSONL with questions, choices, and contexts
API-based models:
- GPT-4o
- GPT-4o-mini
- Other OpenAI-compatible models
GPU-based (local) models:
- Llama-3.1-8B
- Llama-3.1-70B
- Other HuggingFace models compatible with vLLM
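A minimal dispatcher sketch showing how a driver script might route a model name to the API or GPU inference path. The function and the model set are assumptions for illustration, not code from the repository:

```python
def select_backend(model_name: str) -> str:
    """Toy routing: OpenAI-hosted models go through the API scripts
    (infer_api_*.py); everything else is treated as a local
    vLLM/HuggingFace model served by the GPU scripts (infer_gpu_*.py)."""
    api_models = {"gpt-4o", "gpt-4o-mini"}
    return "api" if model_name.lower() in api_models else "gpu"
```

Any other OpenAI-compatible endpoint could be added to the set, mirroring the model lists above.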
If you use Micro-Act in your research, please cite:
```bibtex
@article{huo2025micro,
  title={Micro-Act: Mitigate Knowledge Conflict in Question Answering via Actionable Self-Reasoning},
  author={Huo, Nan and Li, Jinyang and Qin, Bowen and Qu, Ge and Li, Xiaolong and Li, Xiaodong and Ma, Chenhao and Cheng, Reynold},
  journal={arXiv preprint arXiv:2506.05278},
  year={2025}
}
```
For questions and issues, please open an issue on GitHub or contact huonan@connect.hku.hk.