TrustAIRLab/Unsafe-LLM-Based-Search

Official repository for "Unsafe LLM-Based Search: Quantitative Analysis and Mitigation of Safety Risks in AI Web Search".

License: Apache 2.0 | Python 3.10+ | arXiv


Introduction

This repository provides the agent framework for the Risk Mitigation part of our paper. The XGBoost-detector and PhishLLM-detector are included for comparison. The code for the PhishLLM-detector is available at: https://github.com/code-philia/PhishLLM

Project Structure

agent_defense/
├── src/
│   ├── agent.py                     # build_agent
│   ├── prompt.py                    # prompts
│   ├── tools.py                     # tool calling (change the tools by modifying the `return_tools` function; the HtmlLLM-detector's prompt is in the `is_malicious` function)
│   ├── utils.py                     # XGBoost-detector method
│   ├── selenium_fetcher.py          # HtmlLLM-detector method for fetching HTML content (optional)
│   ├── template.csv                 # template for the basic test
│   └── XGBoostClassifier.pickle.dat # XGBoost-detector model weights
├── template.json                    # template for the basic test
├── requirement.txt                  # required packages; install with `pip install -r requirement.txt`
├── prompt_defense.py                # prompt-based defense code
└── main.py                          # runs the defense (uses the HtmlLLM-detector (ours) by default)
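
To show how these pieces fit together, here is a hypothetical wiring sketch; build_agent and return_tools are the names listed above, but every signature below is an assumption, not necessarily the repository's actual API:

    # Hypothetical wiring sketch only; the real signatures in src/agent.py
    # and src/tools.py may differ.
    from src.agent import build_agent    # assumed import path
    from src.tools import return_tools   # assumed import path

    agent = build_agent(tools=return_tools())              # assumed signature
    print(agent.invoke("Is https://example0.com safe?"))   # assumed agent API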

How to Run

Setup

  1. Install all required packages for your environment (pip install -r requirement.txt).
  2. Register an OpenAI API key (see the OpenAI tutorial) and paste the key into './src/openai.txt'. A loading sketch follows below.
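
For reference, here is a minimal sketch of reading the key from that file, assuming the official openai Python package (the repository's own loading logic may differ):

    # Minimal sketch: read the API key from ./src/openai.txt (Setup step 2).
    # Assumes the official `openai` package; the repository may load the key differently.
    from pathlib import Path
    from openai import OpenAI

    api_key = Path("./src/openai.txt").read_text().strip()
    client = OpenAI(api_key=api_key)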

For Batch Comparison (Shown in Our Paper)

  1. Prepare batch_result.csv in the format below. Use the is_malicious function to obtain the results and write them to this CSV file for batch comparison (see the sketch after these steps):

    phish_prediction is the result of the PhishLLM-detector, while malicious is the result of our method, the HtmlLLM-detector.

    url,phish_prediction,malicious
    https://example0.com,benign,False
    https://example1.com,benign,True
    
  2. Prepare input.json in the format below:

    [
        {
            "LLM": "The platform name",
            "Query": "The Query",
            "Risk": "main",
            "content": {
                "output": "The output of AIPSE",
                "resource": [
                    "https://example0.com",
                    "https://example1.com"
                ]
            }
        }
    ]
  3. Basic Test Run

    We provide all the template files. To run a basic test, first set the parameters in the main.py, tools.py, and prompt_defense.py files, then simply run:

    python main.py
    python prompt_defense.py

    You can switch detectors by changing the current_url_detector_function parameter in the return_tools function in tools.py (a sketch appears in the next section). After the basic test runs, it automatically generates a template_output.json file for verification.
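
Referring back to step 1, here is a hedged sketch of producing batch_result.csv; it assumes is_malicious can be imported from src/tools.py and returns a boolean, while phishllm_predict is a hypothetical placeholder for the PhishLLM-detector:

    # Sketch: write batch_result.csv in the format from step 1.
    # `is_malicious` is assumed importable from src/tools.py and to return a
    # boolean; `phishllm_predict` is a hypothetical placeholder for the
    # PhishLLM-detector (https://github.com/code-philia/PhishLLM).
    import csv

    from src.tools import is_malicious  # assumed import path

    def phishllm_predict(url: str) -> str:
        # Hypothetical stand-in; replace with the real PhishLLM-detector call.
        return "benign"

    urls = ["https://example0.com", "https://example1.com"]

    with open("batch_result.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["url", "phish_prediction", "malicious"])
        for url in urls:
            writer.writerow([url, phishllm_predict(url), is_malicious(url)])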

For Single Query

You can test a single query directly by changing the return_tools function in tools.py, as sketched below.
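
Here is a hypothetical sketch of that change; the name current_url_detector_function comes from the section above, and whether it is edited in place or passed as an argument depends on the actual code:

    # Hypothetical sketch: choose which URL detector the agent's tools use.
    # In the repository you edit `current_url_detector_function` inside
    # `return_tools` in src/tools.py; all names below are assumptions.
    from src.tools import is_malicious     # HtmlLLM-detector (assumed import)
    from src.utils import xgboost_detect   # assumed name for the XGBoost-detector

    # Default: the HtmlLLM-detector (ours).
    current_url_detector_function = is_malicious
    # Alternative for comparison: the XGBoost-detector.
    # current_url_detector_function = xgboost_detect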

Note:

The domains used in our case study will expire a few days after January 4, 2026. We have archived their content via the Wayback Machine. Please refer to the list below for details about the archived webpages:

These domains are no longer under our control and have been released back into the domain market. As such, we are no longer responsible for their content or any communications originating from them.

⚠️ Caution: Any messages or information sent from these domains no longer represent us.

Citation

@inproceedings{UnsafeSearch2025,
  title     = {Unsafe LLM-Based Search: Quantitative Analysis and Mitigation of Safety Risks in AI Web Search},
  author    = {Zeren Luo and Zifan Peng and Yule Liu and Zhen Sun and Mingchen Li and Jingyi Zheng and Xinlei He},
  booktitle = {34th USENIX Security Symposium (USENIX Security 25)},
  publisher = {USENIX},
  year      = {2025}
}
