6 changes: 6 additions & 0 deletions .gitignore
@@ -4,6 +4,12 @@ __pycache__/
*.py[cod]
*$py.class

# extra files for testing code
extra/

# data
data/alfworld

# C extensions
*.so

332 changes: 46 additions & 286 deletions README.md
@@ -1,76 +1,10 @@
<img src="assets/header_baby.png" alt="Smoll baby robot" width="auto" height="80"> | <span style="font-size: 25px;">ExpeL: LLM Agents are Experiential Learners</span>
:---:|:---:
# **ExpeL_COA** 🪖
An ExpeL agent adapted to run course-of-action (COA) benchmark tests for planning military operations on a simulated battlefield.

## **Setting up Your Virtual Environment: 🌎**
Creating a conda environment for ExpeL follows the same process described in the original repository.


⚡ [AAAI 2024 *(Oral)*] Official implementation of the ExpeL Agent ⚡

~ by Andrew Zhao, Daniel Huang, Quentin Xu, Matthieu Lin, Yong-Jin Liu, Gao Huang ~


[![Release Notes](https://img.shields.io/github/release/LeapLabTHU/ExpeL)](https://github.com/LeapLabTHU/ExpeL/releases)
![License: Apache 2.0](https://img.shields.io/github/license/LeapLabTHU/ExpeL)
[![GitHub star chart](https://img.shields.io/github/stars/LeapLabTHU/ExpeL?style=social)](https://star-history.com/#LeapLabTHU/ExpeL)
[![Open Issues](https://img.shields.io/github/issues-raw/LeapLabTHU/ExpeL)](https://github.com/LeapLabTHU/ExpeL/issues)

---
### 🌐 $\cdot$ [Project Page](https://andrewzh112.github.io/#expel) &ensp; 📄 $\cdot$ [Paper](https://arxiv.org/pdf/2308.10144.pdf)

> "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." - Tom Mitchell

# 📖 Table of Contents

<div style="display: flex;">

<!-- Left Column - List -->
<div style="flex: 1; padding-right: 1px;">

[👋 Introduction](#-introduction)

[🛠️ Installation](#%EF%B8%8F-installation)

- [🌳 Environments](#-environments)
- [🏠 ALFWorld](#-alfworld)
- [🛒 Webshop](#-webshop)

[🚀 Quick start](#-quick-start)
1. [Experience Gathering](#1-for-the-experience-gathering-stage)
2. [Insights Extraction](#2-for-the-insights-extraction-stage)
3. [Evaluation](#3-for-evaluation)

[🫡 Cite us !](#-cite-us-)

[💌 Contact us !](#-contact-us-)

[🏛️ License](#%EF%B8%8F-license)

[⚠️ Issues](#%EF%B8%8F-issues)

</div>

<!-- Right Column - Image -->
<div style="flex: 2;">

<img src="assets/expel.png" alt="Smoll baby robot" width="auto" height="auto">

</div>

</div>



## 👋 Introduction

This repo is the official implementation of [Expel: LLM Agents are Experiential Learners](https://arxiv.org/pdf/2308.10144.pdf).

Our agent autonomously gathers experiences and extracts knowledge using natural language from a collection of training tasks. At inference, the agent recalls its extracted insights and past experiences to make informed decisions. Our empirical results highlight the robust learning efficacy of the ExpeL agent, indicating a consistent enhancement in its performance as it accumulates experiences.
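
In pseudocode, these three phases look roughly like the following (an illustrative sketch of the paper's loop, not the repository's actual code; every name is hypothetical):

```python
from typing import Callable, List

# Illustrative sketch of the three ExpeL phases; every name below is
# hypothetical and stands in for the repository's actual components.
def expel_loop(
    train_tasks: List[str],
    test_tasks: List[str],
    gather: Callable[[str], str],               # runs the agent on a task, returns a trajectory
    extract: Callable[[List[str]], str],        # distills natural-language insights from trajectories
    act: Callable[[str, str, List[str]], str],  # solves a task given insights and recalled experiences
) -> List[str]:
    experiences = [gather(t) for t in train_tasks]  # 1. experience gathering
    insights = extract(experiences)                 # 2. insights extraction
    # 3. evaluation: recall insights and past experiences at inference time
    return [act(task, insights, experiences) for task in test_tasks]
```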

## 🛠️ Installation
Python version: 3.9.17

1. Create a virtual environment using [Anaconda](https://anaconda.org/anaconda/python) (or your favorite package manager), activate it, clone the repo and install the requirements.

```sh
conda create -n expel python=3.9.17
conda activate expel

@@ -80,218 +14,44 @@ cd expel
pip install -r requirements.txt
```

Next, you need to set up the environments.

## 🌳 Environments

Baby ExpeL has been playing around with the following environments:

- ❓[HotpotQA](https://github.com/hotpotqa/hotpot)
- 🏠 [ALFWorld](https://github.com/alfworld/alfworld)
- 🛒 [WebShop](https://github.com/princeton-nlp/WebShop)
- 🌡️ [FEVER](https://github.com/awslabs/fever)

Among these, ALFWorld and WebShop require manual installation (WebShop also requires running a server, which can be local). Details below:

### 🏠 ALFWorld
The installation instructions are shown below. Use the previously created environment to install ALFWorld.
You will also need to download the data at the specified location: ``data/alfworld``.
```Bash
conda activate expel
pip install alfworld[full]

export ALFWORLD_DATA="data/alfworld"
alfworld-download
```
If you need more details, please refer to the [official repo](https://github.com/alfworld/alfworld#quickstart).

### 🛒 WebShop

WebShop installation is different from the other environments: you will have to install it and **manually** run the server (which can be local) in parallel with ExpeL to interact with the environment.
The succinct installation instructions are shown below.

```bash
git clone https://github.com/princeton-nlp/webshop.git webshop
cd webshop

# Create another env for the webshop server to avoid conflicts
conda create -n webshop python=3.8.13
conda activate webshop

./setup.sh -d all
```

By default, WebShop only loads 1,000 products. But we need <u>ALL OF THEM</u> (🤯). So change ``web_agent_site/utils.py``:

```python
# DEFAULT_ATTR_PATH = join(BASE_DIR, '../data/items_ins_v2_1000.json')
# DEFAULT_FILE_PATH = join(BASE_DIR, '../data/items_shuffle_1000.json')
DEFAULT_ATTR_PATH = join(BASE_DIR, '../data/items_ins_v2.json')
DEFAULT_FILE_PATH = join(BASE_DIR, '../data/items_shuffle.json')
```
To start the server, run the following command:
```bash
./run_dev.sh
```
You will be given a URL (and port) once the website is up:
- Go back to the cloned ExpeL repo
- Modify the config file and add the given URL in ``envs/webshop/webshop.py``:
```python
WEBSHOP_URL = "http://127.0.0.1:3000" # Example URL
```
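
Once the URL is set, a quick reachability check can save a debugging round trip later (an illustrative snippet assuming the `requests` package; it is not part of the repository):

```python
import requests  # assumes the requests package is installed

WEBSHOP_URL = "http://127.0.0.1:3000"  # the URL printed when the server starts

# A 200 response means the WebShop server is up and ExpeL can reach it.
resp = requests.get(WEBSHOP_URL, timeout=5)
print(resp.status_code)
```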

Note that you will have to keep the WebShop server running in the background to interact with the environment. We gathered some bugs we encountered during the WebShop server setup [here](#issues).

If you need more details, please refer to the [official repo](https://github.com/princeton-nlp/WebShop?tab=readme-ov-file#-setup).


## 🚀 Quick start

Below are the commands to run the ExpeL Agent.

**Either put your OpenAI API key in a ``.env`` file (``OPENAI_API_KEY=XXX``) or you will be prompted for it on the command line.**
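
If you go the ``.env`` route, the key is typically loaded like this (a minimal sketch assuming the `python-dotenv` package; the repository may load it differently):

```python
import os

from dotenv import load_dotenv  # assumes the python-dotenv package is installed

load_dotenv()  # reads OPENAI_API_KEY=XXX from a .env file in the working directory
api_key = os.environ["OPENAI_API_KEY"]  # raises KeyError if the key is missing
```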

### 1. For the **Experience Gathering** stage:
```bash
python train.py benchmark=<benchmark-name> \
run_name=<train-run-name> \
testing=false \
resume=false

# resume = true/false if you want to resume a previous run
# benchmark = {hotpotqa, alfworld, webshop, fever}
# agent.llm = {gpt-3.5-turbo (default), gpt-4}
```
Below are the commands to run the experience gathering stage as in the paper:
```bash
# 🏠 ALFWorld
python train.py benchmark=alfworld run_name=<train-run-name> testing=false resume=false
# 🛒 WebShop
python train.py benchmark=webshop run_name=<train-run-name> testing=false resume=false
# ❓ HotpotQA
python train.py benchmark=hotpotqa run_name=<train-run-name> testing=false resume=false
```

By default, the result files (logs, dictionaries) will be saved in ``logs/<benchmark-name>/expel`` referenced by ``<train-run-name>``. You can change the log directory by adding ``log_dir=<log-dir>`` to the command line.


### 2. For the **Insights Extraction** stage:
Use the collected experiences to extract insights.

```bash
python insight_extraction.py \
benchmark=<benchmark-name> \
load_run_name=<train-run-name> \
run_name=<insights-extraction-run-name> \
agent.llm=<model> \
agent.max_num_rules=<insights-num> \
agent.success_critique_num=<exp-num> \
testing=true \
resume=false

# agent.success_critique_num = number of experiences to give per iteration
# agent.max_num_rules = target number of insights to extract
```
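
To make the role of ``agent.success_critique_num`` concrete, here is an illustrative sketch of the batching it controls (hypothetical names, not the repository's code):

```python
from typing import Iterator, List

# Illustrative only: how agent.success_critique_num batches successful
# experiences per insight-extraction iteration; names are hypothetical.
def critique_batches(experiences: List[str], success_critique_num: int) -> Iterator[List[str]]:
    for i in range(0, len(experiences), success_critique_num):
        yield experiences[i:i + success_critique_num]
```

For example, with ``success_critique_num=8``, a pool of 20 successful experiences would yield batches of 8, 8, and 4.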

To resume a run that stopped at a specific fold, remove ``load_run_name`` from the parameters, specify the fold it stopped at with ``resume_fold``, and set ``resume=true``.


Below are the commands to run the insights extraction stage as in the paper:
```bash
# 🏠 ALFWorld
python insight_extraction.py benchmark=alfworld load_run_name=<train-run-name> run_name=<insights-extraction-run-name> agent.llm=gpt-4 agent.max_num_rules=10 agent.success_critique_num=8 testing=false resume=false
# 🛒 WebShop
python insight_extraction.py benchmark=webshop load_run_name=<train-run-name> run_name=<insights-extraction-run-name> agent.llm=gpt-4 agent.max_num_rules=8 agent.success_critique_num=4 testing=false resume=false
# ❓ HotpotQA
python insight_extraction.py benchmark=hotpotqa load_run_name=<train-run-name> run_name=<insights-extraction-run-name> agent.llm=gpt-4 agent.max_num_rules=10 agent.success_critique_num=8 testing=false resume=false
```

The final result files will be saved in ``logs/<benchmark-name>/expel/extracted_insights`` referenced by ``<insights-extraction-run-name>``.


### 3. For **Evaluation**:
```bash
python eval.py benchmark=<benchmark-name> \
load_run_name=extracted_insights/<insights-extraction-run-name> \
run_name=<eval-run-name> \
benchmark.eval_configs.k_folds=<fold-num> \
agent.fewshot_strategy=task_similarity \
agent.retrieval_kwargs.max_fewshot_tokens=<max-retrieval-token-size> \
agent.retrieval_kwargs.buffer_retrieve_ratio=<retrieve-multiplier-coefficient> \
testing=false \
resume=false

# agent.fewshot_strategy = {task_similarity, thought_similarity, task_thought_similarity}
# agent.llm = {gpt-3.5-turbo (default), gpt-4}
# agent.retrieval_kwargs.max_fewshot_tokens=auto
# benchmark.eval_configs.k_folds=2
# agent.retrieval_kwargs.buffer_retrieve_ratio = safety multiplier so retrieval never returns 0 examples (bigger is safer)
```
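
The ``buffer_retrieve_ratio`` comment above is easier to see in code. Below is an illustrative over-fetch-then-filter sketch of the pattern it controls (hypothetical names, not the repository's implementation):

```python
from typing import Callable, List

# Illustrative over-fetch-then-filter sketch: fetch k * buffer_retrieve_ratio
# candidates so that dropping over-long examples still leaves at least k
# usable few-shot examples.
def retrieve_fewshots(
    search: Callable[[int], List[str]],   # returns the top-n most similar examples
    k: int,
    buffer_retrieve_ratio: int,
    max_fewshot_tokens: int,
    count_tokens: Callable[[str], int],
) -> List[str]:
    candidates = search(k * buffer_retrieve_ratio)
    kept = [c for c in candidates if count_tokens(c) <= max_fewshot_tokens]
    return kept[:k]
```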

To resume a run that stopped, remove ``load_run_name`` from the parameters and add ``resume=true`` at the end of the command line.

Below are the commands to evaluate ExpeL as in the paper:

```bash
# 🏠 ALFWorld
python eval.py benchmark=alfworld load_run_name=extracted_insights/<insights-extraction-run-name> run_name=<eval-run-name> agent.fewshot_strategy=task_similarity agent.retrieval_kwargs.max_fewshot_tokens=auto testing=false resume=false
# 🛒 WebShop
python eval.py benchmark=webshop load_run_name=extracted_insights/<insights-extraction-run-name> run_name=<eval-run-name> agent.fewshot_strategy=task_similarity agent.retrieval_kwargs.max_fewshot_tokens=auto agent.retrieval_kwargs.buffer_retrieve_ratio=20 testing=false resume=false
# ❓ HotpotQA
python eval.py benchmark=hotpotqa load_run_name=extracted_insights/<insights-extraction-run-name> run_name=<eval-run-name> agent.fewshot_strategy=task_similarity testing=false resume=false
```
The result files will be saved in ``logs/<benchmark-name>/expel/eval`` referenced by ``<eval-run-name>``.

## 🫡 Cite us !

This repository contains code for reproducing results. If you find this work useful in your research (and/or daily life), please cite:

```
@inproceedings{zhao2024expel,
author = {Andrew Zhao and Daniel Huang and Quentin Xu and Matthieu Lin and Yong-Jin Liu and Gao Huang},
title = {ExpeL: LLM Agents Are Experiential Learners},
booktitle = {Thirty-Eighth {AAAI} Conference on Artificial Intelligence, {AAAI}
2024, Thirty-Sixth Conference on Innovative Applications of Artificial
Intelligence, {IAAI} 2024, Fourteenth Symposium on Educational Advances
in Artificial Intelligence, {EAAI} 2024, February 20-27, 2024, Vancouver,
Canada},
editor = {Michael J. Wooldridge and Jennifer G. Dy and Sriraam Natarajan},
year = {2024},
pages = {19632--19642},
publisher = {{AAAI} Press},
url = {https://ojs.aaai.org/index.php/AAAI/article/view/29936},
doi = {10.1609/aaai.v38i17.29936}
}
```

## 💌 Contact us !

If you have any questions, feel free to contact [Andrew Zhao](mailto:zqc21@mails.tsinghua.edu.cn), [Daniel Huang](mailto:huang-jy22@mails.tsinghua.edu.cn) or [Quentin Xu](mailto:quentinxu1@gmail.com).


## 🏛️ License
Check `LICENSE.md`

## ⚠️ Issues
We encountered some errors and gathered them here (note that at the time of reading, they might have been fixed). If you don't encounter them, lucky you 😒.

**🛒 WebShop** server installation:

```Bash
# install
python -m spacy download en_core_web_lg
pip install lightgbm nmslib # need to have c compiler and stuff
conda install mkl # if ImportError: libmkl_intel_lp64.so.1:
pip install pyserini
pip install pyserini --no-cache-dir # if low on ram
pip install typing-inspect==0.8.0 typing_extensions==4.5.0 # if issubclass errors
# if you hit a libjvm.so error, export JAVA_HOME
./setup.sh -d all
```
On Mac, if you have problems with lightgbm or nmslib, you might have to replace their pip installs with:
```Bash
brew install cmake libomp
pip install lightgbm

CFLAGS="-mavx -DWARN(a)=(a)" pip install --use-pep517 nmslib
```
## **How to Run COA Benchmark Tests: 🛩️**

1. Specify your input `.joblib` `task_file` in [`coa.yaml`](configs/benchmark/coa.yaml).
2. Set your OpenAI API key as follows: `export OPENAI_API_KEY=[YOUR_KEY_HERE]`
3. Export your Llama URL as follows: `export LLAMA_URL=[YOUR_URL_HERE]`
4. That's it! Run any of the Python scripts in the root directory (a quick environment check is sketched below).
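
Before launching, it can help to confirm that the variables from steps 2 and 3 are visible to Python (an illustrative check, not part of the repository):

```python
import os

# Illustrative sanity check for the variables named in steps 2 and 3.
for var in ("OPENAI_API_KEY", "LLAMA_URL"):
    if not os.environ.get(var):
        raise SystemExit(f"{var} is not set; export it before running the COA scripts")
print("Environment looks good.")
```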

## **Table of Contents: 📖**

### **Key Files and Folders: 🐍🗂️**
The files and folders summarized here have been modified from the original ExpeL benchmark to implement the COA tests.

- `configs/benchmark/`:
- [`coa.yaml`](configs/benchmark/coa.yaml):<br>
Stores config information such as the number of few-shot examples, the input `.joblib` file, and the `k_folds` variable.
- `data/coa/`:
- `[PREFIX].csv`:<br>
CSV files aren't used for running the COA benchmark; they serve as a visual aid for understanding how the data is structured.
- `[PREFIX].joblib`:<br>
Input file that allows the LLM to generate COA plans given `supporting_information` about friendly troops and enemy units (see the inspection sketch after this list).
- `envs/`:
- [`__init__.py`](envs/__init__.py):<br>
Specifies COA tasks and environment variables from the input `.joblib` file.
- `coa/`:
- [`battlefield_validation.py`](envs/coa/battlefield_validation.py):<br>
Provides the `BattlefieldValidation` helper class that checks whether attack, move, and stand actions issued by the LLM are valid.
- [`coa.py`](envs/coa/coa.py):<br>
Environment script that evaluates attack, move, and stand actions issued by the LLM.
- `logs/coa/expel/`:
- Stores experiences, logs, and evaluation graphs generated by ExpeL benchmark tests.
- `prompts/`:
- [`__init__.py`](prompts/__init__.py):<br>
Relays the prompt variables defined in [`coa.py`](prompts/coa.py) to [`ExpelAgent`](agent/expel.py) so that the ExpeL pipeline files, such as [`train.py`](train.py), run properly.
- [`coa.py`](prompts/coa.py):<br>
Defines prompt variables for [`ExpelAgent`](agent/expel.py).
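
Since the `.joblib` task file drives the whole benchmark, here is a quick way to inspect one (an illustrative snippet; the path and printed structure are assumptions, not guaranteed by the repo):

```python
import joblib  # the COA task files are serialized with joblib

# Hypothetical inspection script; "data/coa/example.joblib" is a placeholder path.
tasks = joblib.load("data/coa/example.joblib")
print(type(tasks))
if hasattr(tasks, "__len__"):
    print(f"{len(tasks)} top-level entries")
```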

## **Classic ExpeL: 💭**
All files not covered in the list above are (nearly) unchanged from the original ExpeL repository, which can be found [here](https://github.com/LeapLabTHU/ExpeL/tree/main).

## **Contact: ☎️**
If you have any questions about any of the COA files or folders featured here, please reach out to Luke Nam at [lukelike1001@gmail.com](mailto:lukelike1001@gmail.com).
1 change: 1 addition & 0 deletions agent/base.py
@@ -53,6 +53,7 @@ def log_history(self, include_task: bool = True, include_all: bool = False) -> s
match = re.search(pattern, all_history)
if match is None:  # no match: return empty history instead of crashing
    return ""
if include_task:
    return match.group().lstrip("Now it's your turn!\n") + match.string[match.end():]
return match.string[match.end():].strip()

def remove_task_suffix(self, task: str) -> str:
3 changes: 2 additions & 1 deletion agent/expel.py
@@ -365,6 +365,7 @@ def extend_rules(rule_items: List[str], success_history: str = None, fail_histor
    all_logs += '\n\n################ SUCCESS CRITIQUES ################\n'
else:
    all_logs = loaded_log
all_success = []
if loaded_dict is None or loaded_dict['critique_summary_section'] == 'compare':
    for training_id in training_ids:
        all_success = []
@@ -423,7 +424,7 @@ def setup_vectorstore(self) -> None:
combined_history = dict(self.succeeded_trial_history)
if isinstance(self.all_fewshots, list):
    for fewshot in self.all_fewshots:
        if self.benchmark_name in ['hotpotqa', 'fever']:
        if self.benchmark_name in ['coa', 'hotpotqa', 'fever']:
            task = fewshot.split('\n')[0]
            trajectory = '\n'.join(fewshot.split('\n')[1:])
        elif self.benchmark_name == 'webshop':
1 change: 0 additions & 1 deletion agent/react.py
@@ -53,7 +53,6 @@ def __init__(self,
self.llm_parser = llm_parser
self.observation_formatter = observation_formatter
self._last_observation_history = None

self.env = env(**self.tasks[self.task_idx]['env_kwargs'], max_steps=self.max_steps)
self.env.reset()
self.task = self.tasks[self.task_idx]['task']
2 changes: 1 addition & 1 deletion configs/agent/expel.yaml
@@ -1,5 +1,5 @@
name: expel
llm: gpt-3.5-turbo # gpt-3.5-turbo-0301
llm: gpt-4o-mini
max_reflection_depth: 3
max_num_rules: 20
truncate_strategy: null