WiNELL

This repository contains the code and data for the paper WiNELL: Wikipedia Never-Ending Updating with LLM Agents.

Installation

Clone the repository and install the required dependencies:

git clone https://github.com/gangiswag/AutoWikiUpdate.git
cd AutoWikiUpdate
pip install -r requirements.txt
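
The agents below call OpenAI models (gpt-4.1 variants), so an OpenAI API key needs to be available in the environment. Assuming the standard OPENAI_API_KEY variable is used (check the code if your setup differs):

export OPENAI_API_KEY="<your-api-key>"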

Running the Wikipedia Agent

WiNELL consists of three modules: criteria extraction, update aggregation, and Wikipedia editing. The modules must be run sequentially. We also provide a script, run.sh, that runs all of them together; feel free to modify the constants defined in the script.
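
The constants in run.sh cover the model names, data locations, and search window used by the steps below. A hedged sketch of the kind of values involved (the variable names here mirror the per-step commands in this README and may differ from the actual script):

# Illustrative constants; check run.sh for the authoritative names and defaults
TIME_DELTA=14                        # look-back window in days for updates
ENTITY_FILE="../data/entities.json"  # tracked Wikipedia entities
DATA_DIR="../data/wikipedia_data"    # Wikipedia article data
NUM_ITERATION=1                      # search-agent iterations per criteria file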

bash run.sh $PAGE_ID

The script above follows the steps described below:

Wiki Criteria Extraction

To run the criteria extraction for a particular Wikipedia entity:

cd agent
export PAGE_ID=307
export OUTPUT_DIR="outputs/$PAGE_ID"

mkdir -p "$OUTPUT_DIR/criteria"

export TIME_DELTA=14
export ENTITY_FILE="../data/entities.json"
export DATA_DIR="../data/wikipedia_data"
export CRITERIA_DIR="$OUTPUT_DIR/criteria"

python3 extract_wikipedia_criteria.py \
    --model gpt-4.1-2025-04-14 \
    --entity_file "$ENTITY_FILE" \
    --data_dir "$DATA_DIR" \
    --out_dir "$CRITERIA_DIR" \
    --page_id "$PAGE_ID" \
    --time_delta "$TIME_DELTA" \
    --time_delta_file "$OUTPUT_DIR/time_delay.json"
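
This step also writes $OUTPUT_DIR/time_delay.json, which the downstream steps read as a JSON object mapping each date (as it appears in the criteria filenames) to its time delta. A minimal sketch for inspecting it (the page ID and date below are illustrative):

python3 - <<'EOF'
import json

# time_delay.json maps a date string, e.g. "2024-05-01", to the time
# delta used when searching for updates from that date.
delays = json.load(open("outputs/307/time_delay.json"))
for date_key, delta in delays.items():
    print(date_key, "->", delta)
EOF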

Online Update Aggregation

After running criteria extraction, we can run the news search agent for the same Wikipedia entity. We run the search agent once for every criteria file extracted in the previous step.

cd agent
export PAGE_ID=307
export OUTPUT_DIR="outputs/$PAGE_ID"

mkdir -p "$OUTPUT_DIR/news_updates"

export CRITERIA_DIR="$OUTPUT_DIR/criteria"
export NEWS_UPDATES_DIR="$OUTPUT_DIR/news_updates"
export NUM_ITERATION=1

for file in "$CRITERIA_DIR"/*.json; do
    # Extract the date from the filename after the underscore and before .json
    base=$(basename "$file")
    date_part=$(echo "$base" | cut -d'_' -f2 | cut -d'.' -f1)
    # Convert the date to MM-DD-YYYY format
    formatted_date=$(date -d "$date_part" +"%m-%d-%Y")
    # Read the time_delta from the JSON file using date_part as key
    td=$(python3 -c "import json; print(json.load(open('$OUTPUT_DIR/time_delay.json'))['$date_part'])")
    # search websites for news updates
    echo "Running news updates agent for file: $file with date: $formatted_date and time delta: $td"
    python3 run_news_updates.py \
        --navigator_model gpt-4.1-mini \
        --aggregator_model gpt-4.1-mini \
        --extractor_model gpt-4.1-mini \
        --num_iterations $NUM_ITERATION \
        --wiki_file "$file" \
        --out_path "$NEWS_UPDATES_DIR/updates_$base" \
        --log_path "logs/log_$base.txt" \
        --start_time "$formatted_date" \
        --time_delta "$td"
done
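
Each criteria file produces a corresponding updates file named updates_<criteria filename> in $NEWS_UPDATES_DIR. A quick sanity check (the filename shown is illustrative):

ls "$NEWS_UPDATES_DIR"
# e.g. updates_307_2024-05-01.json, one per criteria file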

Fine-grained Wikipedia Editing

Finally, we can run the Wikipedia editor to incorporate the new information into the Wikipedia article. The --use_trained_model flag enables the fine-tuned editor; we also support using a prompted LLM for editing by passing the desired model via the --model flag.

cd agent
export WORLD_SIZE=2
export PAGE_ID=307
export OUTPUT_DIR="outputs/$PAGE_ID"

mkdir -p "$OUTPUT_DIR/news_updates"
mkdir -p "$OUTPUT_DIR/updated_content"

export CRITERIA_DIR="$OUTPUT_DIR/criteria"
export NEWS_UPDATES_DIR="$OUTPUT_DIR/news_updates"
export UPDATED_CONTENT_DIR="$OUTPUT_DIR/updated_content"


for file in "$CRITERIA_DIR"/*.json; do
    # Extract the date from the filename after the underscore and before .json
    base=$(basename "$file")
    date_part=$(echo "$base" | cut -d'_' -f2 | cut -d'.' -f1)
    # Convert the date to MM-DD-YYYY format
    formatted_date=$(date -d "$date_part" +"%m-%d-%Y")
    # Read the time_delta from the JSON file using date_part as key
    td=$(python3 -c "import json; print(json.load(open('$OUTPUT_DIR/time_delay.json'))['$date_part'])")
    # Incorporate the news updates into the sections
    echo "Integrating content for file: $file"
    # If --use_trained_model is not set, --model is used for prompted editing.
    # TRAINED_MODEL_PATH must point to the fine-tuned editor checkpoint.
    python3 integrate_content_gpt.py \
        --model gpt-4.1-2025-04-14 \
        --trained_model_path "$TRAINED_MODEL_PATH" \
        --news_data_file "$NEWS_UPDATES_DIR/updates_$base" \
        --out_dir "$UPDATED_CONTENT_DIR" \
        --use_trained_model  # uses the fine-tuned editor
done
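
To edit with a prompted LLM instead of the fine-tuned editor, drop --use_trained_model and --trained_model_path and pass the desired model via --model. A minimal sketch for a single updates file (the filename is illustrative):

python3 integrate_content_gpt.py \
    --model gpt-4.1-2025-04-14 \
    --news_data_file "$NEWS_UPDATES_DIR/updates_307_2024-05-01.json" \
    --out_dir "$UPDATED_CONTENT_DIR"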
