
[Feature Improvement] Add support for Ollama, Gemini, and Claude, with Web UI#73

Open
whats2000 wants to merge 43 commits into SamuelSchmidgall:main from whats2000:main

Conversation

@whats2000

@whats2000 whats2000 commented Feb 8, 2025

Summary

  • Added support for Ollama as a new agent provider.
  • Integrated Gemini as a new agent provider.
  • Integrated Claude as a new agent provider.
  • Implemented a Gradio-based UI for easier configuration of agent settings.
  • Fixed a bug where the config for models other than OpenAI failed to load from a checkpoint and fell back to o1-mini
  • Fixed a misspelling of the agent_models phase "report refinement" to "paper refinement"; I think this comes from the original code
  • Centralized the remaining configuration in config.py

What I have checked

  • Full end-to-end run on Linux with gemini-2.0-flash
  • Tested with the Gemini provider
  • Tested with the Ollama provider

Reference issues this may solve

UI Example

Gradio

Launch with config_gradio.py
image

React Flask App (Beta)

Launch with app.py
image

Note

I'm working on adding some monitoring and dialog visualization.

Test Production Paper

SAMAug with MRANet.pdf
Review.txt

@whats2000
Author

The inference enhancement can adapt to any service that supports the OpenAI SDK, allowing fast integration.
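For anyone integrating such a service, the usual pattern (a minimal sketch, not the PR's provider.py code; the URL and model name below are placeholders) is to point the OpenAI client at the compatible endpoint via base_url:

```python
# Minimal sketch (OpenAI SDK >= 1.x): point the client at any
# OpenAI-compatible endpoint, e.g. a local Ollama server.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # placeholder: your service's OpenAI-compatible URL
    api_key="ollama",                      # many local servers accept any non-empty key
)

completion = client.chat.completions.create(
    model="deepseek-r1:32b",               # placeholder model name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)
print(completion.choices[0].message.content)
```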

@AlexTzk

AlexTzk commented Feb 9, 2025

hi @whats2000 ! Appreciate your efforts putting together alternative LLM backends for this project.

Tried using your code with my local Ollama instance, but I'm getting a 422 Unprocessable Entity error from the web UI. Not sure if I'm missing anything?

Configured the API as http://OLLAMA_IP:11434, and also tried http://OLLAMA_IP:11434/api/generate and http://OLLAMA_IP:11434/v1, but got the same result.
Then I tried to modify your function and tailor it more for Ollama:


import requests
import openai


class OpenaiProvider:
    @staticmethod
    def get_response(
        api_key: str,
        model_name: str,
        user_prompt: str,
        system_prompt: str,
        temperature: float = None,
        base_url: str | None = None,
    ) -> str:
        if api_key == "ollama":
            # Call Ollama's native /api/generate endpoint directly
            url = "http://10.0.0.99:11434/api/generate"
            headers = {"Content-Type": "application/json"}
            payload = {
                "model": model_name,
                "prompt": user_prompt,
                "stream": False
            }
            response = requests.post(url, json=payload, headers=headers)
            if response.status_code == 200:
                return response.json().get("response", "No response received.")
            else:
                return f"Error: {response.status_code} - {response.text}"

        openai.api_key = api_key


I'm launching with `python config_gradio.py`.

Do you mind sharing your working setup details? 

@whats2000
Author

whats2000 commented Feb 9, 2025

@AlexTzk I think I need more information on how to reproduce your error. What is your operating system? I tested the script on Ubuntu 20 under WSL2. Can you try a test script that calls provider.py only? Also, make sure your Ollama version is the latest (the problem might be due to an outdated version).

# Test script for Ollama
from provider import OpenaiProvider

print(OpenaiProvider.get_response(
    api_key="ollama",
    model_name="deepseek-r1:32b",
    user_prompt="What is the meaning of life?",
    system_prompt="You are a philosopher seeking the meaning of life.",
    base_url="http://localhost:11434/v1/"
))

And I get a response with Ollama 0.5.7

C:\Users\user\.conda\envs\NatureLanguageAnalyze\python.exe C:\Users\user\Documents\GitHub\AgentLaboratory\provider.py 
<think>
Okay, so I just came across this user who is playing the role of a philosopher searching for the meaning of life. They asked me, "What is the meaning of life?" and my initial response was kind of an exploration of various philosophical perspectives—existentialism, humanism, stoicism, spirituality, and nihilism. Now, they’re asking me to think through this as if I'm just starting out on this journey, maybe a bit confused or overwhelmed.

// A lot of output I just skip it

Process finished with exit code 0

@AlexTzk

AlexTzk commented Feb 10, 2025

@whats2000 Thank you for your reply.

Your test function does work:

<think>
Okay, so I'm trying to figure out what the meaning of life is. Hmm, where do I even start with this? I've heard people talk about it in different contexts—philosophy, religion, science—and everyone seems to have a different take. Maybe I should break it down into smaller parts. 
[.......]
</think>
[more output]

Then I set the base URL within config.py:

OLLAMA_API_BASE_URL = "http://10.0.0.99:11434/v1/"

I launch Gradio with python config_gradio.py, set the OpenAI API key to "ollama", and specify the same model, deepseek-r1:32b. I get error 422
image

I am running Ollama version 0.5.7 on another machine at 10.0.0.99:11434; that's a Docker container with an Ubuntu base, I believe.

The AgentLab code and your PR are running on an Ubuntu Server 24.04 LTS machine with Python 3.12.9.

@whats2000
Author

whats2000 commented Feb 10, 2025

Judging from your image, this seems to be caused by Gradio (it fails to launch the terminal). Did you see any output in the terminal that points out why the terminal failed to launch? Also, I use XTerm on Linux. Feel free to check the code in config_gradio.py.

@whats2000
Author

@AlexTzk I pinned the Gradio version; could you uninstall Gradio and reinstall from requirements.txt? I also fixed the missing torchvision and torchaudio dependencies.

@AlexTzk

AlexTzk commented Feb 10, 2025

@whats2000 It's working now! I believe my issue was caused by XTerm not being able to launch because I don't have a GUI. Thank you for your help.

A couple of notes:
With Gradio 4.44.1, the pinned MarkupSafe version cannot be used, as Gradio requires a lower version, and Pillow also needs downgrading. I let pip decide which MarkupSafe version to use and downgraded Pillow to 10.4.0. Seems to be working fine.

I now have a different exception about max_tries being exceeded during the literature review, but that is not connected to your PR; I will try to fix that now.

@whats2000
Author

I just updated the layout to look more balanced; do you think it looks better?
I also made it output the generated command directly in the status area, for debugging!
Hope this helps you @AlexTzk

image

@AlexTzk

AlexTzk commented Feb 10, 2025

@whats2000 Looks great! Love how you split it across both sides. The debugging feature is extremely helpful, many thanks for that.

Running an experiment now to check whether the max_tries exception gets thrown again, but as soon as I'm done with that I will test this again! Great work 👍

The bug is that the `model_backbone` attribute in LaboratoryWorkflow is used as both a `dict` and a `str`, which makes some agents fail to find their model, fall back to default_model, and hit inference errors.
@whats2000
Author

whats2000 commented Feb 10, 2025

@AlexTzk I just fixed several bugs that I discovered in the original project. Did this fix it for you? I found that deepseek-r1 is not very good at producing the command-format output, which makes it fail to invoke the task, and that may result in the max_tries exception. The model's technical report shows it is not primarily trained for tool usage (which affects structured-output command performance). I also tested qwen2.5:32b and it seems to produce structured output. I will test qwen2.5-coder:32b to see if the performance is better.

@AlexTzk

AlexTzk commented Feb 11, 2025

@whats2000 I still got the max_tries exception with deepseek-r1:32b; the second test was with qwen2.5-coder:32b-instruct-q5_K_M via Gradio, but that seemed to have crashed as well. There was no message in the web UI, but I presume it was the same max_tries exception during the literature review.

I am now trying to run a smaller model, qwen2.5-coder:14b-instruct-q4_1, launched from the terminal rather than the web UI, to see if it's the same error.

@nullnuller

Is it going to be merged soon?

@whats2000 whats2000 changed the title [Feature Improvement] Add support for Ollama, Gemini, and Claude, with Gradio UI configuration [Feature Improvement] Add support for Ollama, Gemini, and Claude, with Web UI Feb 22, 2025
@AlexTzk

AlexTzk commented Feb 24, 2025

@whats2000 awesome work, I will test the webui sometime this week and provide feedback.

@MohamadZeina I managed to get past lit_review by using the qwen2.5:32b model on Ollama with a num_ctx window of 100000 tokens; I created my own model from a Modelfile. Another unforeseen problem comes up during the subsequent tasks, more specifically when it gets to running_experiments: it takes a long time to reply because the model is split roughly 50/50 between RAM and VRAM, and this causes the code to time out. Creating a different model with a 16k token window gets around this, but you have to interrupt the current research, delete your custom model, create another custom model under the same name with a different context window, and restart the research.

I was thinking that implementing a memory_class might be a worthwhile endeavour...
Or, specifically for literature_review, rather than doing everything in one go, we could split it into subtasks (rough sketch after the list):

  • Find all relevant papers and store their IDs in a file
  • Go through each paper with the LLM, restarting after each one is reviewed so the context window doesn't run out, and remove IDs after review
  • Store relevant content from the papers in a different file that gets appended after each review
  • Compile the full literature_review and go to the next step
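A rough sketch of that file-based flow (all file names and helper functions here are hypothetical, not code from this PR):

```python
# Rough sketch: keep a queue of paper IDs on disk and review one paper per
# LLM call so the context window stays small. Names are hypothetical.
import json
from pathlib import Path

QUEUE_FILE = Path("lit_review_queue.json")   # hypothetical
NOTES_FILE = Path("lit_review_notes.md")     # hypothetical

def save_queue(paper_ids):
    QUEUE_FILE.write_text(json.dumps(paper_ids))

def review_next(llm_review_fn, fetch_paper_fn):
    """Review one paper, append its notes, and remove its ID from the queue."""
    paper_ids = json.loads(QUEUE_FILE.read_text())
    if not paper_ids:
        return False  # nothing left; compile NOTES_FILE in the next step
    paper_id = paper_ids.pop(0)
    summary = llm_review_fn(fetch_paper_fn(paper_id))  # fresh context each call
    with NOTES_FILE.open("a", encoding="utf-8") as f:
        f.write(f"## {paper_id}\n{summary}\n\n")
    save_queue(paper_ids)
    return True
```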

a few corrections
@whats2000
Author

Great to hear that!

whats2000 and others added 2 commits February 26, 2025 15:27
Note: You need to trigger the update manually the first time, as the old UI does not have the update button
@whats2000
Author

whats2000 commented Mar 1, 2025

I added a Check for Update button to the Web UI. However, for older versions, updates must be triggered manually via /api/updateWebUI. The update process is not automatic; users need to click the update button in the Web UI (shown in the figure below).

Note

I'm still exploring ways to visualize the update progress. Suggestions are welcome! Also, I'm reviewing #39 and #38; what are your thoughts?

image

@nullnuller

Great job @whats2000!
How do the dataset search and download modules work?
Is it possible to enable dataset search and download from Kaggle and other sources? Also, in your latest version I can't find the GUI updates shown above.

@whats2000
Author

@nullnuller Try deleting the WebUI module AgentLaboratoryWebUI
and reinstalling it by running app.py, because the old version did not support the version check.

The current dataset workflow uses the Hugging Face API, but I think the workflow agents need several modifications to support custom datasets.

@nullnuller

@nullnuller Try deleting the WebUI module AgentLaboratoryWebUI and reinstalling it by running app.py, because the old version did not support the version check.

The current dataset workflow uses the Hugging Face API, but I think the workflow agents need several modifications to support custom datasets.

@whats2000 After following the installation steps in https://github.com/whats2000/AgentLaboratory
I run python app.py and get the following error:

The WebUI repository is not cloned.
Would you like to clone it now from https://github.com/whats2000/AgentLaboratoryWebUI.git? (y/n) y
Cloning the WebUI repository...
Cloning into 'AgentLaboratoryWebUI'...
remote: Enumerating objects: 406, done.
remote: Counting objects: 100% (406/406), done.
remote: Compressing objects: 100% (275/275), done.
remote: Total 406 (delta 156), reused 336 (delta 92), pack-reused 0 (from 0)
Receiving objects: 100% (406/406), 1.16 MiB | 3.03 MiB/s, done.
Resolving deltas: 100% (156/156), done.
Traceback (most recent call last):
  File "/home/nulled/Downloads/LLM_Applications/AgentLaboratory-whats2000/AgentLaboratory/app.py", line 62, in <module>
    check_yarn_installed()
  File "/home/nulled/Downloads/LLM_Applications/AgentLaboratory-whats2000/AgentLaboratory/app.py", line 38, in check_yarn_installed
    subprocess.run(["yarn", "--version"], check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
  File "/home/nulled/miniconda3/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['yarn', '--version']' returned non-zero exit status 1.

@whats2000
Author

whats2000 commented Mar 13, 2025

@nullnuller I have patched that with an additional try handler

Update on 2025/3/12

  • Add support for Claude 3.7 and o3-mini
  • Update the pricing for most models as their prices drop!
  • Fix the subprocess error in the custom WebUI dependency checker

@nullnuller

@nullnuller I have patched that with an additional try handler

Update on 2025/3/12

  • Add support for Claude 3.7 and o3-mini
  • Update the pricing for most models as their prices drop!
  • Fix the subprocess error in the custom WebUI dependency checker

Just checked, and the error still seems to be there.

(venv_agent_lab) (base) nulled@mail:~/Downloads/LLM_Applications/AgentLaboratory-whats2000/AgentLaboratory$ python3 app.py
Traceback (most recent call last):
  File "/home/nulled/Downloads/LLM_Applications/AgentLaboratory-whats2000/AgentLaboratory/app.py", line 62, in <module>
    check_yarn_installed()
  File "/home/nulled/Downloads/LLM_Applications/AgentLaboratory-whats2000/AgentLaboratory/app.py", line 38, in check_yarn_installed
    subprocess.run(["yarn", "--version"], check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
  File "/home/nulled/miniconda3/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['yarn', '--version']' returned non-zero exit status 1.
(venv_agent_lab) (base) nulled@mail:~/Downloads/LLM_Applications/AgentLaboratory-whats2000/AgentLaboratory$ ls
AgentLaboratoryWebUI  ai_lab_repo.py  common_imports.py  config.py     LICENSE  mlesolver.py    provider.py  readme     requirements.txt  settings_manager.py  tools.py  venv_agent_lab
agents.py             app.py          config_gradio.py   inference.py  media    papersolver.py  __pycache__  README.md  settings          state_saves          utils.py
(venv_agent_lab) (base) nulled@mail:~/Downloads/LLM_Applications/AgentLaboratory-whats2000/AgentLaboratory$ rm -rf AgentLaboratoryWebUI/
(venv_agent_lab) (base) nulled@mail:~/Downloads/LLM_Applications/AgentLaboratory-whats2000/AgentLaboratory$ python app.py
The WebUI repository is not cloned.
Would you like to clone it now from https://github.com/whats2000/AgentLaboratoryWebUI.git? (y/n) y
Cloning the WebUI repository...
Cloning into 'AgentLaboratoryWebUI'...
remote: Enumerating objects: 413, done.
remote: Counting objects: 100% (413/413), done.
remote: Compressing objects: 100% (278/278), done.
remote: Total 413 (delta 160), reused 343 (delta 96), pack-reused 0 (from 0)
Receiving objects: 100% (413/413), 1.16 MiB | 3.04 MiB/s, done.
Resolving deltas: 100% (160/160), done.
Traceback (most recent call last):
  File "/home/nulled/Downloads/LLM_Applications/AgentLaboratory-whats2000/AgentLaboratory/app.py", line 62, in <module>
    check_yarn_installed()
  File "/home/nulled/Downloads/LLM_Applications/AgentLaboratory-whats2000/AgentLaboratory/app.py", line 38, in check_yarn_installed
    subprocess.run(["yarn", "--version"], check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
  File "/home/nulled/miniconda3/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['yarn', '--version']' returned non-zero exit status 1.

@whats2000
Author

@nullnuller Try again after fetching the patch; does it work? It should tell you that you need to install the missing dependency.

@nullnuller

nullnuller commented Mar 14, 2025

@nullnuller Try again after fetching the patch; does it work? It should tell you that you need to install the missing dependency.

Thanks, it's now working. How do I add other dataset sources?

@whats2000
Author

You'd need to modify the MLESolver; that might take a lot of work.

@nullnuller

@whats2000 I have been getting this unhandled error during the literature review, causing the code to break rather than skip over the failed request.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ literature review ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
```FULL_TEXT
0012002v1
An error occurred: Page request resulted in HTTP 400 (https://export.arxiv.org/api/query?search_query=&id_list=0012002v1&sortBy=relevance&sortOrder=descending&start=0&max_results=100)
Press enter to exit.
```

@nullnuller

@whats2000 I have been getting this unhandled error during the literature review, causing the code to break rather than skip over the failed request.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ literature review ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
```FULL_TEXT
0012002v1
An error occurred: Page request resulted in HTTP 400 (https://export.arxiv.org/api/query?search_query=&id_list=0012002v1&sortBy=relevance&sortOrder=descending&start=0&max_results=100)
Press enter to exit.
```

I think this was raised and solved in this issue, but there is no PR yet. Perhaps @whats2000 could include it in yours?

@whats2000
Author

I will check it out

@whats2000
Author

@nullnuller
I have verified the patch; the arXiv calls now have a try/except error handler.
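The patch itself isn't quoted in this thread; the general shape of such a handler (a minimal sketch using requests against the arXiv export API, not the exact tools.py change) is:

```python
# Minimal sketch of the idea: skip a failed arXiv query instead of letting
# the HTTP error kill the whole literature-review run.
import requests

def fetch_arxiv_entry(arxiv_id: str) -> str | None:
    url = "https://export.arxiv.org/api/query"
    try:
        resp = requests.get(url, params={"id_list": arxiv_id, "max_results": 1}, timeout=30)
        resp.raise_for_status()
        return resp.text  # Atom XML, to be parsed downstream
    except requests.RequestException as e:
        print(f"Skipping {arxiv_id}: {e}")
        return None
```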

Copilot AI review requested due to automatic review settings January 22, 2026 13:21

Copilot AI left a comment


Pull request overview

This PR adds comprehensive multi-provider LLM support (Ollama, Gemini, Claude) and introduces both Gradio and Flask-based Web UIs for easier configuration. It also includes bug fixes, improved error handling, and centralized configuration management.

Changes:

  • Adds support for Ollama, Gemini (Google), and Anthropic Claude as LLM providers
  • Implements two Web UI options: Gradio-based and Flask/React-based interfaces
  • Introduces centralized configuration system with settings persistence
  • Improves error handling for ArXiv API operations with comprehensive try-catch blocks

Reviewed changes

Copilot reviewed 13 out of 15 changed files in this pull request and generated 25 comments.

Show a summary per file
  • config.py: New centralized configuration file with task notes, human-in-loop settings, and API base URLs
  • provider.py: New provider abstraction layer for OpenAI and Anthropic APIs
  • inference.py: Extensive refactoring to support multiple LLM providers with cost estimation
  • ai_lab_repo.py: Main workflow updates for multi-provider support and configuration loading
  • settings_manager.py: New settings persistence layer for saving/loading user configurations
  • config_gradio.py: Gradio-based web interface for configuration
  • app.py: Flask-based web application with React frontend support
  • utils.py: Added task note templating and validation utilities
  • tools.py: Enhanced ArXiv paper retrieval with comprehensive error handling
  • mlesolver.py: Grammar corrections in documentation strings
  • requirements.txt: Added Flask, Flask-CORS, Gradio, torchaudio, and torchvision dependencies
  • README.md: Comprehensive documentation updates for new features and model support
  • common_imports.py: Added torchvision and torchaudio imports
  • agents.py: Added json import
  • .gitignore: Added entries for settings, WebUI, and data directories


Comment on lines +221 to +255
elif (model_str.startswith("claude-4.5-opus") or
      model_str.startswith("claude-4.5-sonnet") or
      model_str.startswith("claude-4.5-haiku") or
      model_str.startswith("claude-4.1-opus") or
      model_str.startswith("claude-4-opus") or
      model_str.startswith("claude-4-sonnet") or
      model_str.startswith("claude-3-5-sonnet") or
      model_str.startswith("claude-3-5-haiku") or
      model_str.startswith("claude-3-7-sonnet")
):
    answer = AnthropicProvider.get_response(
        api_key=os.environ["ANTHROPIC_API_KEY"],
        model_name=model_str,
        user_prompt=prompt,
        system_prompt=system_prompt,
        temperature=temp,
    )
    if model_str.startswith("claude-4.5-opus"):
        model_str = "claude-4.5-opus"
    elif model_str.startswith("claude-4.5-sonnet"):
        model_str = "claude-4.5-sonnet"
    elif model_str.startswith("claude-4.5-haiku"):
        model_str = "claude-4.5-haiku"
    elif model_str.startswith("claude-4.1-opus"):
        model_str = "claude-4.1-opus"
    elif model_str.startswith("claude-4-opus"):
        model_str = "claude-4-opus"
    elif model_str.startswith("claude-4-sonnet"):
        model_str = "claude-4-sonnet"
    elif model_str.startswith("claude-3-5-sonnet"):
        model_str = "claude-3-5-sonnet"
    elif model_str.startswith("claude-3-5-haiku"):
        model_str = "claude-3-5-haiku"
    elif model_str.startswith("claude-3-7-sonnet"):
        model_str = "claude-3-7-sonnet"

Copilot AI Jan 22, 2026


The model name mapping uses startswith matching which could lead to incorrect model selection. For example, if a user specifies "claude-3-5-sonnet-20241022", the code will match it to the first condition and normalize it to just "claude-4.5-opus" if that check comes first, which is incorrect.

The order of checks should be from most specific to least specific, and ideally use exact matching or proper versioning logic rather than prefix matching to avoid misclassification.
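One way to act on this suggestion (an illustrative sketch, not code from this PR) is an explicit family table checked from the longest, most specific prefix down:

```python
# Illustrative sketch of the reviewer's suggestion: normalize model names with
# an explicit family table, checked from the most specific prefix to the least.
CLAUDE_FAMILIES = [
    "claude-4.5-sonnet", "claude-4.5-haiku", "claude-4.5-opus",
    "claude-4.1-opus", "claude-4-sonnet", "claude-4-opus",
    "claude-3-7-sonnet", "claude-3-5-sonnet", "claude-3-5-haiku",
]

def normalize_claude_model(model_str: str) -> str | None:
    """Map e.g. 'claude-3-5-sonnet-20241022' to 'claude-3-5-sonnet'."""
    for family in sorted(CLAUDE_FAMILIES, key=len, reverse=True):
        if model_str == family or model_str.startswith(family + "-"):
            return family
    return None  # unknown model; let the caller decide how to handle it
```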

Comment on lines +34 to +64
if api_key == "ollama":
ollama_max_tokens = int(os.getenv("OLLAMA_MAX_TOKENS", 2048))
if version == "0.28":
if temperature is None:
completion = openai.ChatCompletion.create(
model=model_name,
messages=messages,
max_tokens=ollama_max_tokens,
)
else:
completion = openai.ChatCompletion.create(
model=model_name,
messages=messages,
temperature=temperature,
max_tokens=ollama_max_tokens,
)
else:
client = OpenAI(**client_config)
if temperature is None:
completion = client.chat.completions.create(
model=model_name,
messages=messages,
max_tokens=ollama_max_tokens,
)
else:
completion = client.chat.completions.create(
model=model_name,
messages=messages,
temperature=temperature,
max_tokens=ollama_max_tokens,
)

Copilot AI Jan 22, 2026


The OpenAI provider checks api_key == "ollama" as a special case to determine if Ollama is being used. This is a fragile pattern that couples the provider implementation to a specific string value. If the API key is accidentally set to "ollama" when not using Ollama, or if the string is mistyped, it will cause unexpected behavior.

Consider using a separate boolean flag or configuration parameter to indicate the provider type, rather than overloading the API key field with special string values.
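A sketch of what that could look like (the function signature and parameter names are hypothetical, not the PR's API):

```python
# Sketch of the reviewer's suggestion: pass the provider explicitly instead of
# overloading the API key string. Names here are hypothetical.
import os
from openai import OpenAI

def get_response(model_name: str, messages: list[dict],
                 provider: str = "openai",          # "openai" | "ollama" | ...
                 base_url: str | None = None,
                 api_key: str | None = None,
                 max_tokens: int | None = None) -> str:
    if provider == "ollama":
        base_url = base_url or "http://localhost:11434/v1"
        api_key = api_key or "ollama"               # Ollama ignores the key value
        max_tokens = max_tokens or int(os.getenv("OLLAMA_MAX_TOKENS", 2048))
    # For "openai", api_key=None falls back to the OPENAI_API_KEY environment variable.
    client = OpenAI(api_key=api_key, base_url=base_url)
    completion = client.chat.completions.create(
        model=model_name, messages=messages, max_tokens=max_tokens,
    )
    return completion.choices[0].message.content
```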

Comment on lines +18 to +28
def save_settings(self, settings: dict):
    """Save settings to JSON file"""
    try:
        # Filter out empty API keys before saving
        filtered_settings = {
            k: v for k, v in settings.items()
            if not (k.endswith('_api_key') and not v)
        }

        with open(self.settings_file, 'w', encoding='utf-8') as f:
            json.dump(filtered_settings, f, indent=2)

Copilot AI Jan 22, 2026


API keys are being stored in plain text in a JSON file without any encryption. The settings file at settings/user_settings.json will contain sensitive API keys that could be accidentally committed to version control or exposed.

Consider implementing encryption for stored API keys, or at minimum, add prominent warnings in the documentation about the security implications and ensure the settings directory is properly added to .gitignore (which it is, but users may not realize the security risk).
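If encryption were added, one minimal approach (a hedged sketch; the cryptography dependency and the key-file path are assumptions, not part of this PR) could look like:

```python
# Hedged sketch: encrypt API keys before they are written to user_settings.json.
# The `cryptography` dependency and the key-file location are assumptions.
import os
from cryptography.fernet import Fernet

def make_cipher(key_path: str = "settings/.settings.key") -> Fernet:
    """Load or create a local Fernet key; keep this file out of version control."""
    if os.path.exists(key_path):
        key = open(key_path, "rb").read()
    else:
        key = Fernet.generate_key()
        os.makedirs(os.path.dirname(key_path), exist_ok=True)
        with open(key_path, "wb") as f:
            f.write(key)
    return Fernet(key)

cipher = make_cipher()
token = cipher.encrypt(b"sk-...")            # store this token instead of the raw key
api_key = cipher.decrypt(token).decode()     # recover the key when loading settings
```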

Comment on lines +32 to +78
version = openai.__version__

if api_key == "ollama":
    ollama_max_tokens = int(os.getenv("OLLAMA_MAX_TOKENS", 2048))
    if version == "0.28":
        if temperature is None:
            completion = openai.ChatCompletion.create(
                model=model_name,
                messages=messages,
                max_tokens=ollama_max_tokens,
            )
        else:
            completion = openai.ChatCompletion.create(
                model=model_name,
                messages=messages,
                temperature=temperature,
                max_tokens=ollama_max_tokens,
            )
    else:
        client = OpenAI(**client_config)
        if temperature is None:
            completion = client.chat.completions.create(
                model=model_name,
                messages=messages,
                max_tokens=ollama_max_tokens,
            )
        else:
            completion = client.chat.completions.create(
                model=model_name,
                messages=messages,
                temperature=temperature,
                max_tokens=ollama_max_tokens,
            )
else:
    if version == "0.28":
        if temperature is None:
            completion = openai.ChatCompletion.create(
                model=model_name,
                messages=messages,
            )
        else:
            completion = openai.ChatCompletion.create(
                model=model_name,
                messages=messages,
                temperature=temperature,
            )
    else:

Copilot AI Jan 22, 2026


The version check logic is inconsistent: in some places the code checks version == "0.28", while in others it just checks the version variable directly. Since openai.__version__ returns a string, the comparison should work, but the pattern is inconsistent throughout the codebase.

Consider standardizing on one approach and documenting which OpenAI SDK versions are supported.
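For example, the decision could live in one helper (illustrative only; the helper name and the packaging dependency are assumptions):

```python
# Illustrative helper: decide once whether the legacy 0.28-style API or the
# >=1.0 client API should be used, instead of repeating string comparisons.
import openai
from packaging import version  # assumed available; commonly installed alongside pip/setuptools

def use_legacy_openai_api() -> bool:
    return version.parse(openai.__version__) < version.parse("1.0.0")
```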

"You have access to a code editing tool. \n"
"This tool allows you to replace lines indexed n through m (n:m) of the current code with as many lines of new code as you want to add. This removal is inclusive meaning that line n and m and everything between n and m is removed. This will be the primary way that you interact with code. \n"
"You can edit code using the following command: ```EDIT N M\n<new lines to replace old lines>\n``` EDIT is the word EDIT, N is the first line index you want to replace and M the the last line index you want to replace (everything inbetween will also be removed), and <new lines to replace old lines> will be the new code that is replacing the old code. Before changing the existing code to be your new code, your new code will be tested and if it returns an error it will not replace the existing code.\n"
"You can edit code using the following command: ```EDIT N M\n<new lines to replace old lines>\n``` EDIT is the word EDIT, N is the first line index you want to replace and M the last line index you want to replace (everything inbetween will also be removed), and <new lines to replace old lines> will be the new code that is replacing the old code. Before changing the existing code to be your new code, your new code will be tested and if it returns an error it will not replace the existing code.\n"

Copilot AI Jan 22, 2026


There's a typo in the spelling of "inbetween" which should be two words: "in between". This appears in the code repair docstring.

Correct spelling: "everything in between will also be removed"

Suggested change
"You can edit code using the following command: ```EDIT N M\n<new lines to replace old lines>\n``` EDIT is the word EDIT, N is the first line index you want to replace and M the last line index you want to replace (everything inbetween will also be removed), and <new lines to replace old lines> will be the new code that is replacing the old code. Before changing the existing code to be your new code, your new code will be tested and if it returns an error it will not replace the existing code.\n"
"You can edit code using the following command: ```EDIT N M\n<new lines to replace old lines>\n``` EDIT is the word EDIT, N is the first line index you want to replace and M the last line index you want to replace (everything in between will also be removed), and <new lines to replace old lines> will be the new code that is replacing the old code. Before changing the existing code to be your new code, your new code will be tested and if it returns an error it will not replace the existing code.\n"

Comment on lines 79 to +86
elif type(agent_model_backbone) == dict:
    # todo: check if valid
    self.phase_models = agent_model_backbone

    # Load models for each phase if key exists otherwise use the default model
    for phase, subtasks in self.phases:
        for subtask in subtasks:
            if subtask in agent_model_backbone:
                self.phase_models[subtask] = agent_model_backbone[subtask]
            else:
                self.phase_models[subtask] = self.model_backbone

Copilot AI Jan 22, 2026


The phase_models dictionary initialization logic at line 79-86 has changed behavior. Previously it would use agent_model_backbone for all phases when it's a dict, but now it checks if each subtask exists in the dict and falls back to self.model_backbone. However, there's a logical issue: this code is only executed when type(agent_model_backbone) == dict, but at this point agent_model_backbone might have been reassigned from None to a dict in lines 43-52. The type check won't properly handle the case where agent_model_backbone was originally None.

The logic flow needs to be clarified to ensure the correct behavior for all initialization paths.
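One way to make that explicit (a hedged sketch, not the PR's code; the helper name is made up):

```python
# Sketch: always end up with a per-subtask dict, whatever the caller passed in.
def build_phase_models(phases, agent_model_backbone, default_model):
    phase_models = {}
    for _, subtasks in phases:
        for subtask in subtasks:
            if isinstance(agent_model_backbone, dict):
                phase_models[subtask] = agent_model_backbone.get(subtask, default_model)
            elif isinstance(agent_model_backbone, str):
                phase_models[subtask] = agent_model_backbone
            else:  # None or anything unexpected falls back to the default
                phase_models[subtask] = default_model
    return phase_models
```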

"literature review", "plan formulation",
"data preparation", "running experiments",
"results interpretation", "report writing",
"report refinement"

Copilot AI Jan 22, 2026


The PR description claims to fix the misspelling "report refinement" to "paper refinement", but the code actually keeps "report refinement" everywhere. Looking at line 143 in utils.py, line 102 in config.py, and throughout ai_lab_repo.py, the phase is consistently called "report refinement", not "paper refinement".

Additionally, the original code referenced in the PR description (line 708 of the original repository) would need to be verified, but within this PR, the term "report refinement" is used consistently, which appears to be correct since this phase is about refining the report, not a generic paper. The PR description's claim about fixing this misspelling appears to be inaccurate.

Comment on lines +199 to +206
try:
    if sys.platform == 'win32':
        subprocess.Popen(['start', 'cmd', '/k'] + cmd, shell=True)
    elif sys.platform == 'darwin':
        subprocess.Popen(['open', '-a', 'Terminal'] + cmd)
    else:
        subprocess.Popen(['x-terminal-emulator', '-e'] + cmd)
    markdown_status += "\n**Research process started in a new terminal window.**"

Copilot AI Jan 22, 2026


The subprocess commands for opening terminal windows use platform detection but don't handle all cases properly. On Linux, x-terminal-emulator may not be available on all distributions, and the command construction doesn't properly escape arguments which could lead to command injection if user input contains shell metacharacters.

Consider using more portable approaches or adding proper argument escaping with shlex.quote() to prevent potential command injection vulnerabilities.
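A sketch of a more defensive launcher along those lines (hedged; the terminal-emulator fallbacks and the quoting shown here are illustrative, not the PR's code):

```python
# Hedged sketch: quote arguments and probe for an available terminal emulator
# instead of assuming x-terminal-emulator exists. Quoting on Windows/macOS is
# simplified and would need hardening for untrusted input.
import shlex
import shutil
import subprocess
import sys

def launch_in_terminal(cmd: list[str]) -> None:
    if sys.platform == "win32":
        subprocess.Popen(["cmd", "/c", "start", "cmd", "/k", subprocess.list2cmdline(cmd)])
    elif sys.platform == "darwin":
        quoted = " ".join(shlex.quote(c) for c in cmd)
        subprocess.Popen(["osascript", "-e", f'tell app "Terminal" to do script "{quoted}"'])
    else:
        for term, flag in (("x-terminal-emulator", "-e"), ("gnome-terminal", "--"),
                           ("konsole", "-e"), ("xterm", "-e")):
            if shutil.which(term):
                subprocess.Popen([term, flag] + cmd)
                return
        raise RuntimeError("No supported terminal emulator found")
```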

import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader, Dataset, random_split
import torchvision

Copilot AI Jan 22, 2026


Import of 'torchvision' is not used.

Suggested change
import torchvision

import torch.nn.functional as F
from torch.utils.data import DataLoader, Dataset, random_split
import torchvision
import torchaudio

Copilot AI Jan 22, 2026


Import of 'torchaudio' is not used.
