[Feature Improvement] Add support for Ollama, Gemini, and Claude, with Web UI#73
whats2000 wants to merge 43 commits into SamuelSchmidgall:main from
Conversation
hyperparamter -> hyperparameter
Add instructions for Windows users to install MikTeX
…tch-1 Update README.md
|
The enhanced inference layer can adapt to any service that supports the OpenAI SDK, enabling fast integration. |
|
Hi @whats2000! Appreciate your efforts putting together alternative LLM backends for this project. I tried using your code with my local Ollama instance but am getting a 422 Unprocessable Entity error from the web UI. Not sure if I'm missing anything? Configured the API |
|
@AlexTzk I think I need more information on how to reproduce your error. What is your operating system? I tested the script on Ubuntu 20 (WSL2). Can you try a test script calling the provider directly?

# Test script for Ollama
# (assuming OpenaiProvider is imported from this PR's provider.py)
from provider import OpenaiProvider

print(OpenaiProvider.get_response(
    api_key="ollama",
    model_name="deepseek-r1:32b",
    user_prompt="What is the meaning of life?",
    system_prompt="You are a philosopher seeking the meaning of life.",
    base_url="http://localhost:11434/v1/"
))

And I get a response with Ollama
|
@whats2000 Thank you for your reply. Your test function does work. Then I set the base URL within config.py, launched Gradio with python config_gradio.py, set the OpenAI API key to ollama, and specified the same model, deepseek-r1:32b. I get error 422. I am running Ollama version 0.5.7 on another machine at 10.0.0.99:11434; that's a Docker container with an Ubuntu base, I believe. The AgentLab code and your PR are running on an Ubuntu Server 24.04 LTS with Python 3.12.9. |
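For reference, the override being described would look roughly like this in config.py. The OLLAMA_API_BASE_URL name comes from this PR's config; the value below is just the remote instance mentioned in this comment:

```python
# config.py (excerpt, illustrative)
OLLAMA_API_BASE_URL = "http://10.0.0.99:11434/v1/"
```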
|
Seems like this is due to Gradio from your image (it fails to launch the terminal). Did you see any output in the terminal that points out why it failed to launch? Also, I use XTerm on Linux. Feel free to check the code at |
|
@AlexTzk I added the version of the |
|
@whats2000 It's working now! I believe my issue was caused by XTerm not being able to launch because there's no GUI. Thank you for your help. A couple of notes: I now have a different exception about max_tries exceeded during literature review, but that is not connected to your PR; I will try to fix that now. |
|
I just updated the layout to look more balanced, do you think it looks better? |
|
@whats2000 Looks great! Love how you split it across both sides. The debugging feature is extremely helpful, many thanks for that. Running an experiment now to test if max_tries exception is being thrown again but as soon as I'm done with that I will test this again! Great work 👍 |
The bug is that the `model_backbone` attribute in LaboratoryWorkflow is used as both a `dict` and a `str`, which makes some agents unable to find their model, so they fall back to default_model and cause inference errors.
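A minimal sketch of the ambiguity, using an illustrative helper rather than the PR's actual code: each agent needs a lookup that works whether model_backbone is a single string or a per-phase dict.

```python
def resolve_phase_model(model_backbone, phase, default_model):
    """Return the model for a phase whether the backbone is a str or a per-phase dict."""
    if isinstance(model_backbone, str):
        return model_backbone
    if isinstance(model_backbone, dict):
        return model_backbone.get(phase, default_model)
    return default_model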
|
@AlexTzk I just fixed several bugs that I discovered in the original project. Does this fix it for you? I found that the |
Now only throw if all of the key is missing
|
@whats2000 I still got the max_tries exception with deepseek-r1:32b; second test was with qwen2.5-coder:32b-instruct-q5_K_M via Gradio but that seemed to have crashed as well. No message in the WEBUI but I presume it was the same exception about max_tries for literature review. I am now trying to run a smaller model, qwen2.5-coder:14b-instruct-q4_1 launched from the terminal rather than the webui, want to see if it's the same error. |
|
Is it going to be merged soon? |
|
@whats2000 awesome work, I will test the webui sometime this week and provide feedback. @MohamadZeina I managed to get past lit_review by using the qwen2.5:32b model on Ollama with a num_ctx window of 100000 tokens; I created my own model from a Modelfile. Another unforeseen problem is during the subsequent tasks, more specifically when it gets to running_experiments: it takes a long time to reply as it's using a mixture of RAM and VRAM - about 50/50 - and this causes the code to time out. Creating a different model with a 16k token window does get around this, but you have to interrupt the current research, delete your custom model, create another custom model under the same name with a different context window, and restart the research. I was thinking that implementing a memory_class might be a worthwhile endeavour...
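For anyone trying to reproduce this workaround: the custom model described above is built from a Modelfile roughly like the following. The model tag and values are illustrative, taken from the numbers in this comment:

```
# Modelfile (illustrative): bake the larger context window into a custom model
FROM qwen2.5:32b
PARAMETER num_ctx 100000
```

It is then created with something like `ollama create qwen2.5-32b-100k -f Modelfile`; changing the context window currently means deleting and recreating the custom model as described above.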
|
a few corrections
|
Great to hear that! |
Note: You need to trigger it manually the first time, as the old UI does not have the update button
|
I added a Check for Update button to the Web UI. However, for older versions, updates must be triggered manually via |
|
Great job @whats2000 |
|
@nullnuller Try deleting the WebUI module. The current dataset workflow uses the HuggingFace API, but I think several modifications to the workflow agent are needed to support custom datasets. |
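For what it's worth, the same datasets API already used for HuggingFace Hub datasets can also read local files, which is one possible path toward custom-dataset support. This is illustrative only (the file name is hypothetical), not an existing workflow feature:

```python
from datasets import load_dataset

hub_ds = load_dataset("imdb")                                    # current behaviour: a Hub dataset
local_ds = load_dataset("csv", data_files="my_experiments.csv")  # hypothetical custom dataset
```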
@whats2000 After following installation steps in https://github.com/whats2000/AgentLaboratory |
|
@nullnuller I have patched that with an additional try handler. Update on 2025/3/12
|
Just checked it and it seems to be still there. |
|
@nullnuller Try again after fetching the patch, does it work? It should tell you that you need to install the missing dependency. |
Thanks, it's now working. How do I add other dataset sources? |
|
You would need to modify the MLESolver; that might take a lot of work |
|
@whats2000 I have been getting this unhandled error during Literature Review, causing the code to break rather than skip over a 404 response. |
I think this was raised and solved in this issue |
|
I will check it out |
This takes the solution from SamuelSchmidgall#50
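The guard is conceptually along these lines, sketched against the arxiv Python package (assumed to be what the paper search uses); this is illustrative, not the exact patch:

```python
import arxiv

def safe_arxiv_search(query: str, max_results: int = 10) -> list:
    """Return ArXiv results, or an empty list if the request fails (e.g. a 404)."""
    try:
        client = arxiv.Client()
        search = arxiv.Search(query=query, max_results=max_results)
        return list(client.results(search))
    except (arxiv.HTTPError, arxiv.UnexpectedEmptyPageError) as exc:
        # Skip over transient HTTP failures instead of crashing the literature review
        print(f"ArXiv request failed ({exc}); skipping this query.")
        return []
```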
|
@nullnuller |
Pull request overview
This PR adds comprehensive multi-provider LLM support (Ollama, Gemini, Claude) and introduces both Gradio and Flask-based Web UIs for easier configuration. It also includes bug fixes, improved error handling, and centralized configuration management.
Changes:
- Adds support for Ollama, Gemini (Google), and Anthropic Claude as LLM providers
- Implements two Web UI options: Gradio-based and Flask/React-based interfaces
- Introduces centralized configuration system with settings persistence
- Improves error handling for ArXiv API operations with comprehensive try-catch blocks
Reviewed changes
Copilot reviewed 13 out of 15 changed files in this pull request and generated 25 comments.
Show a summary per file
| File | Description |
|---|---|
| config.py | New centralized configuration file with task notes, human-in-loop settings, and API base URLs |
| provider.py | New provider abstraction layer for OpenAI and Anthropic APIs |
| inference.py | Extensive refactoring to support multiple LLM providers with cost estimation |
| ai_lab_repo.py | Main workflow updates for multi-provider support and configuration loading |
| settings_manager.py | New settings persistence layer for saving/loading user configurations |
| config_gradio.py | Gradio-based web interface for configuration |
| app.py | Flask-based web application with React frontend support |
| utils.py | Added task note templating and validation utilities |
| tools.py | Enhanced ArXiv paper retrieval with comprehensive error handling |
| mlesolver.py | Grammar corrections in documentation strings |
| requirements.txt | Added Flask, Flask-CORS, Gradio, torchaudio, and torchvision dependencies |
| README.md | Comprehensive documentation updates for new features and model support |
| common_imports.py | Added torchvision and torchaudio imports |
| agents.py | Added json import |
| .gitignore | Added entries for settings, WebUI, and data directories |
elif (model_str.startswith("claude-4.5-opus") or
        model_str.startswith("claude-4.5-sonnet") or
        model_str.startswith("claude-4.5-haiku") or
        model_str.startswith("claude-4.1-opus") or
        model_str.startswith("claude-4-opus") or
        model_str.startswith("claude-4-sonnet") or
        model_str.startswith("claude-3-5-sonnet") or
        model_str.startswith("claude-3-5-haiku") or
        model_str.startswith("claude-3-7-sonnet")
):
    answer = AnthropicProvider.get_response(
        api_key=os.environ["ANTHROPIC_API_KEY"],
        model_name=model_str,
        user_prompt=prompt,
        system_prompt=system_prompt,
        temperature=temp,
    )
    if model_str.startswith("claude-4.5-opus"):
        model_str = "claude-4.5-opus"
    elif model_str.startswith("claude-4.5-sonnet"):
        model_str = "claude-4.5-sonnet"
    elif model_str.startswith("claude-4.5-haiku"):
        model_str = "claude-4.5-haiku"
    elif model_str.startswith("claude-4.1-opus"):
        model_str = "claude-4.1-opus"
    elif model_str.startswith("claude-4-opus"):
        model_str = "claude-4-opus"
    elif model_str.startswith("claude-4-sonnet"):
        model_str = "claude-4-sonnet"
    elif model_str.startswith("claude-3-5-sonnet"):
        model_str = "claude-3-5-sonnet"
    elif model_str.startswith("claude-3-5-haiku"):
        model_str = "claude-3-5-haiku"
    elif model_str.startswith("claude-3-7-sonnet"):
        model_str = "claude-3-7-sonnet"
The model name mapping uses startswith matching, which could lead to incorrect model selection: a dated identifier such as "claude-3-5-sonnet-20241022" is silently collapsed to whichever listed prefix happens to match first in the if/elif chain, so an ambiguous or badly ordered prefix list produces the wrong normalized name.
The order of checks should be from most specific to least specific, and ideally use exact matching or proper versioning logic rather than prefix matching to avoid misclassification.
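A minimal sketch of that suggestion, assuming the same set of prefixes quoted above (not the PR's actual code): sort the known prefixes longest-first so the most specific one always wins.

```python
# Illustrative only: longest-prefix-first normalization for Claude model names.
KNOWN_CLAUDE_PREFIXES = [
    "claude-4.5-opus", "claude-4.5-sonnet", "claude-4.5-haiku",
    "claude-4.1-opus", "claude-4-opus", "claude-4-sonnet",
    "claude-3-5-sonnet", "claude-3-5-haiku", "claude-3-7-sonnet",
]

def normalize_claude_model(model_str: str) -> str | None:
    """Map e.g. 'claude-3-5-sonnet-20241022' to 'claude-3-5-sonnet'; None if unknown."""
    for prefix in sorted(KNOWN_CLAUDE_PREFIXES, key=len, reverse=True):
        if model_str.startswith(prefix):
            return prefix
    return None
```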
if api_key == "ollama":
    ollama_max_tokens = int(os.getenv("OLLAMA_MAX_TOKENS", 2048))
    if version == "0.28":
        if temperature is None:
            completion = openai.ChatCompletion.create(
                model=model_name,
                messages=messages,
                max_tokens=ollama_max_tokens,
            )
        else:
            completion = openai.ChatCompletion.create(
                model=model_name,
                messages=messages,
                temperature=temperature,
                max_tokens=ollama_max_tokens,
            )
    else:
        client = OpenAI(**client_config)
        if temperature is None:
            completion = client.chat.completions.create(
                model=model_name,
                messages=messages,
                max_tokens=ollama_max_tokens,
            )
        else:
            completion = client.chat.completions.create(
                model=model_name,
                messages=messages,
                temperature=temperature,
                max_tokens=ollama_max_tokens,
            )
The OpenAI provider checks api_key == "ollama" as a special case to determine if Ollama is being used. This is a fragile pattern that couples the provider implementation to a specific string value. If the API key is accidentally set to "ollama" when not using Ollama, or if the string is mistyped, it will cause unexpected behavior.
Consider using a separate boolean flag or configuration parameter to indicate the provider type, rather than overloading the API key field with special string values.
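A sketch of that suggestion with an explicit flag instead of the magic "ollama" key; the signature and flag name are illustrative, not the PR's actual provider API:

```python
import os
from openai import OpenAI

def get_response(model_name, messages, api_key, base_url=None,
                 use_ollama=False, temperature=None):
    """Illustrative: Ollama is selected by an explicit flag, not by api_key == 'ollama'."""
    client = OpenAI(api_key=api_key, base_url=base_url)
    kwargs = {"model": model_name, "messages": messages}
    if temperature is not None:
        kwargs["temperature"] = temperature
    if use_ollama:
        # Keep the same env override the PR uses for Ollama's completion budget.
        kwargs["max_tokens"] = int(os.getenv("OLLAMA_MAX_TOKENS", 2048))
    return client.chat.completions.create(**kwargs).choices[0].message.content
```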
def save_settings(self, settings: dict):
    """Save settings to JSON file"""
    try:
        # Filter out empty API keys before saving
        filtered_settings = {
            k: v for k, v in settings.items()
            if not (k.endswith('_api_key') and not v)
        }

        with open(self.settings_file, 'w', encoding='utf-8') as f:
            json.dump(filtered_settings, f, indent=2)
API keys are being stored in plain text in a JSON file without any encryption. The settings file at settings/user_settings.json will contain sensitive API keys that could be accidentally committed to version control or exposed.
Consider implementing encryption for stored API keys, or at minimum, add prominent warnings in the documentation about the security implications and ensure the settings directory is properly added to .gitignore (which it is, but users may not realize the security risk).
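One possible mitigation, sketched under the assumption that persisting keys is not strictly required (this is not the PR's implementation): drop API keys from the persisted settings entirely and restrict the file's permissions.

```python
import json
import os
import stat

def save_settings(settings: dict, path: str = "settings/user_settings.json") -> None:
    # Illustrative hardening: never persist API keys, and make the file owner-only (0o600).
    safe = {k: v for k, v in settings.items() if not k.endswith("_api_key")}
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(safe, f, indent=2)
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
```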
version = openai.__version__

if api_key == "ollama":
    ollama_max_tokens = int(os.getenv("OLLAMA_MAX_TOKENS", 2048))
    if version == "0.28":
        if temperature is None:
            completion = openai.ChatCompletion.create(
                model=model_name,
                messages=messages,
                max_tokens=ollama_max_tokens,
            )
        else:
            completion = openai.ChatCompletion.create(
                model=model_name,
                messages=messages,
                temperature=temperature,
                max_tokens=ollama_max_tokens,
            )
    else:
        client = OpenAI(**client_config)
        if temperature is None:
            completion = client.chat.completions.create(
                model=model_name,
                messages=messages,
                max_tokens=ollama_max_tokens,
            )
        else:
            completion = client.chat.completions.create(
                model=model_name,
                messages=messages,
                temperature=temperature,
                max_tokens=ollama_max_tokens,
            )
else:
    if version == "0.28":
        if temperature is None:
            completion = openai.ChatCompletion.create(
                model=model_name,
                messages=messages,
            )
        else:
            completion = openai.ChatCompletion.create(
                model=model_name,
                messages=messages,
                temperature=temperature,
            )
    else:
The OpenAI SDK version check logic is inconsistent: in some places the code checks version == "0.28" while in others it branches on the version variable directly. Since openai.__version__ returns a string, the comparison works, but the pattern is inconsistent throughout the codebase.
Consider standardizing on one approach and documenting which OpenAI SDK versions are supported.
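A small helper along those lines, assuming the widely available packaging library (illustrative, not part of the PR):

```python
import openai
from packaging import version

def uses_legacy_openai_sdk() -> bool:
    """True for the pre-1.0 SDK (e.g. 0.28.x) that still exposes openai.ChatCompletion."""
    return version.parse(openai.__version__) < version.parse("1.0.0")
```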
| "You have access to a code editing tool. \n" | ||
| "This tool allows you to replace lines indexed n through m (n:m) of the current code with as many lines of new code as you want to add. This removal is inclusive meaning that line n and m and everything between n and m is removed. This will be the primary way that you interact with code. \n" | ||
| "You can edit code using the following command: ```EDIT N M\n<new lines to replace old lines>\n``` EDIT is the word EDIT, N is the first line index you want to replace and M the the last line index you want to replace (everything inbetween will also be removed), and <new lines to replace old lines> will be the new code that is replacing the old code. Before changing the existing code to be your new code, your new code will be tested and if it returns an error it will not replace the existing code.\n" | ||
| "You can edit code using the following command: ```EDIT N M\n<new lines to replace old lines>\n``` EDIT is the word EDIT, N is the first line index you want to replace and M the last line index you want to replace (everything inbetween will also be removed), and <new lines to replace old lines> will be the new code that is replacing the old code. Before changing the existing code to be your new code, your new code will be tested and if it returns an error it will not replace the existing code.\n" |
There's a typo in the spelling of "inbetween" which should be two words: "in between". This appears in the code repair docstring.
Correct spelling: "everything in between will also be removed"
| "You can edit code using the following command: ```EDIT N M\n<new lines to replace old lines>\n``` EDIT is the word EDIT, N is the first line index you want to replace and M the last line index you want to replace (everything inbetween will also be removed), and <new lines to replace old lines> will be the new code that is replacing the old code. Before changing the existing code to be your new code, your new code will be tested and if it returns an error it will not replace the existing code.\n" | |
| "You can edit code using the following command: ```EDIT N M\n<new lines to replace old lines>\n``` EDIT is the word EDIT, N is the first line index you want to replace and M the last line index you want to replace (everything in between will also be removed), and <new lines to replace old lines> will be the new code that is replacing the old code. Before changing the existing code to be your new code, your new code will be tested and if it returns an error it will not replace the existing code.\n" |
elif type(agent_model_backbone) == dict:
    # todo: check if valid
    self.phase_models = agent_model_backbone

    # Load models for each phase if key exists otherwise use the default model
    for phase, subtasks in self.phases:
        for subtask in subtasks:
            if subtask in agent_model_backbone:
                self.phase_models[subtask] = agent_model_backbone[subtask]
            else:
                self.phase_models[subtask] = self.model_backbone
The phase_models dictionary initialization logic at line 79-86 has changed behavior. Previously it would use agent_model_backbone for all phases when it's a dict, but now it checks if each subtask exists in the dict and falls back to self.model_backbone. However, there's a logical issue: this code is only executed when type(agent_model_backbone) == dict, but at this point agent_model_backbone might have been reassigned from None to a dict in lines 43-52. The type check won't properly handle the case where agent_model_backbone was originally None.
The logic flow needs to be clarified to ensure the correct behavior for all initialization paths.
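One way the initialization could be made uniform, sketched with illustrative names (not the PR's code): normalize None, a single string, and a partial dict into one complete phase-to-model mapping up front.

```python
def build_phase_models(agent_model_backbone, phases, default_model):
    """Collapse None / str / partial dict into a complete {subtask: model} mapping."""
    phase_models = {}
    for phase, subtasks in phases:
        for subtask in subtasks:
            if isinstance(agent_model_backbone, dict):
                phase_models[subtask] = agent_model_backbone.get(subtask, default_model)
            elif isinstance(agent_model_backbone, str):
                phase_models[subtask] = agent_model_backbone
            else:  # None or anything unexpected falls back to the default
                phase_models[subtask] = default_model
    return phase_models
```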
| "literature review", "plan formulation", | ||
| "data preparation", "running experiments", | ||
| "results interpretation", "report writing", | ||
| "report refinement" |
The PR description claims to fix the misspelling "report refinement" to "paper refinement", but the code actually keeps "report refinement" everywhere. Looking at line 143 in utils.py, line 102 in config.py, and throughout ai_lab_repo.py, the phase is consistently called "report refinement", not "paper refinement".
Additionally, the original code referenced in the PR description (line 708 of the original repository) would need to be verified, but within this PR, the term "report refinement" is used consistently, which appears to be correct since this phase is about refining the report, not a generic paper. The PR description's claim about fixing this misspelling appears to be inaccurate.
try:
    if sys.platform == 'win32':
        subprocess.Popen(['start', 'cmd', '/k'] + cmd, shell=True)
    elif sys.platform == 'darwin':
        subprocess.Popen(['open', '-a', 'Terminal'] + cmd)
    else:
        subprocess.Popen(['x-terminal-emulator', '-e'] + cmd)
    markdown_status += "\n**Research process started in a new terminal window.**"
The subprocess commands for opening terminal windows use platform detection but don't handle all cases properly. On Linux, x-terminal-emulator may not be available on all distributions, and the command construction doesn't properly escape arguments which could lead to command injection if user input contains shell metacharacters.
Consider using more portable approaches or adding proper argument escaping with shlex.quote() to prevent potential command injection vulnerabilities.
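A sketch of the Linux branch with shlex.quote and a simple fallback (illustrative; the Windows and macOS branches would need their own handling):

```python
import shlex
import shutil
import subprocess

def launch_in_linux_terminal(cmd: list[str]) -> None:
    # Quote every argument so user-supplied values cannot inject shell syntax,
    # and fall back to plain xterm when x-terminal-emulator is not installed.
    quoted = " ".join(shlex.quote(arg) for arg in cmd)
    term = shutil.which("x-terminal-emulator") or shutil.which("xterm")
    if term is None:
        raise RuntimeError("No supported terminal emulator found")
    subprocess.Popen([term, "-e", "bash", "-c", quoted])
```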
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader, Dataset, random_split
import torchvision
Import of 'torchvision' is not used.
import torchvision
import torch.nn.functional as F
from torch.utils.data import DataLoader, Dataset, random_split
import torchvision
import torchaudio
Import of 'torchaudio' is not used.



Summary

- Task notes and the o1-mini default model are now configured centrally in config.py

What I have checked

- Tested the workflow with gemini-2.0-flash

Reference Issue able to solve

- Setting OLLAMA_API_BASE_URL in config.py and setting the OpenAI API key to ollama can connect to any service capable of the OpenAI SDK
- ArXiv operations are wrapped in try ... catch

UI Example

Gradio

Launch with python config_gradio.py
(screenshot)

React Flask App (Beta)

Launch with python app.py
(screenshot)

Note

Working on adding some monitors and some dialog visualization.

Test Production Paper

SAMAug with MRANet.pdf
Review.txt