[Feature Improvement] Add support for Ollama, Gemini, and Claude, with Web UI#73
whats2000 wants to merge 43 commits into SamuelSchmidgall:main from
Conversation
hyperparamter -> hyperparameter
Add instructions for Windows users to install MikTeX
…tch-1 Update README.md
|
The enhanced inference layer can adapt to any service that supports the OpenAI SDK, enabling fast integration. |
|
Hi @whats2000! Appreciate your efforts putting together alternative LLM backends for this project. I tried using your code with my local Ollama instance but am getting a 422 Unprocessable Entity error from the web UI. Not sure if I'm missing anything? Configured the API |
|
@AlexTzk I think I need more information on how to reproduce your error. What is your operating system? I tested the script on Ubuntu 20 (WSL2). Can you try a test script calling the provider directly?

# Test script for Ollama
# (assuming OpenaiProvider is imported from this PR's provider.py)
from provider import OpenaiProvider

print(OpenaiProvider.get_response(
    api_key="ollama",
    model_name="deepseek-r1:32b",
    user_prompt="What is the meaning of life?",
    system_prompt="You are a philosopher seeking the meaning of life.",
    base_url="http://localhost:11434/v1/"
))

And I get a response with Ollama
|
@whats2000 Thank you for your reply. Your test function does work. Then I set the base URL within config.py, launched Gradio with python config_gradio.py, set the OpenAI API key to ollama, and specified the same model, deepseek-r1:32b. I get error 422. I am running Ollama version 0.5.7 on another machine at 10.0.0.99:11434; that's a Docker container with an Ubuntu base, I believe. The AgentLab code and your PR are running on an Ubuntu Server 24.04 LTS with Python 3.12.9. |
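For reference, the override being described would look roughly like this in config.py. The OLLAMA_API_BASE_URL name comes from this PR's config; the value below is just the remote instance mentioned in this comment:

```python
# config.py (excerpt, illustrative)
OLLAMA_API_BASE_URL = "http://10.0.0.99:11434/v1/"
```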
|
Seems like this is due to Gradio from your image (it fails to launch the terminal). Did you see any output in the terminal that points out why it failed to launch? Also, I use XTerm on Linux. Feel free to check the code at |
|
@AlexTzk I added the version of the |
|
@whats2000 It's working now! I believe my issue was caused by XTerm not being able to launch because there's no GUI. Thank you for your help. A couple of notes: I now have a different exception about max_tries exceeded during literature review, but that is not connected to your PR; I will try to fix that now. |
|
I just updated the layout to look more balanced, do you think it looks better? |
|
@whats2000 Looks great! Love how you split it across both sides. The debugging feature is extremely helpful, many thanks for that. Running an experiment now to test if max_tries exception is being thrown again but as soon as I'm done with that I will test this again! Great work 👍 |
The bug is that the `model_backbone` attribute in LaboratoryWorkflow is used as both a `dict` and a `str`, which makes some agents unable to find their model, so they fall back to default_model and cause inference errors.
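A minimal sketch of the ambiguity, using an illustrative helper rather than the PR's actual code: each agent needs a lookup that works whether model_backbone is a single string or a per-phase dict.

```python
def resolve_phase_model(model_backbone, phase, default_model):
    """Return the model for a phase whether the backbone is a str or a per-phase dict."""
    if isinstance(model_backbone, str):
        return model_backbone
    if isinstance(model_backbone, dict):
        return model_backbone.get(phase, default_model)
    return default_model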
|
@AlexTzk I just fixed several bugs that I discovered in the original project. Does this fix it for you? I found that the |
Now only throw if all of the key is missing
|
@whats2000 I still got the max_tries exception with deepseek-r1:32b; second test was with qwen2.5-coder:32b-instruct-q5_K_M via Gradio but that seemed to have crashed as well. No message in the WEBUI but I presume it was the same exception about max_tries for literature review. I am now trying to run a smaller model, qwen2.5-coder:14b-instruct-q4_1 launched from the terminal rather than the webui, want to see if it's the same error. |
|
Is it going to be merged soon? |
|
@whats2000 awesome work, I will test the webui sometime this week and provide feedback. @MohamadZeina I managed to get past lit_review by using the qwen2.5:32b model on Ollama with a num_ctx window of 100000 tokens; I created my own model from a Modelfile. Another unforeseen problem is during the subsequent tasks, more specifically when it gets to running_experiments: it takes a long time to reply as it's using a mixture of RAM and VRAM - about 50/50 - and this causes the code to time out. Creating a different model with a 16k token window does get around this, but you have to interrupt the current research, delete your custom model, create another custom model under the same name with a different context window, and restart the research. I was thinking that implementing a memory_class might be a worthwhile endeavour...
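For anyone trying to reproduce this workaround: the custom model described above is built from a Modelfile roughly like the following. The model tag and values are illustrative, taken from the numbers in this comment:

```
# Modelfile (illustrative): bake the larger context window into a custom model
FROM qwen2.5:32b
PARAMETER num_ctx 100000
```

It is then created with something like `ollama create qwen2.5-32b-100k -f Modelfile`; changing the context window currently means deleting and recreating the custom model as described above.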
|
a few corrections
|
Great to hear that! |
Note: You need to trigger it manually the first time, as the old UI does not have the update button
|
I added a Check for Update button to the Web UI. However, for older versions, updates must be triggered manually via |
|
Great job @whats2000 |
|
@nullnuller Try deleting the WebUI module. The current dataset workflow uses the HuggingFace API, but I think several modifications to the workflow agent are needed to support custom datasets. |
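For what it's worth, the same datasets API already used for HuggingFace Hub datasets can also read local files, which is one possible path toward custom-dataset support. This is illustrative only (the file name is hypothetical), not an existing workflow feature:

```python
from datasets import load_dataset

hub_ds = load_dataset("imdb")                                    # current behaviour: a Hub dataset
local_ds = load_dataset("csv", data_files="my_experiments.csv")  # hypothetical custom dataset
```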
@whats2000 After following installation steps in https://github.com/whats2000/AgentLaboratory |
|
@nullnuller I have patched that with an additional try handler. Update on 2025/3/12
|
Just checked it and it seems to be still there. |
|
@nullnuller Try again after fetching the patch, does it work? It should tell you that you need to install the missing dependency. |
Thanks, it's now working. How do I add other dataset sources? |
|
You would need to modify the MLESolver; that might take a lot of work |
|
@whats2000 I have been getting this unhandled error during Literature Review, causing the code to break rather than skip over a 404 response. |
I think this was raised and solved in this issue |
|
I will check it out |
This takes the solution from SamuelSchmidgall#50
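The guard is conceptually along these lines, sketched against the arxiv Python package (assumed to be what the paper search uses); this is illustrative, not the exact patch:

```python
import arxiv

def safe_arxiv_search(query: str, max_results: int = 10) -> list:
    """Return ArXiv results, or an empty list if the request fails (e.g. a 404)."""
    try:
        client = arxiv.Client()
        search = arxiv.Search(query=query, max_results=max_results)
        return list(client.results(search))
    except (arxiv.HTTPError, arxiv.UnexpectedEmptyPageError) as exc:
        # Skip over transient HTTP failures instead of crashing the literature review
        print(f"ArXiv request failed ({exc}); skipping this query.")
        return []
```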
|
@nullnuller |
Pull request overview
This PR adds comprehensive multi-provider LLM support (Ollama, Gemini, Claude) and introduces both Gradio and Flask-based Web UIs for easier configuration. It also includes bug fixes, improved error handling, and centralized configuration management.
Changes:
- Adds support for Ollama, Gemini (Google), and Anthropic Claude as LLM providers
- Implements two Web UI options: Gradio-based and Flask/React-based interfaces
- Introduces centralized configuration system with settings persistence
- Improves error handling for ArXiv API operations with comprehensive try-catch blocks
Reviewed changes
Copilot reviewed 13 out of 15 changed files in this pull request and generated 25 comments.
Show a summary per file
| File | Description |
|---|---|
| config.py | New centralized configuration file with task notes, human-in-loop settings, and API base URLs |
| provider.py | New provider abstraction layer for OpenAI and Anthropic APIs |
| inference.py | Extensive refactoring to support multiple LLM providers with cost estimation |
| ai_lab_repo.py | Main workflow updates for multi-provider support and configuration loading |
| settings_manager.py | New settings persistence layer for saving/loading user configurations |
| config_gradio.py | Gradio-based web interface for configuration |
| app.py | Flask-based web application with React frontend support |
| utils.py | Added task note templating and validation utilities |
| tools.py | Enhanced ArXiv paper retrieval with comprehensive error handling |
| mlesolver.py | Grammar corrections in documentation strings |
| requirements.txt | Added Flask, Flask-CORS, Gradio, torchaudio, and torchvision dependencies |
| README.md | Comprehensive documentation updates for new features and model support |
| common_imports.py | Added torchvision and torchaudio imports |
| agents.py | Added json import |
| .gitignore | Added entries for settings, WebUI, and data directories |
elif (model_str.startswith("claude-4.5-opus") or
        model_str.startswith("claude-4.5-sonnet") or
        model_str.startswith("claude-4.5-haiku") or
        model_str.startswith("claude-4.1-opus") or
        model_str.startswith("claude-4-opus") or
        model_str.startswith("claude-4-sonnet") or
        model_str.startswith("claude-3-5-sonnet") or
        model_str.startswith("claude-3-5-haiku") or
        model_str.startswith("claude-3-7-sonnet")
):
    answer = AnthropicProvider.get_response(
        api_key=os.environ["ANTHROPIC_API_KEY"],
        model_name=model_str,
        user_prompt=prompt,
        system_prompt=system_prompt,
        temperature=temp,
    )
    if model_str.startswith("claude-4.5-opus"):
        model_str = "claude-4.5-opus"
    elif model_str.startswith("claude-4.5-sonnet"):
        model_str = "claude-4.5-sonnet"
    elif model_str.startswith("claude-4.5-haiku"):
        model_str = "claude-4.5-haiku"
    elif model_str.startswith("claude-4.1-opus"):
        model_str = "claude-4.1-opus"
    elif model_str.startswith("claude-4-opus"):
        model_str = "claude-4-opus"
    elif model_str.startswith("claude-4-sonnet"):
        model_str = "claude-4-sonnet"
    elif model_str.startswith("claude-3-5-sonnet"):
        model_str = "claude-3-5-sonnet"
    elif model_str.startswith("claude-3-5-haiku"):
        model_str = "claude-3-5-haiku"
    elif model_str.startswith("claude-3-7-sonnet"):
        model_str = "claude-3-7-sonnet"
The model name mapping uses startswith matching, which could lead to incorrect model selection: a dated identifier such as "claude-3-5-sonnet-20241022" is silently collapsed to whichever listed prefix happens to match first in the if/elif chain, so an ambiguous or badly ordered prefix list produces the wrong normalized name.
The order of checks should be from most specific to least specific, and ideally use exact matching or proper versioning logic rather than prefix matching to avoid misclassification.
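A minimal sketch of that suggestion, assuming the same set of prefixes quoted above (not the PR's actual code): sort the known prefixes longest-first so the most specific one always wins.

```python
# Illustrative only: longest-prefix-first normalization for Claude model names.
KNOWN_CLAUDE_PREFIXES = [
    "claude-4.5-opus", "claude-4.5-sonnet", "claude-4.5-haiku",
    "claude-4.1-opus", "claude-4-opus", "claude-4-sonnet",
    "claude-3-5-sonnet", "claude-3-5-haiku", "claude-3-7-sonnet",
]

def normalize_claude_model(model_str: str) -> str | None:
    """Map e.g. 'claude-3-5-sonnet-20241022' to 'claude-3-5-sonnet'; None if unknown."""
    for prefix in sorted(KNOWN_CLAUDE_PREFIXES, key=len, reverse=True):
        if model_str.startswith(prefix):
            return prefix
    return None
```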
if api_key == "ollama":
    ollama_max_tokens = int(os.getenv("OLLAMA_MAX_TOKENS", 2048))
    if version == "0.28":
        if temperature is None:
            completion = openai.ChatCompletion.create(
                model=model_name,
                messages=messages,
                max_tokens=ollama_max_tokens,
            )
        else:
            completion = openai.ChatCompletion.create(
                model=model_name,
                messages=messages,
                temperature=temperature,
                max_tokens=ollama_max_tokens,
            )
    else:
        client = OpenAI(**client_config)
        if temperature is None:
            completion = client.chat.completions.create(
                model=model_name,
                messages=messages,
                max_tokens=ollama_max_tokens,
            )
        else:
            completion = client.chat.completions.create(
                model=model_name,
                messages=messages,
                temperature=temperature,
                max_tokens=ollama_max_tokens,
            )
The OpenAI provider checks api_key == "ollama" as a special case to determine if Ollama is being used. This is a fragile pattern that couples the provider implementation to a specific string value. If the API key is accidentally set to "ollama" when not using Ollama, or if the string is mistyped, it will cause unexpected behavior.
Consider using a separate boolean flag or configuration parameter to indicate the provider type, rather than overloading the API key field with special string values.
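A sketch of that suggestion with an explicit flag instead of the magic "ollama" key; the signature and flag name are illustrative, not the PR's actual provider API:

```python
import os
from openai import OpenAI

def get_response(model_name, messages, api_key, base_url=None,
                 use_ollama=False, temperature=None):
    """Illustrative: Ollama is selected by an explicit flag, not by api_key == 'ollama'."""
    client = OpenAI(api_key=api_key, base_url=base_url)
    kwargs = {"model": model_name, "messages": messages}
    if temperature is not None:
        kwargs["temperature"] = temperature
    if use_ollama:
        # Keep the same env override the PR uses for Ollama's completion budget.
        kwargs["max_tokens"] = int(os.getenv("OLLAMA_MAX_TOKENS", 2048))
    return client.chat.completions.create(**kwargs).choices[0].message.content
```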
def save_settings(self, settings: dict):
    """Save settings to JSON file"""
    try:
        # Filter out empty API keys before saving
        filtered_settings = {
            k: v for k, v in settings.items()
            if not (k.endswith('_api_key') and not v)
        }

        with open(self.settings_file, 'w', encoding='utf-8') as f:
            json.dump(filtered_settings, f, indent=2)
API keys are being stored in plain text in a JSON file without any encryption. The settings file at settings/user_settings.json will contain sensitive API keys that could be accidentally committed to version control or exposed.
Consider implementing encryption for stored API keys, or at minimum, add prominent warnings in the documentation about the security implications and ensure the settings directory is properly added to .gitignore (which it is, but users may not realize the security risk).
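One possible mitigation, sketched under the assumption that persisting keys is not strictly required (this is not the PR's implementation): drop API keys from the persisted settings entirely and restrict the file's permissions.

```python
import json
import os
import stat

def save_settings(settings: dict, path: str = "settings/user_settings.json") -> None:
    # Illustrative hardening: never persist API keys, and make the file owner-only (0o600).
    safe = {k: v for k, v in settings.items() if not k.endswith("_api_key")}
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(safe, f, indent=2)
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
```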
version = openai.__version__

if api_key == "ollama":
    ollama_max_tokens = int(os.getenv("OLLAMA_MAX_TOKENS", 2048))
    if version == "0.28":
        if temperature is None:
            completion = openai.ChatCompletion.create(
                model=model_name,
                messages=messages,
                max_tokens=ollama_max_tokens,
            )
        else:
            completion = openai.ChatCompletion.create(
                model=model_name,
                messages=messages,
                temperature=temperature,
                max_tokens=ollama_max_tokens,
            )
    else:
        client = OpenAI(**client_config)
        if temperature is None:
            completion = client.chat.completions.create(
                model=model_name,
                messages=messages,
                max_tokens=ollama_max_tokens,
            )
        else:
            completion = client.chat.completions.create(
                model=model_name,
                messages=messages,
                temperature=temperature,
                max_tokens=ollama_max_tokens,
            )
else:
    if version == "0.28":
        if temperature is None:
            completion = openai.ChatCompletion.create(
                model=model_name,
                messages=messages,
            )
        else:
            completion = openai.ChatCompletion.create(
                model=model_name,
                messages=messages,
                temperature=temperature,
            )
    else:
The OpenAI SDK version check logic is inconsistent: in some places the code checks version == "0.28" while in others it branches on the version variable directly. Since openai.__version__ returns a string, the comparison works, but the pattern is inconsistent throughout the codebase.
Consider standardizing on one approach and documenting which OpenAI SDK versions are supported.
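A small helper along those lines, assuming the widely available packaging library (illustrative, not part of the PR):

```python
import openai
from packaging import version

def uses_legacy_openai_sdk() -> bool:
    """True for the pre-1.0 SDK (e.g. 0.28.x) that still exposes openai.ChatCompletion."""
    return version.parse(openai.__version__) < version.parse("1.0.0")
```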
| "You have access to a code editing tool. \n" | ||
| "This tool allows you to replace lines indexed n through m (n:m) of the current code with as many lines of new code as you want to add. This removal is inclusive meaning that line n and m and everything between n and m is removed. This will be the primary way that you interact with code. \n" | ||
| "You can edit code using the following command: ```EDIT N M\n<new lines to replace old lines>\n``` EDIT is the word EDIT, N is the first line index you want to replace and M the the last line index you want to replace (everything inbetween will also be removed), and <new lines to replace old lines> will be the new code that is replacing the old code. Before changing the existing code to be your new code, your new code will be tested and if it returns an error it will not replace the existing code.\n" | ||
| "You can edit code using the following command: ```EDIT N M\n<new lines to replace old lines>\n``` EDIT is the word EDIT, N is the first line index you want to replace and M the last line index you want to replace (everything inbetween will also be removed), and <new lines to replace old lines> will be the new code that is replacing the old code. Before changing the existing code to be your new code, your new code will be tested and if it returns an error it will not replace the existing code.\n" |
There's a typo in the spelling of "inbetween" which should be two words: "in between". This appears in the code repair docstring.
Correct spelling: "everything in between will also be removed"
| "You can edit code using the following command: ```EDIT N M\n<new lines to replace old lines>\n``` EDIT is the word EDIT, N is the first line index you want to replace and M the last line index you want to replace (everything inbetween will also be removed), and <new lines to replace old lines> will be the new code that is replacing the old code. Before changing the existing code to be your new code, your new code will be tested and if it returns an error it will not replace the existing code.\n" | |
| "You can edit code using the following command: ```EDIT N M\n<new lines to replace old lines>\n``` EDIT is the word EDIT, N is the first line index you want to replace and M the last line index you want to replace (everything in between will also be removed), and <new lines to replace old lines> will be the new code that is replacing the old code. Before changing the existing code to be your new code, your new code will be tested and if it returns an error it will not replace the existing code.\n" |
elif type(agent_model_backbone) == dict:
    # todo: check if valid
    self.phase_models = agent_model_backbone

    # Load models for each phase if key exists otherwise use the default model
    for phase, subtasks in self.phases:
        for subtask in subtasks:
            if subtask in agent_model_backbone:
                self.phase_models[subtask] = agent_model_backbone[subtask]
            else:
                self.phase_models[subtask] = self.model_backbone
The phase_models dictionary initialization logic at line 79-86 has changed behavior. Previously it would use agent_model_backbone for all phases when it's a dict, but now it checks if each subtask exists in the dict and falls back to self.model_backbone. However, there's a logical issue: this code is only executed when type(agent_model_backbone) == dict, but at this point agent_model_backbone might have been reassigned from None to a dict in lines 43-52. The type check won't properly handle the case where agent_model_backbone was originally None.
The logic flow needs to be clarified to ensure the correct behavior for all initialization paths.
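One way the initialization could be made uniform, sketched with illustrative names (not the PR's code): normalize None, a single string, and a partial dict into one complete phase-to-model mapping up front.

```python
def build_phase_models(agent_model_backbone, phases, default_model):
    """Collapse None / str / partial dict into a complete {subtask: model} mapping."""
    phase_models = {}
    for phase, subtasks in phases:
        for subtask in subtasks:
            if isinstance(agent_model_backbone, dict):
                phase_models[subtask] = agent_model_backbone.get(subtask, default_model)
            elif isinstance(agent_model_backbone, str):
                phase_models[subtask] = agent_model_backbone
            else:  # None or anything unexpected falls back to the default
                phase_models[subtask] = default_model
    return phase_models
```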
| "literature review", "plan formulation", | ||
| "data preparation", "running experiments", | ||
| "results interpretation", "report writing", | ||
| "report refinement" |
The PR description claims to fix the misspelling "report refinement" to "paper refinement", but the code actually keeps "report refinement" everywhere. Looking at line 143 in utils.py, line 102 in config.py, and throughout ai_lab_repo.py, the phase is consistently called "report refinement", not "paper refinement".
Additionally, the original code referenced in the PR description (line 708 of the original repository) would need to be verified, but within this PR, the term "report refinement" is used consistently, which appears to be correct since this phase is about refining the report, not a generic paper. The PR description's claim about fixing this misspelling appears to be inaccurate.
try:
    if sys.platform == 'win32':
        subprocess.Popen(['start', 'cmd', '/k'] + cmd, shell=True)
    elif sys.platform == 'darwin':
        subprocess.Popen(['open', '-a', 'Terminal'] + cmd)
    else:
        subprocess.Popen(['x-terminal-emulator', '-e'] + cmd)
    markdown_status += "\n**Research process started in a new terminal window.**"
The subprocess commands for opening terminal windows use platform detection but don't handle all cases properly. On Linux, x-terminal-emulator may not be available on all distributions, and the command construction doesn't properly escape arguments which could lead to command injection if user input contains shell metacharacters.
Consider using more portable approaches or adding proper argument escaping with shlex.quote() to prevent potential command injection vulnerabilities.
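A sketch of the Linux branch with shlex.quote and a simple fallback (illustrative; the Windows and macOS branches would need their own handling):

```python
import shlex
import shutil
import subprocess

def launch_in_linux_terminal(cmd: list[str]) -> None:
    # Quote every argument so user-supplied values cannot inject shell syntax,
    # and fall back to plain xterm when x-terminal-emulator is not installed.
    quoted = " ".join(shlex.quote(arg) for arg in cmd)
    term = shutil.which("x-terminal-emulator") or shutil.which("xterm")
    if term is None:
        raise RuntimeError("No supported terminal emulator found")
    subprocess.Popen([term, "-e", "bash", "-c", quoted])
```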
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader, Dataset, random_split
import torchvision
Import of 'torchvision' is not used.
import torchvision
import torch.nn.functional as F
from torch.utils.data import DataLoader, Dataset, random_split
import torchvision
import torchaudio
Import of 'torchaudio' is not used.



Summary

- Task notes and the o1-mini default model are now configured centrally in config.py

What I have checked

- Tested the workflow with gemini-2.0-flash

Reference Issue able to solve

- Setting OLLAMA_API_BASE_URL in config.py and setting the OpenAI API key to ollama can connect to any service capable of the OpenAI SDK
- ArXiv operations are wrapped in try ... catch

UI Example

Gradio

Launch with python config_gradio.py
(screenshot)

React Flask App (Beta)

Launch with python app.py
(screenshot)

Note

Working on adding some monitors and some dialog visualization.

Test Production Paper

SAMAug with MRANet.pdf
Review.txt