Merged
2 changes: 1 addition & 1 deletion Makefile
@@ -3,7 +3,6 @@
# Format code
format:
uv run ruff format src/ tests/

# Lint and fix issues
lint:
uv run ruff check src/ tests/ --fix
@@ -18,6 +17,7 @@ lint-check-unsafe:

# Run tests
test:
AWE_ENV=TEST uv run pytest tests/

# Install the package
89 changes: 76 additions & 13 deletions README.md
@@ -1,8 +1,13 @@
# AIWebExplorer
[![WebExplorer CI/CD](https://github.com/thinktwiceco/webexplorer/actions/workflows/ci.yml/badge.svg)](https://github.com/thinktwiceco/webexplorer/actions/workflows/ci.yml)

[![New Version Deployed](https://github.com/thinktwiceco/webexplorer/actions/workflows/version.yml/badge.svg?branch=master)](https://github.com/thinktwiceco/webexplorer/actions/workflows/version.yml)


# AIWebExplorer 🌐

An agent for agents to explore the web

## Installation
## 📦 Installation

This project uses `uv` for dependency management.

@@ -18,7 +23,7 @@ uv sync
source .venv/bin/activate
```

## Development
## 🛠️ Development

```bash
# Run linting
@@ -31,34 +36,92 @@ uv run ruff format .
uv run ruff check --select I
```

## Environment Variables
## ⚙️ Environment Variables

Copy `.env.example` to `.env` and adjust the values:

```env
# Environment setting (DEV, STAGING, PROD, CI)
# Environment setting (DEV, TEST, CI, PROD)
AWE_ENV=DEV

# Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
AWE_LOG_LEVEL=INFO

# Optional: LLM Provider configuration
# Options: openai, togetherai, deepseek
# LLM Provider configuration (REQUIRED)
# Supported providers: openai, togetherai, deepseek
AWE_LLM_PROVIDER=openai

# Optional: LLM Model to use
# Example: gpt-4, gpt-3.5-turbo, etc.
# LLM Model to use (REQUIRED)
# Examples:
# - OpenAI: gpt-4, gpt-4-turbo, gpt-3.5-turbo
# - TogetherAI: meta-llama/Llama-2-70b-chat-hf, mistralai/Mixtral-8x7B-Instruct-v0.1
# - DeepSeek: deepseek-chat, deepseek-coder
AWE_LLM_MODEL=gpt-4

# API Key for the selected provider (REQUIRED)
AWE_LLM_API_KEY=your-api-key-here
```

### Configuration Options

- **`AWE_ENV`**: Application environment (default: `DEV`)
- Options: `DEV`, `TEST`, `CI`, `PROD`
- **`AWE_LOG_LEVEL`**: Logging verbosity level (default: `INFO`)
- **`AWE_LLM_PROVIDER`**: Optional LLM provider selection. If not set, must be specified when creating agents
- **`AWE_LLM_MODEL`**: Optional default model to use. If not set, must be specified when creating agents
- Options: `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`
- **`AWE_LLM_PROVIDER`**: LLM provider selection (**REQUIRED**)
- Supported providers: `openai`, `togetherai`, `deepseek`
- Can be overridden when creating agents programmatically
- **`AWE_LLM_MODEL`**: Model identifier to use (**REQUIRED**)
- Must be compatible with the selected provider
- Can be overridden when creating agents programmatically
- **`AWE_LLM_API_KEY`**: API key for authentication (**REQUIRED**)
- Must be valid for the selected provider
- Can be overridden when creating agents programmatically
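Since all three LLM settings are required, it can help to validate them up front and fail with one clear message. The sketch below is hypothetical — `load_provider_config` and its dict-based interface are illustrative only, and the project's actual `get_provider_config` may work differently:

```python
REQUIRED = ("AWE_LLM_PROVIDER", "AWE_LLM_MODEL", "AWE_LLM_API_KEY")
SUPPORTED_PROVIDERS = {"openai", "togetherai", "deepseek"}


def load_provider_config(env: dict[str, str]) -> dict[str, str]:
    """Validate the required settings, mirroring the rules above."""
    # Collect every missing key so the error reports all of them at once.
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    provider = env["AWE_LLM_PROVIDER"]
    if provider not in SUPPORTED_PROVIDERS:
        raise RuntimeError(f"Unsupported provider: {provider}")
    return {key: env[key] for key in REQUIRED}
```

In practice you would pass `os.environ` (or the values loaded from `.env`) as the `env` argument.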

### Supported Providers

#### 🤖 OpenAI
- **Provider**: `openai`
- **Models**: `gpt-4`, `gpt-4-turbo`, `gpt-3.5-turbo`, and more
- **API Key**: Get from [OpenAI Platform](https://platform.openai.com/)

#### 🔗 TogetherAI
- **Provider**: `togetherai`
- **Models**: Various open-source models including Llama, Mixtral, etc.
- **API Key**: Get from [Together.ai](https://together.ai/)

#### 🔍 DeepSeek
- **Provider**: `deepseek`
- **Models**: `deepseek-chat`, `deepseek-coder`
- **API Key**: Get from [DeepSeek Platform](https://platform.deepseek.com/)

## 🧪 Testing

For comprehensive testing documentation, including how to run tests, use dependency injection for mocking, and write new tests, see the [Tests README](tests/README.md).

### Running Tests

```bash
# Run all tests
pytest

# Run with verbose output
pytest -v

# Run specific test file
pytest tests/test_webexplorer_integration.py
```
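As noted above, the test suite uses dependency injection to mock the LLM. A minimal, hypothetical sketch of that pattern — `FakeAgent` is illustrative and not part of the codebase:

```python
class FakeAgent:
    """Stands in for a real agent so tests need no API key or network."""

    def __init__(self, canned_response: str) -> None:
        self.canned_response = canned_response

    def run(self, prompt: str) -> str:
        # Always return the canned response, regardless of the prompt.
        return self.canned_response


def test_fake_agent_returns_canned_response() -> None:
    agent = FakeAgent("42 results found")
    assert agent.run("search for widgets") == "42 results found"
```

A real test would inject the fake where the code under test expects an agent, then assert on the behaviour built on top of it.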

### 📊 Evaluation Reports

Performance evaluation reports are available in the [`tests/reports/`](tests/reports/) directory:

- [**Amazon Extraction Report**](tests/reports/amazon_extraction_report.md) - Evaluation of product information extraction from Amazon
- [**Wikipedia Extraction Report**](tests/reports/wikipedia_extraction_report.md) - Evaluation of information extraction from Wikipedia

These reports track the accuracy and performance of the WebExplorer across different types of websites and extraction tasks.

## New Features

To develop a new feature:

@@ -79,7 +142,7 @@ To develop a new feature:
3. **Create a Pull Request to `develop` branch**
4. **After review and merge, delete the feature branch**

## New Versions
## 🚀 New Versions

### Option 1: Automated Release (Recommended)

5 changes: 5 additions & 0 deletions src/aiwebexplorer/__init__.py
@@ -5,3 +5,8 @@
except ImportError:
# Fallback for development
__version__ = "0.0.0+dev"

from .agents import get_evaluate_request_agent as get_evaluate_request_agent
from .agents import get_extraction_agent as get_extraction_agent
from .agents import get_finalizer_agent as get_finalizer_agent
from .webexplorer import WebExplorer as WebExplorer
63 changes: 21 additions & 42 deletions src/aiwebexplorer/agent_factory.py
@@ -1,14 +1,14 @@
import os
from typing import Any, Literal, NewType, TypeVar

from agno.agent import Agent
from agno.models.deepseek import DeepSeek
from agno.models.openai import OpenAIChat
from agno.models.together import Together

from aiwebexplorer.dependencies import Dependency
from aiwebexplorer.interfaces import IAgent

from .config import config
from .config import get_provider_config

T = TypeVar("T")

@@ -27,47 +27,24 @@
SupportedModelProvider = Together | DeepSeek | OpenAIChat


def _get_api_key(provider: ModelProvider | None = None) -> tuple[str, ModelProvider]:
import dotenv

dotenv.load_dotenv()

# If a model provider is provided, only return the api key for that provider
if provider:
expected_key = f"{provider.upper()}_APIKEY"
value = os.environ.get(expected_key)
if value:
return value, provider

raise ValueError(f"Expected {expected_key} to be set when requesting provider {provider}")

# If no provider is provided, return the api key for the first provider that is set
for api_key, provider in [OPENAI_APIKEY, TOGETHERAI_APIKEY, DEEPSEEK_APIKEY]:
value = os.environ.get(api_key)

if value:
return value, provider

raise ValueError("No api key found for any provider")


def _get_model(
model_id: str | None = None,
provider: ModelProvider | None = None,
api_key: str | None = None,
model_id_map: ModelIdMap | None = None,
) -> SupportedModelProvider:
if api_key is None:
api_key, provider = _get_api_key(provider)

if model_id is None:
if not model_id_map:
error_message = """
You didn't provide a model id or a model id map. I'm expecting at least a model id map
in order to figure it out what model to use.
"""
raise ValueError(error_message)
model_id = model_id_map[provider]
provider_config = get_provider_config()
provider = provider or provider_config.AWE_LLM_PROVIDER
model_id = model_id or provider_config.AWE_LLM_MODEL
api_key = api_key or provider_config.AWE_LLM_API_KEY

if not provider:
raise RuntimeError("Specify a provider either in the configuration or as env variable AWE_LLM_PROVIDER")

if not model_id:
raise RuntimeError("Specify a model id either in the configuration or as env variable AWE_LLM_MODEL")

if not api_key:
raise RuntimeError("Specify an api key either in the configuration or as env variable AWE_LLM_API_KEY")

if provider == "togetherai":
return Together(id=model_id, api_key=api_key)
@@ -83,20 +60,22 @@ def get_agent(
name: str,
instructions: list[str],
*,
model_id: str | None = config.AWE_LLM_MODEL,
model_id: str | None = None,
api_key: str | None = None,
provider: ModelProvider | None = config.AWE_LLM_PROVIDER,
model_id_map: ModelIdMap | None = None,
provider: ModelProvider | None = None,
**kwargs: Any,
) -> IAgent[Any]:
"""Get an agent with the given name, instructions, content type, model, and other kwargs.

See Agent constructor for Agno agent details.
"""
model = _get_model(model_id, provider, api_key, model_id_map)
model = _get_model(model_id, provider, api_key)
return Agent(
name=name,
instructions=instructions,
model=model,
**kwargs,
)


get_agent_dependency = Dependency(get_agent)