Browser Automation Agent 🤖

An intelligent browser automation agent powered by OpenAI that can navigate websites, interact with elements, and fill forms using natural language commands.

✨ Features

🌐 Website Navigation: Navigate to any website using natural language
🔍 Smart Element Detection: Automatically finds interactive elements on pages
🖱️ Intelligent Clicking: Click buttons, links, and elements using text or selectors
📝 Form Automation: Fill forms intelligently by detecting field types
🧠 AI-Powered: Uses OpenAI to understand and execute complex browser tasks
🎯 Modular Design: Clean, maintainable code structure

🚀 Quick Start

Prerequisites

Node.js (v18 or higher; recent Playwright releases no longer support v16)
OpenAI API key

Installation

Clone or download the project
Install dependencies:
```
npm install
```
Set up environment variables: create a .env file in the root directory (the repository includes a .env.example to copy from):
```
OPENAI_API_KEY=your_openai_api_key_here
```
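
A small startup check can surface a missing key before the first API call fails. The following is a minimal sketch under that assumption; `requireEnv` is a hypothetical helper and not part of this project.

```javascript
// Hypothetical helper: fail fast when a required environment variable is missing.
function requireEnv(name, env = process.env) {
	const value = env[name];
	if (typeof value !== 'string' || value.trim() === '') {
		throw new Error(`Missing ${name}. Add it to your .env file.`);
	}
	return value.trim();
}

// Example, once dotenv has loaded the .env file:
// const apiKey = requireEnv('OPENAI_API_KEY');
```

Passing `env` as a parameter keeps the check easy to exercise without touching the real environment.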

Running the Application

Interactive Mode:

npm start

Direct Command:

node index.js

📖 Usage Examples

Basic Usage

// Start the application and enter commands like:
"Go to google.com and search for 'OpenAI'"
"Navigate to github.com and click on 'Sign up'"
"Fill the contact form with name: John, email: john@example.com"

Advanced Usage

The agent can handle complex multi-step tasks:

"Go to https://example-store.com, search for 'laptops',
filter by price under $1000, and add the first result to cart"

🏗️ Project Structure

05_BrowserAutomation/
├── src/
│   ├── agent.js              # AI agent configuration
│   ├── utils/
│   │   └── browserManager.js # Browser lifecycle management
│   └── tools/
│       ├── navigationTool.js # Website navigation
│       ├── scanPageTool.js   # Page element detection
│       ├── clickTool.js      # Element interaction
│       └── fillFormTool.js   # Form automation
├── examples/
│   └── basicExample.js       # Usage examples
├── index.js                  # Main application entry
├── package.json
└── README.md

🔧 API Reference

Core Tools

navigationTool

Navigates to websites using URLs or natural language descriptions.

Parameters:

url (string): Website URL to navigate to

Example:

"Go to https://google.com"
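
Playwright's `page.goto` expects a URL that includes a scheme, so input such as "google.com" needs normalizing first. Whether navigationTool does this internally is not stated here; the sketch below shows one way it could be done, with `normalizeUrl` as a hypothetical name.

```javascript
// Hypothetical helper: turn user input such as "google.com" into an absolute URL.
function normalizeUrl(input) {
	const trimmed = input.trim();
	// Prepend https:// when the input has no scheme.
	const hasScheme = /^[a-z][a-z\d+.-]*:\/\//i.test(trimmed);
	const url = new URL(hasScheme ? trimmed : `https://${trimmed}`); // throws on invalid input
	if (url.protocol !== 'http:' && url.protocol !== 'https:') {
		throw new Error(`Unsupported protocol: ${url.protocol}`);
	}
	return url.href;
}

// Example: normalizeUrl('google.com') returns 'https://google.com/'
```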

scanPageTool

Analyzes the current page and returns interactive elements.

Returns: List of clickable elements with their text and selectors
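
To make the return value concrete, here is a hedged sketch of how such a scan might be written with Playwright's `page.$$eval`. The selector list, the `scanPage` name, and the returned `{ tag, text, selector }` shape are assumptions for illustration, not the project's actual output format.

```javascript
// Elements treated as interactive in this sketch (an assumption).
const INTERACTIVE = 'a, button, input, select, textarea, [role="button"]';

// Sketch: collect interactive elements from a Playwright page.
async function scanPage(page, limit = 50) {
	const elements = await page.$$eval(INTERACTIVE, (nodes) =>
		nodes.map((el) => {
			const tag = el.tagName.toLowerCase();
			const name = el.getAttribute('name');
			// Prefer an id selector, then the name attribute, then the bare tag.
			let selector = tag;
			if (el.id) selector = `#${el.id}`;
			else if (name) selector = `${tag}[name="${name}"]`;
			const text = (el.innerText || el.value || el.getAttribute('aria-label') || '').trim();
			return { tag, text, selector };
		})
	);
	return elements.slice(0, limit);
}
```

The generated selectors are deliberately simple and are not guaranteed to be unique; ids containing special characters would also need escaping.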

clickTool

Clicks on elements using text content or CSS selectors.

Parameters:

elementText (string): Text content of element to click
selector (string, optional): CSS selector as fallback

Example:

"Click on 'Sign Up'"
"Click the submit button"
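
The text-first, selector-as-fallback behavior described above can be sketched with Playwright locators. This is an illustration under assumptions: `clickElement`, its return shape, and the 5-second timeout are hypothetical and not taken from clickTool.js.

```javascript
// Sketch: click by visible text first, then fall back to a CSS selector.
// `page` is a Playwright Page; the default timeout is an arbitrary choice.
async function clickElement(page, { elementText, selector }, timeout = 5000) {
	if (elementText) {
		try {
			await page.getByText(elementText, { exact: false }).first().click({ timeout });
			return { clicked: true, via: 'text' };
		} catch (error) {
			if (!selector) throw error; // no fallback available
		}
	}
	if (!selector) throw new Error('Provide elementText or selector');
	await page.locator(selector).first().click({ timeout });
	return { clicked: true, via: 'selector' };
}
```

Returning which strategy succeeded makes it easier for the agent, or a person reading logs, to see why a click landed where it did.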

fillFormTool

Fills form fields intelligently based on field labels, names, or placeholders.

Parameters:

fieldIdentifier (string): Field name, placeholder, or label
value (string): Value to enter
selector (string, optional): Direct CSS selector
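
One plausible way to resolve these parameters is to try the label, then the placeholder, then the name attribute, and to use the CSS selector directly when one is given. The sketch below assumes that order; `fillField` is a hypothetical name and the real fillFormTool.js may resolve fields differently.

```javascript
// Sketch: locate a form field with Playwright and fill it.
// Returns true when a matching field was found and filled, false otherwise.
async function fillField(page, { fieldIdentifier, value, selector }) {
	const candidates = selector
		? [page.locator(selector)] // a direct selector skips the heuristics
		: [
				page.getByLabel(fieldIdentifier),
				page.getByPlaceholder(fieldIdentifier),
				// Assumes fieldIdentifier contains no double quotes.
				page.locator(`[name="${fieldIdentifier}"]`),
			];
	for (const candidate of candidates) {
		if ((await candidate.count()) > 0) {
			await candidate.first().fill(value);
			return true;
		}
	}
	return false;
}
```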

Example:

"Fill email with john@example.com"
"Enter 'John Doe' in the name field"

Browser Manager

The browserManager handles browser lifecycle:

import browserManager from './src/utils/browserManager.js';

// Initialize browser
await browserManager.init();

// Navigate to website
await browserManager.navigate('https://example.com');

// Get current page
const page = browserManager.getPage();

// Close browser
await browserManager.close();
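
For readers curious how a manager with this interface could be structured, here is a minimal sketch. It is not the code in src/utils/browserManager.js: `createBrowserManager` is a hypothetical factory, and the launch function is injected (for example `(options) => chromium.launch(options)`) so the sketch does not depend on how Playwright is imported.

```javascript
// Sketch of a single-browser manager with the init/navigate/getPage/close interface.
function createBrowserManager(launch) {
	let browser = null;
	let page = null;

	// Launch the browser once and reuse the same page afterwards.
	async function init(options = {}) {
		if (!browser) {
			browser = await launch(options);
			page = await browser.newPage();
		}
		return page;
	}

	// Initializes lazily, then waits for the DOM to be ready.
	async function navigate(url) {
		const current = await init();
		await current.goto(url, { waitUntil: 'domcontentloaded' });
	}

	function getPage() {
		if (!page) throw new Error('Browser not initialized. Call init() first.');
		return page;
	}

	// Safe to call more than once; later calls are no-ops.
	async function close() {
		if (browser) await browser.close();
		browser = null;
		page = null;
	}

	return { init, navigate, getPage, close };
}
```

Keeping a single shared page means every tool acts on the same tab, which matches the way the tools above build on each other's results.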

🛠️ Configuration

Browser Settings

Modify browser configuration in src/utils/browserManager.js:

import { chromium } from 'playwright';

const browser = await chromium.launch({
	headless: false, // Set to true to run without a visible browser window
	slowMo: 500, // Delay added to each action (ms); lower it for faster runs
});
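
If editing the source for each run is inconvenient, the same options could be derived from environment variables. This is a sketch only: `HEADLESS` and `SLOW_MO` are hypothetical variable names that this project does not read unless you wire them in yourself.

```javascript
// Hypothetical helper: build Playwright launch options from environment variables.
// Defaults mirror the values shown above (headed mode, 500 ms slowMo).
function launchOptionsFromEnv(env = process.env) {
	const parsed = Number.parseInt(env.SLOW_MO ?? '', 10);
	return {
		headless: env.HEADLESS === 'true',
		slowMo: Number.isNaN(parsed) ? 500 : Math.max(0, parsed),
	};
}

// Example: chromium.launch(launchOptionsFromEnv())
```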

Agent Instructions

Customize AI behavior in src/agent.js:

const instructions = `
You are a browser automation expert.
Add your custom instructions here...
`;

🧪 Examples

Run the included examples:

# Basic navigation and form filling
node examples/basicExample.js

Create your own examples:

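import 'dotenv/config'; // loads OPENAI_API_KEY from .env
import { createWebAgent } from './src/agent.js';
import { run } from '@openai/agents';

const agent = createWebAgent();
const result = await run(agent, 'Your command here');

// Print the agent's final response
console.log(result.finalOutput);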

🔍 Troubleshooting

Common Issues

1. Browser doesn't open:

Check that Chromium is installed; with Playwright, run npx playwright install chromium
Verify system permissions

2. OpenAI API errors:

Confirm API key is valid and has credits
Check .env file configuration

3. Element not found:

Page might still be loading - the agent will wait automatically
Element might be in an iframe (not currently supported)

4. Form filling issues:

Ensure fields are visible and not disabled
Try using direct CSS selectors for complex forms

Debug Mode

Enable verbose logging by setting the DEBUG environment variable:

DEBUG=true node index.js
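
How the project reads this flag is not shown here. As an illustration of the pattern, a logger gated on `DEBUG=true` might look like the sketch below; `createDebugLogger` and the `[debug]` prefix are hypothetical.

```javascript
// Sketch: a logger that stays silent unless DEBUG is set to "true".
function createDebugLogger(env = process.env, sink = console.error) {
	const enabled = env.DEBUG === 'true';
	return (...args) => {
		if (enabled) sink('[debug]', ...args);
	};
}

// Example:
// const debug = createDebugLogger();
// debug('navigating to', 'https://example.com');
```

Writing to stderr by default keeps debug output separate from the agent's answers on stdout.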

📦 Dependencies

@openai/agents: AI agent framework
playwright: Browser automation
zod: Schema validation
dotenv: Environment configuration
prompt-sync: Interactive CLI input

🤝 Contributing

Fork the repository
Create a feature branch
Make your changes
Test thoroughly
Submit a pull request

📄 License

MIT License - feel free to use this project for personal or commercial purposes.

🆘 Support

If you encounter issues:

Check the troubleshooting section
Review the examples for proper usage
Ensure all dependencies are installed
Verify your OpenAI API key is working

Made with ❤️ for browser automation enthusiasts
