AI-powered browser automation with Playwright and OpenAI. Navigate sites, detect elements, click, and fill forms from natural language commands in a simple, modular Node.js setup.

Kanishk2004/browser_automation_agent

Browser Automation Agent 🤖

An intelligent browser automation agent powered by OpenAI that can navigate websites, interact with elements, and fill forms using natural language commands.

✨ Features

  • 🌐 Website Navigation: Navigate to any website using natural language
  • 🔍 Smart Element Detection: Automatically finds interactive elements on pages
  • 🖱️ Intelligent Clicking: Click buttons, links, and elements using text or selectors
  • 📝 Form Automation: Fill forms by matching field labels, names, or placeholders
  • 🧠 AI-Powered: Uses an OpenAI agent (via @openai/agents) to turn natural-language commands into browser actions
  • 🎯 Modular Design: Each capability lives in its own tool module under src/tools/

🚀 Quick Start

Prerequisites

  • Node.js (v16 or higher)
  • OpenAI API key

Installation

  1. Clone or download the project
  2. Install dependencies:
    npm install
  3. Set up environment variables: Create a .env file in the root directory:
    OPENAI_API_KEY=your_openai_api_key_here
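If the key is missing, the first API call fails with an authentication error somewhere inside a run. A minimal sketch of a fail-fast startup check, assuming index.js has already loaded .env into process.env (for example with dotenv, which the project lists as a dependency). requireApiKey is a hypothetical helper, not part of the repository.

```javascript
// Hypothetical helper: fail fast if OPENAI_API_KEY is absent, blank,
// or still set to the placeholder value from the setup instructions.
// Assumes .env has already been loaded into `env` (e.g. by dotenv).
function requireApiKey(env = process.env) {
  const key = (env.OPENAI_API_KEY ?? '').trim();
  if (key === '' || key === 'your_openai_api_key_here') {
    throw new Error('OPENAI_API_KEY is missing; add it to the .env file in the project root');
  }
  return key;
}
```

Calling requireApiKey() once at the top of index.js surfaces a clear message before any browser is launched.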

Running the Application

Interactive Mode:

npm start

Direct Command:

node index.js
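Interactive mode reads one command at a time and hands it to the agent. The loop below is a minimal sketch of that pattern, not the code in index.js. It assumes input comes from prompt-sync (listed under Dependencies), which can return null when the user presses Ctrl+C, and that each command is executed with run(agent, command) from @openai/agents; both are injected as the hypothetical parameters ask and handle so the loop can be tested on its own.

```javascript
// Hypothetical read-eval loop for interactive mode.
// ask(promptText) returns the next line of input (or null/undefined to stop).
// handle(command) performs one command, e.g. (cmd) => run(agent, cmd).
async function interactiveLoop(ask, handle) {
  const results = [];
  while (true) {
    const input = ask('> ');
    // Stop on Ctrl+C (null/undefined) or an explicit exit command.
    if (input == null || ['exit', 'quit'].includes(input.trim().toLowerCase())) break;
    if (input.trim() === '') continue; // ignore blank lines
    results.push(await handle(input.trim()));
  }
  return results;
}
```

In real use this might be wired as interactiveLoop(promptSync(), (cmd) => run(agent, cmd)), where promptSync is the default export of prompt-sync.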

📖 Usage Examples

Basic Usage

// Start the application and enter commands like:
"Go to google.com and search for 'OpenAI'";
"Navigate to github.com and click on 'Sign up'";
'Fill the contact form with name: John, email: john@example.com';

Advanced Usage

The agent can also chain several steps from a single command:

"Go to https://example-store.com, search for 'laptops',
filter by price under $1000, and add the first result to cart"

πŸ—οΈ Project Structure

05_BrowserAutomation/
├── src/
│   ├── agent.js              # AI agent configuration
│   ├── utils/
│   │   └── browserManager.js # Browser lifecycle management
│   └── tools/
│       ├── navigationTool.js # Website navigation
│       ├── scanPageTool.js   # Page element detection
│       ├── clickTool.js      # Element interaction
│       └── fillFormTool.js   # Form automation
├── examples/
│   └── basicExample.js       # Usage examples
├── index.js                  # Main application entry
├── package.json
└── README.md

🔧 API Reference

Core Tools

navigationTool

Navigates to websites using URLs or natural language descriptions.

Parameters:

  • url (string): Website URL to navigate to

Example:

'Go to https://google.com';
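Users often type bare domains such as google.com, while Playwright's page.goto() expects an absolute URL that includes the scheme. A minimal sketch of how the url parameter could be normalized before navigation; normalizeUrl is a hypothetical helper, and restricting navigation to http and https is an assumption rather than documented behavior of navigationTool.

```javascript
// Hypothetical helper: turn "google.com" into "https://google.com/".
// Throws if the WHATWG URL parser rejects the input or the scheme is not http(s).
function normalizeUrl(input) {
  const trimmed = input.trim();
  // Prepend https:// when no "scheme://" prefix is present.
  const hasScheme = /^[a-z][a-z\d+\-.]*:\/\//i.test(trimmed);
  const url = new URL(hasScheme ? trimmed : `https://${trimmed}`);
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    throw new Error(`Unsupported protocol: ${url.protocol}`);
  }
  return url.href; // serialized form: lowercased host, "/" path when empty
}
```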

scanPageTool

Analyzes the current page and returns interactive elements.

Returns: List of clickable elements with their text and selectors
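One plausible way to build that list is to collect a plain description of every interactive element inside the page (for example with Playwright's page.$$eval over a, button, input, select and textarea) and then derive a selector for each in Node. The sketch below covers only the second step. The descriptor shape { tag, id, name, text } and both function names are assumptions; the real scanPageTool may choose selectors differently.

```javascript
// Hypothetical descriptor collected in the page, e.g.
// { tag: 'button', id: 'submit', name: null, text: 'Sign up' }
function selectorFor({ tag, id, name, text }) {
  const quote = (s) => JSON.stringify(s); // wraps in "..." and escapes inner quotes
  if (id) {
    // Simple ids can use #id; others (e.g. starting with a digit) use [id="..."].
    return /^[A-Za-z][\w-]*$/.test(id) ? `#${id}` : `${tag}[id=${quote(id)}]`;
  }
  if (name) return `${tag}[name=${quote(name)}]`;
  // :has-text() is a Playwright extension to CSS, not a standard pseudo-class.
  return `${tag}:has-text(${quote(text)})`;
}

// Normalize text, drop elements with nothing to identify them by,
// and return the { text, selector } pairs the agent would see.
function summarizeElements(elements) {
  return elements
    .map((el) => ({ ...el, text: (el.text ?? '').trim().slice(0, 80) }))
    .filter((el) => el.id || el.name || el.text)
    .map((el) => ({ text: el.text, selector: selectorFor(el) }));
}
```

Preferring id and name keeps selectors stable when visible text changes. Because :has-text() is Playwright-specific, these selectors work with page.locator() but not with document.querySelector().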

clickTool

Clicks on elements using text content or CSS selectors.

Parameters:

  • elementText (string): Text content of element to click
  • selector (string, optional): CSS selector as fallback

Example:

"Click on 'Sign Up'";
'Click the submit button';
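The text-first, selector-as-fallback behavior can be sketched with Playwright locators. getByText(), locator(), first() and click({ timeout }) are real Playwright Page and Locator methods; clickElement, the strategy order and the 5-second per-attempt timeout are illustrative assumptions rather than the contents of clickTool.js.

```javascript
// Hypothetical clickTool core: try the visible text first, then the CSS selector.
async function clickElement(page, { elementText, selector }, timeout = 5000) {
  // Locators are lazy: nothing touches the page until click() runs.
  const candidates = [];
  if (elementText) candidates.push({ via: 'text', locator: page.getByText(elementText).first() });
  if (selector) candidates.push({ via: 'selector', locator: page.locator(selector).first() });
  if (candidates.length === 0) throw new Error('Provide elementText or selector');

  let lastError;
  for (const { via, locator } of candidates) {
    try {
      // click() auto-waits (up to `timeout`) for the element to be actionable.
      await locator.click({ timeout });
      return `Clicked via ${via}`;
    } catch (err) {
      lastError = err; // fall through to the next strategy
    }
  }
  throw lastError;
}
```

By default getByText() matches substrings case-insensitively, so first() picks the first of possibly several matching elements; passing a selector narrows things down when the text is ambiguous.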

fillFormTool

Fills form fields intelligently based on field labels, names, or placeholders.

Parameters:

  • fieldIdentifier (string): Field name, placeholder, or label
  • value (string): Value to enter
  • selector (string, optional): Direct CSS selector

Example:

'Fill email with john@example.com';
"Enter 'John Doe' in the name field";

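The lookup order for fillFormTool can be sketched against a Playwright Page: a direct selector wins, otherwise the field's label is tried, then its placeholder, then its name attribute. getByLabel(), getByPlaceholder(), locator(), count(), first() and fill() are real Playwright methods; fillField and the exact order are hypothetical.

```javascript
// Hypothetical fillFormTool core.
async function fillField(page, { fieldIdentifier, value, selector }) {
  const strategies = selector
    ? [['selector', () => page.locator(selector)]]
    : [
        ['label', () => page.getByLabel(fieldIdentifier)],
        ['placeholder', () => page.getByPlaceholder(fieldIdentifier)],
        ['name', () => page.locator(`[name=${JSON.stringify(fieldIdentifier)}]`)],
      ];
  for (const [via, makeLocator] of strategies) {
    const locator = makeLocator();
    // count() reports how many elements match right now, without waiting.
    if ((await locator.count()) > 0) {
      await locator.first().fill(value); // fill() replaces the field's current value
      return `Filled via ${via}`;
    }
  }
  throw new Error(`No field matching "${fieldIdentifier ?? selector}"`);
}
```

Because count() does not wait, a field that renders late can be missed; a fuller tool might call locator.waitFor() or retry before giving up.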
Browser Manager

The browserManager handles the browser lifecycle:

import browserManager from './src/utils/browserManager.js';

// Initialize browser
await browserManager.init();

// Navigate to website
await browserManager.navigate('https://example.com');

// Get current page
const page = browserManager.getPage();

// Close browser
await browserManager.close();
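The snippet above shows only the public methods. A minimal sketch of how such a manager could be structured, with one shared browser and page created on init() and released on close(). The class shape, the lazy init() inside navigate() and waitUntil: 'domcontentloaded' are assumptions, not the contents of browserManager.js; the launcher is injected so the class can be exercised without a real browser, and calls are assumed to be sequential.

```javascript
// Sketch of a browser lifecycle manager. `launcher` is expected to behave like
// Playwright's `chromium`: launch() -> Browser, browser.newPage() -> Page.
class BrowserManager {
  constructor(launcher, launchOptions = { headless: false }) {
    this.launcher = launcher;
    this.launchOptions = launchOptions;
    this.browser = null;
    this.page = null;
  }

  async init() {
    // Reuse the running browser if init() was already called.
    if (!this.browser) {
      this.browser = await this.launcher.launch(this.launchOptions);
      this.page = await this.browser.newPage();
    }
    return this.page;
  }

  async navigate(url) {
    const page = await this.init(); // start the browser on first use
    await page.goto(url, { waitUntil: 'domcontentloaded' });
    return page.url();
  }

  getPage() {
    if (!this.page) throw new Error('Browser not initialized; call init() first');
    return this.page;
  }

  async close() {
    if (this.browser) await this.browser.close();
    this.browser = null;
    this.page = null;
  }
}
```

In real use, the module would export one shared instance, for example export default new BrowserManager(chromium), with chromium imported from playwright.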

🛠️ Configuration

Browser Settings

Modify the browser launch options in src/utils/browserManager.js:

import { chromium } from 'playwright';

const browser = await chromium.launch({
	headless: false, // Set to true to run without a visible browser window
	slowMo: 500, // Delay each Playwright operation by 500 ms
});
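Rather than editing the file to switch modes, the same two options could be read from the environment. A minimal sketch, assuming hypothetical HEADLESS and SLOW_MO variables that the project does not currently define; the returned object would be passed straight to chromium.launch().

```javascript
// Hypothetical: derive chromium.launch() options from HEADLESS and SLOW_MO.
function launchOptionsFromEnv(env = process.env) {
  const slowMo = Number.parseInt(env.SLOW_MO ?? '500', 10);
  return {
    headless: env.HEADLESS === 'true', // default: visible browser window
    slowMo: Number.isNaN(slowMo) || slowMo < 0 ? 0 : slowMo, // ms per operation; invalid values disable the delay
  };
}
```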

Agent Instructions

Customize AI behavior in src/agent.js:

const instructions = `
You are a browser automation expert.
Add your custom instructions here...
`;

🧪 Examples

Run the included examples:

# Basic navigation and form filling
node examples/basicExample.js

Create your own examples:

import { createWebAgent } from './src/agent.js';
import { run } from '@openai/agents';

const agent = createWebAgent();
const result = await run(agent, 'Your command here');
console.log(result.finalOutput); // The agent's final text response

🔍 Troubleshooting

Common Issues

1. Browser doesn't open:

  • Check that Playwright's Chromium browser is installed (npx playwright install chromium)
  • Verify system permissions

2. OpenAI API errors:

  • Confirm API key is valid and has credits
  • Check that the .env file is in the project root and defines OPENAI_API_KEY

3. Element not found:

  • The page might still be loading; the agent waits for it automatically
  • Element might be in an iframe (not currently supported)

4. Form filling issues:

  • Ensure fields are visible and not disabled
  • Try using direct CSS selectors for complex forms

Debug Mode

Enable verbose logging by setting the DEBUG environment variable:

DEBUG=true node index.js
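How the code consumes DEBUG is not documented here. One plausible pattern is a logger that prints only when DEBUG is exactly 'true'; createLogger below is a hypothetical sketch, and it writes to stderr (console.error) by default so debug output does not mix with the agent's answers on stdout.

```javascript
// Hypothetical debug logger gated on DEBUG=true.
// `sink` receives the log arguments; it defaults to console.error (stderr).
function createLogger(env = process.env, sink = console.error) {
  const enabled = env.DEBUG === 'true';
  return {
    enabled,
    debug: (...args) => {
      if (enabled) sink(`[debug ${new Date().toISOString()}]`, ...args);
    },
  };
}
```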

📦 Dependencies

  • @openai/agents: AI agent framework
  • playwright: Browser automation
  • zod: Schema validation
  • dotenv: Environment configuration
  • prompt-sync: Interactive CLI input

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test thoroughly
  5. Submit a pull request

📄 License

MIT License - feel free to use this project for personal or commercial purposes.

🆘 Support

If you encounter issues:

  1. Check the troubleshooting section
  2. Review the examples for proper usage
  3. Ensure all dependencies are installed
  4. Verify your OpenAI API key is working

Made with ❤️ for browser automation enthusiasts
