An intelligent browser automation agent powered by OpenAI that can navigate websites, interact with elements, and fill forms using natural language commands.
- Website Navigation: Navigate to any website using natural language
- Smart Element Detection: Automatically finds interactive elements on pages
- Intelligent Clicking: Click buttons, links, and elements using text or selectors
- Form Automation: Fill forms intelligently by detecting field types
- AI-Powered: Uses OpenAI to understand and execute complex browser tasks
- Modular Design: Clean, maintainable code structure
- Node.js (v16 or higher)
- OpenAI API key
- Clone or download the project
- Install dependencies:
npm install
- Set up environment variables:
Create a .env file in the root directory:
OPENAI_API_KEY=your_openai_api_key_here
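A missing or placeholder key is easier to diagnose at startup than in the middle of a run. The following is a minimal sketch of such a check, not code from this project: `requireApiKey` is a hypothetical helper, and it assumes the .env file has already been loaded into `process.env` (for example by dotenv).

```javascript
// Hypothetical startup check (not part of this project).
// Assumes .env has already been loaded into process.env, e.g. by dotenv.
function requireApiKey(env = process.env) {
  const key = (env.OPENAI_API_KEY || '').trim();
  // Reject both a missing key and the placeholder value from the setup step.
  if (!key || key === 'your_openai_api_key_here') {
    throw new Error(
      'OPENAI_API_KEY is missing. Add it to the .env file in the project root.'
    );
  }
  return key;
}
```

Calling `requireApiKey()` before the agent is created turns a confusing API error into a clear configuration message.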
Interactive Mode:
npm start
Direct Command:
node index.js
// Start the application and enter commands like:
"Go to google.com and search for 'OpenAI'";
"Navigate to github.com and click on 'Sign up'";
'Fill the contact form with name: John, email: john@example.com';
The agent can handle complex multi-step tasks:
"Go to https://example-store.com, search for 'laptops',
filter by price under $1000, and add the first result to cart"
05_BrowserAutomation/
├── src/
│   ├── agent.js              # AI agent configuration
│   ├── utils/
│   │   └── browserManager.js # Browser lifecycle management
│   └── tools/
│       ├── navigationTool.js # Website navigation
│       ├── scanPageTool.js   # Page element detection
│       ├── clickTool.js      # Element interaction
│       └── fillFormTool.js   # Form automation
├── examples/
│   └── basicExample.js       # Usage examples
├── index.js                  # Main application entry
├── package.json
└── README.md
Navigates to websites using URLs or natural language descriptions.
Parameters:
- url (string): Website URL to navigate to
Example:
'Go to https://google.com';
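Commands often mention a bare domain such as google.com rather than a full URL. As an illustration of how input like that could be normalized before navigation, here is a small sketch. `normalizeUrl` is a hypothetical helper, not the tool's actual implementation, and defaulting to https is an assumption.

```javascript
// Hypothetical helper: turn user input into an absolute http(s) URL.
// Returns null when the input cannot be parsed as a web address.
function normalizeUrl(input) {
  const trimmed = String(input).trim();
  // Add a scheme when the user typed a bare domain such as "google.com".
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(trimmed);
  const candidate = hasScheme ? trimmed : `https://${trimmed}`;
  try {
    const url = new URL(candidate);
    // Only web pages are valid navigation targets in this sketch.
    return ['http:', 'https:'].includes(url.protocol) ? url.href : null;
  } catch {
    return null;
  }
}

console.log(normalizeUrl('google.com')); // https://google.com/
```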
Analyzes the current page and returns interactive elements.
Returns: List of clickable elements with their text and selectors
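To make the shape of that result concrete, the sketch below turns raw element descriptions into text and selector pairs. It is an assumption about what such a list could look like, not the output format of scanPageTool.js. The input fields (`tag`, `id`, `classes`, `text`) and both function names are hypothetical.

```javascript
// Hypothetical: derive a simple CSS selector from element metadata.
// Note: ids or classes with special characters would need escaping.
function buildSelector({ tag, id, classes = [] }) {
  if (id) return `#${id}`;
  return [tag.toLowerCase(), ...classes].join('.');
}

// Hypothetical: keep only elements with visible text and describe each one
// by its trimmed text and a selector.
function summarizeElements(elements) {
  return elements
    .filter((el) => el.text && el.text.trim() !== '')
    .map((el) => ({ text: el.text.trim(), selector: buildSelector(el) }));
}

const found = summarizeElements([
  { tag: 'BUTTON', id: 'submit', text: ' Sign Up ' },
  { tag: 'a', classes: ['nav', 'link'], text: 'Docs' },
  { tag: 'div', text: '   ' }, // dropped: no visible text
]);
console.log(found);
```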
Clicks on elements using text content or CSS selectors.
Parameters:
- elementText (string): Text content of the element to click
- selector (string, optional): CSS selector as fallback
Example:
"Click on 'Sign Up'";
'Click the submit button';
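One way to read "text first, selector as fallback" is sketched below. This is an illustrative assumption, not the logic in clickTool.js: `findClickTarget` is a hypothetical function that works on a list of `{ text, selector }` entries, prefers an exact case-insensitive text match, then a partial match, and only then uses the fallback selector.

```javascript
// Hypothetical resolution order: exact text, partial text, fallback selector.
function findClickTarget(elements, elementText, selector) {
  const wanted = elementText.trim().toLowerCase();
  if (wanted) {
    const exact = elements.find(
      (el) => el.text.trim().toLowerCase() === wanted
    );
    if (exact) return exact.selector;

    const partial = elements.find((el) =>
      el.text.toLowerCase().includes(wanted)
    );
    if (partial) return partial.selector;
  }
  // Nothing matched by text: use the caller's selector, or null if none.
  return selector ?? null;
}

const clickable = [
  { text: 'Sign up for free', selector: 'a.cta' },
  { text: 'Sign Up', selector: '#signup' },
];
console.log(findClickTarget(clickable, 'sign up')); // #signup
```

Preferring the exact match keeps a short label like "Sign Up" from resolving to a longer link that merely contains it.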
Fills form fields intelligently based on field labels, names, or placeholders.
Parameters:
- fieldIdentifier (string): Field name, placeholder, or label
- value (string): Value to enter
- selector (string, optional): Direct CSS selector
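Matching a fieldIdentifier against labels, names, and placeholders could work roughly as follows. This is a sketch under assumptions, not the code in fillFormTool.js: `findField` and the field object shape are hypothetical, and the priority order (label, then name, then placeholder) is one reasonable choice among several.

```javascript
// Hypothetical lookup: try label, then name, then placeholder.
// Each pass is a case-insensitive substring match.
function findField(fields, fieldIdentifier) {
  const wanted = fieldIdentifier.trim().toLowerCase();
  if (!wanted) return null;
  for (const key of ['label', 'name', 'placeholder']) {
    const match = fields.find((field) =>
      (field[key] || '').toLowerCase().includes(wanted)
    );
    if (match) return match;
  }
  return null;
}

const formFields = [
  { label: 'Full name', name: 'fullname', placeholder: 'Jane Doe', selector: '#name' },
  { label: '', name: 'email', placeholder: 'you@example.com', selector: '#email' },
];
console.log(findField(formFields, 'email').selector); // #email
```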
Example:
'Fill email with john@example.com';
"Enter 'John Doe' in the name field";
The browserManager handles browser lifecycle:
import browserManager from './src/utils/browserManager.js';
// Initialize browser
await browserManager.init();
// Navigate to website
await browserManager.navigate('https://example.com');
// Get current page
const page = browserManager.getPage();
// Close browser
await browserManager.close();
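If a step between init() and close() throws, the browser can be left running. A small wrapper with try/finally avoids that. `withBrowser` below is a hypothetical helper, not part of this project. It only assumes an object exposing the init(), getPage(), and close() methods shown above.

```javascript
// Hypothetical helper: run a task with a managed browser and always close it.
// `manager` is any object with init(), getPage(), and close() methods.
async function withBrowser(manager, task) {
  await manager.init();
  try {
    return await task(manager.getPage());
  } finally {
    // Runs whether the task resolved or threw.
    await manager.close();
  }
}
```

With the project's manager this might be used as `withBrowser(browserManager, async (page) => { ... })`, assuming the task only needs the current page.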
Modify browser configuration in src/utils/browserManager.js:
const browser = await chromium.launch({
headless: false, // Set to true for headless mode
slowMo: 500, // Slow down actions (ms)
});
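Rather than editing the file each time, the same two options could be read from the environment. The sketch below is one possible approach, not existing project behavior: the `HEADLESS` and `SLOW_MO` variable names and the `launchOptionsFromEnv` helper are assumptions, and the defaults mirror the values shown above.

```javascript
// Hypothetical: build Playwright launch options from environment variables.
// HEADLESS and SLOW_MO are assumed names, not variables the project reads.
function launchOptionsFromEnv(env = process.env) {
  const slowMo = Number.parseInt(env.SLOW_MO ?? '', 10);
  return {
    headless: env.HEADLESS === 'true', // default: visible browser window
    slowMo: Number.isNaN(slowMo) ? 500 : slowMo, // default: 500 ms per action
  };
}

console.log(launchOptionsFromEnv({ HEADLESS: 'true', SLOW_MO: '0' }));
```

The returned object could then be passed to `chromium.launch(...)` in place of the hard-coded literal.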
Customize AI behavior in src/agent.js:
const instructions = `
You are a browser automation expert.
Add your custom instructions here...
`;
Run the included examples:
# Basic navigation and form filling
node examples/basicExample.js
Create your own examples:
import { createWebAgent } from './src/agent.js';
import { run } from '@openai/agents';
const agent = createWebAgent();
const result = await run(agent, 'Your command here');
1. Browser doesn't open:
- Check that Chromium is installed (with Playwright, run npx playwright install chromium)
- Verify system permissions
2. OpenAI API errors:
- Confirm API key is valid and has credits
- Check .env file configuration
3. Element not found:
- Page might still be loading - the agent will wait automatically
- Element might be in an iframe (not currently supported)
4. Form filling issues:
- Ensure fields are visible and not disabled
- Try using direct CSS selectors for complex forms
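For the "element not found" case, a custom script that drives the browser directly may want its own retry on slow pages. The helper below is a generic sketch and not something the agent is documented to use: `retry`, `attempts`, and `delayMs` are hypothetical names.

```javascript
// Hypothetical helper: retry an async action a fixed number of times,
// waiting delayMs between attempts. Rethrows the last error on failure.
async function retry(action, { attempts = 3, delayMs = 500 } = {}) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await action();
    } catch (err) {
      lastError = err;
      // No need to wait after the final attempt.
      if (i < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}
```

A call such as `retry(() => page.click('#submit'), { attempts: 5 })` would re-attempt the click while the page finishes loading, assuming `page` is the Playwright page returned by browserManager.getPage().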
Enable verbose logging by setting environment variable:
DEBUG=true node index.js
- @openai/agents: AI agent framework
- playwright: Browser automation
- zod: Schema validation
- dotenv: Environment configuration
- prompt-sync: Interactive CLI input
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
MIT License - feel free to use this project for personal or commercial purposes.
If you encounter issues:
- Check the troubleshooting section
- Review the examples for proper usage
- Ensure all dependencies are installed
- Verify your OpenAI API key is working
Made with ❤️ for browser automation enthusiasts