This repository was archived by the owner on Nov 22, 2025. It is now read-only.

helplanes/scraperAI

ScraperAI

A web interface that combines web scraping with local AI models (via Ollama) to analyze web content and answer questions about it.

🚀 Features

  • 🌐 Web Scraping: Enter any URL to scrape website content.
  • 🤖 AI Integration: Process scraped content with local LLMs through Ollama.
  • 💬 Chat Interface: Ask follow-up questions about scraped content in a conversational format.
  • 📋 Model Selection: Choose from available Ollama models to customize the AI's behavior.
  • 💾 Local Storage: Conversations are stored in your browser for easy access.

🛠️ Prerequisites

Before setting up the project, ensure you have the following installed:

  • Node.js (v14+)
  • Python (v3.9+)
  • Ollama installed and running locally
    • Install as many models as you want using ollama pull <model-name>.

📦 Setup and Installation

Step 1: Install Ollama

  1. Install Ollama on your machine by following the instructions at ollama.ai.

  2. Pull at least one model to use with the application:

    ollama pull mistral
    # or any other model you prefer
  3. Start the Ollama server:

    ollama serve

Step 2: Set Up the Backend

  1. Navigate to the backend directory:

    cd backend
  2. Create and activate a virtual environment (optional but recommended):

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Start the backend server:

    uvicorn server:app --reload

    The backend server will run at http://localhost:8000.


Step 3: Set Up the Frontend

  1. Navigate to the frontend directory:

    cd frontend
  2. Install dependencies:

    npm install
  3. Start the development server:

    npm start

    The frontend will be available at http://localhost:3000.


🖥️ Usage

  1. Open the application in your browser at http://localhost:3000.
  2. Select an Ollama model from the dropdown in the sidebar.
  3. Enter a URL in the input box and click Scrape.
  4. Wait for the scraping process to complete.
  5. Ask questions about the scraped content in the chat interface.

📡 API Endpoints

Backend Endpoints

  • GET /api/models
    Fetch a list of available Ollama models.

  • POST /api/scrape
    Scrape content from a website.
    Request Body:

    {
      "url": "https://example.com"
    }
  • POST /api/ollama
    Send content to Ollama for processing.
    Request Body:

    {
      "model": "mistral",
      "content": "Scraped content here...",
      "prompt": "Summarize this content."
    }
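Assuming the backend is running at http://localhost:8000, the documented endpoints can be exercised from Python with only the standard library. This is an illustrative sketch; the helper names (`ollama_payload`, `post_json`) are not part of the project.

```python
"""Minimal client sketch for the documented backend endpoints.

Assumes the backend runs at http://localhost:8000; the helper names
here are illustrative and not part of the project itself.
"""
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def ollama_payload(model: str, content: str, prompt: str) -> dict:
    # Request body documented for POST /api/ollama.
    return {"model": model, "content": content, "prompt": prompt}


def post_json(path: str, payload: dict) -> dict:
    # POST a JSON body and decode the JSON response.
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Scrape a page, then ask the model about it.
    scraped = post_json("/api/scrape", {"url": "https://example.com"})
    answer = post_json(
        "/api/ollama",
        ollama_payload("mistral", scraped.get("content", ""), "Summarize this content."),
    )
    print(answer)
```

The network calls only run under `__main__`, so the payload helper can be reused or tested without a live server.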

🛠️ Stack

Frontend

The frontend is built with React and uses:

  • CSS for styling.
  • Local Storage for persisting conversations.

Backend

The backend is built with FastAPI and uses:

  • BeautifulSoup for web scraping.
  • Playwright for rendering JavaScript-heavy websites.
  • HTTPX for making HTTP requests.
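The extraction step can be approximated as follows. This is a simplified stand-in using only the standard library's html.parser, not the project's actual BeautifulSoup-based code, which may differ.

```python
"""Simplified stand-in for the HTML text-extraction step, using only
the standard library. The real backend uses BeautifulSoup and its
exact logic may differ."""
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping script and style blocks."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # >0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    """Return the page's visible text as a single whitespace-joined string."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)
```

For JavaScript-heavy pages this kind of static parsing sees only the initial HTML, which is why the project falls back to Playwright to render the page first.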

⚠️ Limitations

  • Ollama must be running locally for the application to work.
  • Some websites may block web scraping attempts.
  • Complex web pages (SPAs, JavaScript-heavy sites) may not scrape properly.
  • Large web pages will be truncated to avoid overwhelming the LLM.
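The truncation behavior can be sketched as a simple character cap. The limit value and function name below are assumptions for illustration, not the project's actual settings.

```python
# Sketch of the page-truncation limitation above. MAX_CHARS is an
# assumed value, not the project's actual limit.
MAX_CHARS = 8000


def truncate_for_llm(content: str, limit: int = MAX_CHARS) -> str:
    """Cap scraped content so the prompt stays within the model's context."""
    if len(content) <= limit:
        return content
    # Cut at the last whitespace before the limit to avoid splitting a word.
    cut = content.rfind(" ", 0, limit)
    return content[: cut if cut > 0 else limit] + " …[truncated]"
```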

🌟 Future Improvements

  • Enhance scraping capabilities for JavaScript-heavy websites. [High Priority]
  • Support document uploads (e.g., PDFs, DOCs).
  • Implement vector search for better content retrieval.
  • Add multi-language support for broader usability.

🧑‍💻 Contributing

Contributions are welcome and would be greatly appreciated!
If you'd like to contribute, please fork the repository and submit a pull request.

📞 Support

If you encounter any issues or have questions, feel free to open an issue in the repository or contact the project maintainers.

