MikeNsiah10/search_engine_project

Search Engine

A simple search engine that crawls webpages and retrieves the pages containing the searched words through a user-friendly frontend. The search engine consists of a web crawler, an indexing system built on the Whoosh library, and a frontend.
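To illustrate what the indexing system does conceptually, here is a minimal sketch of an inverted index in plain Python. It is not the project's code and it does not use Whoosh; the function names (`tokenize`, `build_index`, `search`), the example URLs, and the choice to require every query word to match are assumptions made for illustration only.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs whose text contains it.

    `pages` is a dict of {url: page_text}; in the real project this text
    would come from the crawler (assumption).
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the sorted URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


# Hypothetical crawled pages, used only to exercise the sketch.
pages = {
    "https://example.org/a": "Platypus facts and habitat",
    "https://example.org/b": "Habitat loss in rivers",
}
index = build_index(pages)
results = search(index, "platypus habitat")
```

A library such as Whoosh adds persistence, scoring, and query parsing on top of this basic idea, which is presumably why the project relies on it rather than a hand-written index.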

Project Structure

/indexdir               # Whoosh index directory
/static                 # stylesheet that applies styles to the templates
/templates              # contains templates to display the search form and search results
    ├── search_form.html
    ├── search_results.html
/app.py                 # Flask application
/crawler.py             # script that crawls websites for the words to be searched
/requirements.txt       # required libraries for the project
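As a rough idea of the kind of work a script like `crawler.py` has to do for each page, the sketch below extracts the visible text and the outgoing links from an HTML string using only the standard library. The class and function names (`PageParser`, `parse_page`), the example HTML, and the rule of following only links on the same host are assumptions for illustration; the repository's actual crawler may work differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PageParser(HTMLParser):
    """Collect visible text and absolute link targets from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def parse_page(base_url, html):
    """Return (visible_text, same_host_links) for one page."""
    parser = PageParser(base_url)
    parser.feed(html)
    parser.close()
    host = urlparse(base_url).netloc
    # Assumption: the crawler stays on the site it started from.
    same_host = [link for link in parser.links if urlparse(link).netloc == host]
    return " ".join(parser.text_parts), same_host


# Hypothetical page content, used only to exercise the sketch.
html = (
    "<html><body><h1>Home</h1>"
    '<a href="/about">About</a>'
    '<a href="https://other.example/x">Out</a>'
    "<script>var x = 1;</script></body></html>"
)
text, links = parse_page("https://example.org/", html)
```

A full crawler would fetch each returned link in turn, keep a set of visited URLs to avoid loops, and pass the extracted text to the indexer.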

Getting started

Prerequisites

Python > 3.7 is required.

To use this repository, follow these steps:

  1. Clone the Repository:

         git clone https://github.com/MikeNsiah10/search_engine_project.git
         cd search_engine_project
    
  2. Set Up a Python Environment: It is recommended to use a virtual environment to manage the dependencies for this project. A virtual environment isolates the project's dependencies from your global Python environment, avoiding potential conflicts.

         # Create a virtual environment in a directory named 'env'
         python3 -m venv env

         # Activate the virtual environment
         # On Windows
         env\Scripts\activate
         # On macOS/Linux
         source env/bin/activate

  3. Install Dependencies: Use pip to install the required libraries:

         pip install -r requirements.txt
    
    

Deployment

The search engine was deployed on the university demo server. An FTP client (FileZilla, in my case) was used to transfer the files, and PuTTY, an SSH client, was used to deploy the project on the server. A WSGI file was created to connect the Flask app to the server. To run on a development server, do:

    python app.py

and click the link printed in the terminal.
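For readers unfamiliar with WSGI, the sketch below shows the interface a WSGI file has to expose: a callable, conventionally named `application`, that the server invokes for every request. The stand-in callable and the `call_app` helper are hypothetical and exist only to keep the example self-contained. In a real WSGI file for this project, the callable would typically be the Flask object itself (for example `from app import app as application`, assuming `app.py` defines a Flask object named `app`).

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Stand-in WSGI callable (hypothetical).

    A Flask object implements this same interface, so a WSGI file usually
    just imports it under the name `application`.
    """
    body = b"search engine is up"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(app, path="/"):
    """Invoke a WSGI callable the way a server would, without any network."""
    environ = {}
    setup_testing_defaults(environ)  # fill in a minimal valid WSGI environ
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


status, body = call_app(application)
```

The server only needs to find the `application` name in the WSGI file; how that file is located and loaded depends on the server configuration, which the source does not describe.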
