generomuga/port2viz

Introduction

port2viz is a Python-based web scraping and data mining tool for UN/LOCODE (the United Nations Code for Trade and Transport Locations). It harvests data from web pages, formats it, and generates information useful for maritime industry operations.

Prerequisites

Dependencies

This tool requires the following packages to be installed, both to run it and to develop it further:

  • beautifulsoup4==4.10.0
  • certifi==2021.5.30
  • charset-normalizer==2.0.6
  • et-xmlfile==1.1.0
  • html5lib==1.1
  • idna==3.2
  • lxml==4.6.3
  • numpy==1.21.2
  • openpyxl==3.0.9
  • pandas==1.3.3
  • python-crontab==2.5.1
  • python-dateutil==2.8.2
  • pytz==2021.1
  • requests==2.26.0
  • reverse-geocoder==1.5.1
  • schedule==1.1.0
  • scipy==1.7.1
  • six==1.16.0
  • soupsieve==2.2.1
  • urllib3==1.26.7
  • webencodings==0.5.1

To install these packages, follow these steps:

  1. Open a command prompt or terminal.
  2. Change directory (cd) to the directory that contains requirements.txt.
  3. Run python -m pip install -r requirements.txt.
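
After installation, a quick check that the environment matches the pinned versions can save debugging time later. The following is a minimal sketch and not part of port2viz: the helper names parse_pins and find_mismatches are hypothetical, and it only understands the simple name==version lines used in the list above.

```python
from importlib import metadata


def parse_pins(text):
    """Parse 'name==version' lines into a dict, skipping blanks and comments."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {name: (wanted, installed)} for packages that differ or are missing."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches
```

One way to use it is to read requirements.txt, pass its contents to parse_pins, and print whatever find_mismatches returns; an empty result means every pinned package is installed at the expected version.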

Configuration

  • Open /config/conf.ini and set the required URLs and paths:
  [PATH]
  DB_KDM = /db/kdm_dev.db
  DB_KDM_P2V = /db/kdm_port_to_viz.db
  BASE_URL = https://unece.org/trade/cefact/unlocode-code-list-country-and-territory
  BASE_URL_LOCODE = https://service.unece.org/trade/locode/
  EXTENSION = .htm
  EXPORT_PATH = /export/export_failed_mapping.xlsx
  LOG_PATH = /logs/logs.log
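
A file in this format can be read with Python's standard configparser module. The sketch below is illustrative only: load_paths and country_page_url are hypothetical helpers, and the idea that a country page address is BASE_URL_LOCODE followed by a lowercase country code and EXTENSION is an assumption based on the settings above, not something confirmed by the port2viz source.

```python
import configparser

# Sample text mirroring part of the [PATH] section shown above.
SAMPLE_CONF = """
[PATH]
DB_KDM = /db/kdm_dev.db
BASE_URL_LOCODE = https://service.unece.org/trade/locode/
EXTENSION = .htm
LOG_PATH = /logs/logs.log
"""


def load_paths(text):
    """Return the [PATH] section of an INI document as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # configparser lowercases option names by default, so keys look like 'db_kdm'.
    return dict(parser["PATH"])


def country_page_url(paths, country_code):
    """Assumption: page URL = BASE_URL_LOCODE + lowercase country code + EXTENSION."""
    return paths["base_url_locode"] + country_code.lower() + paths["extension"]


paths = load_paths(SAMPLE_CONF)
print(paths["db_kdm"])                # /db/kdm_dev.db
print(country_page_url(paths, "PH"))  # https://service.unece.org/trade/locode/ph.htm
```

To read the real file instead of the sample text, parser.read("config/conf.ini") can replace the read_string call.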

Folder and File Structure

  • config - contains the configuration file
  • db - contains the raw and output databases
  • export - contains the Excel file of failed mappings
  • lib - contains the Python classes
  • logs - contains the log files
  • scripts - contains the SQL scripts
  • controller.py - contains the scheduler function
  • main.py - contains the scraping operations
  • test.py - contains the unit tests
  • requirements.txt - lists the dependencies

How to Run

  • Activate the virtual environment: source env/bin/activate
  • Run the main scraper: python controller.py
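
Since controller.py holds the scheduler function and the schedule package is among the dependencies, a scheduled run might look roughly like the sketch below. This is an assumption about the general shape, not the actual controller.py: scrape_once and run_daily are hypothetical names, and the daily 02:00 run time is made up for illustration.

```python
import time


def scrape_once(log):
    """Placeholder for one scraping pass; the real work lives in main.py."""
    log.append("scrape started")
    # Scraping, formatting, and database writes would go here.
    log.append("scrape finished")
    return len(log)


def run_daily(at="02:00", poll_seconds=60):
    """Run scrape_once every day at the given time until interrupted."""
    # Imported here so scrape_once can be exercised without the scheduler installed.
    import schedule

    log = []
    schedule.every().day.at(at).do(scrape_once, log)
    while True:
        schedule.run_pending()  # runs any job whose time has come
        time.sleep(poll_seconds)
```

Calling run_daily() blocks the process and checks once a minute whether the job is due, so it is typically started in a terminal session or service that stays alive.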

About the Developer

This tool was developed by Gene Romuga, a programmer and constant student of IT.
