GHData


GHData is a Python library and REST server that provides data related to GitHub repositories. Hosting the GHData project requires a copy of the GHTorrent database.

GHData is under heavy development; expect frequent backwards-incompatible changes until a 1.x.x release!

Roadmap

See our roadmap of technical, outreach, and academic goals.

Installation with Docker (easy to get up and running)

Before we begin, make sure you have everything you need installed: Git, Docker, Docker Compose, and a MySQL server with GHTorrent loaded.
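A quick way to confirm that the command-line tools are on your PATH is a small Python 3 check like the one below. It is only a sketch: the helper name missing_tools is hypothetical, and it cannot tell you whether your MySQL server actually has GHTorrent loaded.

```python
import shutil


def missing_tools(tools):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Command names as used in this README; adjust them for your system.
    required = ["git", "docker", "docker-compose"]
    missing = missing_tools(required)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required tools found.")
```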

Now, to install:

  1. Clone the repo and enter its directory:

    git clone https://github.com/OSSHealth/ghdata
    cd ghdata
  2. Configure the following environment variables:

    # Most likely required
    GHDATA_DB_USER
    GHDATA_DB_PASS
    GHDATA_DB_HOST
    GHDATA_DB_PORT
    GHDATA_DB_NAME
    
    # Optional
    GHDATA_HOST
    GHDATA_PORT
    GHDATA_PUBLIC_WWW_API_KEY
    GHDATA_GITHUB_API_KEY
    GHDATA_LIBRARIESIO_API_KEY
    GHDATA_DEBUG

    docker-compose will automatically pass the relevant environment variables to the container.

  3. Build the container with docker-compose build

  4. Launch the container with docker-compose up
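If you want to check that the database variables from step 2 are set before building, the following Python 3 sketch reads them and assembles a MySQL connection URL. The mysql+pymysql URL scheme and the db_url_from_env helper are assumptions for illustration (SQLAlchemy-style), not necessarily how GHData itself connects.

```python
import os
from urllib.parse import quote_plus

# The variables listed above as "most likely required".
REQUIRED_DB_VARS = [
    "GHDATA_DB_USER",
    "GHDATA_DB_PASS",
    "GHDATA_DB_HOST",
    "GHDATA_DB_PORT",
    "GHDATA_DB_NAME",
]


def db_url_from_env(environ=os.environ):
    """Build a SQLAlchemy-style MySQL URL from the GHDATA_DB_* variables.

    Assumption: the pymysql driver and URL format are illustrative only.
    Raises KeyError listing any variables that are not set.
    """
    missing = [name for name in REQUIRED_DB_VARS if name not in environ]
    if missing:
        raise KeyError("unset: " + ", ".join(missing))
    # Percent-encode the user and password so characters like '@' are safe.
    return "mysql+pymysql://{user}:{password}@{host}:{port}/{name}".format(
        user=quote_plus(environ["GHDATA_DB_USER"]),
        password=quote_plus(environ["GHDATA_DB_PASS"]),
        host=environ["GHDATA_DB_HOST"],
        port=environ["GHDATA_DB_PORT"],
        name=environ["GHDATA_DB_NAME"],
    )
```

After exporting the variables in your shell, print(db_url_from_env()) shows the URL, or a KeyError naming whichever variables are still unset.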

Installation without Docker (recommended for developers)

Dependencies

  • Python 3.4.x/Python 2.7.x and pip
  • Static web server such as nginx or Apache
  • MySQL 5.x or later with the GHTorrent database
    • You can use the MSR14 dataset for testing
    • Our development team has a public read-only database you can request access to
    • If you want to install your own copy of the MSR14 dataset, see the installation instructions

After restoring GHTorrent (or MSR14) to MySQL, we recommend creating a dedicated MySQL user for GHData. GHData only needs SELECT privileges.
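The read-only user described above can be created with a CREATE USER statement followed by a GRANT SELECT on the GHTorrent database. The Python sketch below only generates that SQL for you to run as a MySQL administrator; the helper name, the user name, and the database name (msr14 here) are placeholders, and the quoting is deliberately minimal, so use trusted values only.

```python
def readonly_user_sql(user, password, database, host="%"):
    """Return MySQL statements that create a SELECT-only user.

    `host="%"` allows connections from any host; use "localhost" or a
    specific address to be stricter.
    """
    def quote(value):
        # Double single quotes for a standard SQL string literal.
        return "'" + value.replace("'", "''") + "'"

    account = "{}@{}".format(quote(user), quote(host))
    return [
        "CREATE USER {} IDENTIFIED BY {};".format(account, quote(password)),
        # Grant SELECT on every table in the database, and nothing else.
        "GRANT SELECT ON `{}`.* TO {};".format(database, account),
    ]


if __name__ == "__main__":
    for statement in readonly_user_sql("ghdata", "change-me", "msr14"):
        print(statement)
```

Paste the printed statements into the mysql client while connected as a user allowed to create accounts.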

Once the database is set up, clone and install GHData:

git clone https://github.com/OSSHealth/ghdata/
cd ghdata && pip install -U .

Copy the files in [ghdata repo]/frontend/public to your web server.

Run ghdata to create the configuration file (ghdata.cfg). Edit the file to reflect your database credentials.
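To double-check the credentials you entered, you can read the file back with Python's configparser. This sketch assumes ghdata.cfg is an INI-style file; the [Database] section and its key names below are hypothetical, so compare them with the file ghdata actually generates before relying on them.

```python
import configparser

# Hypothetical layout; the real ghdata.cfg may use different names.
EXAMPLE_CFG = """
[Database]
host = 127.0.0.1
port = 3306
user = ghdata
pass = secret
name = msr14
"""


def read_db_settings(text, section="Database"):
    """Parse INI-style text and return one section's settings as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    settings = dict(parser[section])
    # configparser stores strings; convert the port to an int.
    settings["port"] = parser.getint(section, "port")
    return settings
```

For a real file, pass open("ghdata.cfg").read() instead of EXAMPLE_CFG.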

Run ghdata to start the backend, then visit your frontend in a web browser.

Developer Installation

Dependencies

  • Python 3.4.x and Python 2.7.x with pip2 and pip3
  • MySQL 5.x or later with the GHTorrent database
  • NodeJS 7.x or newer

Ubuntu

   ## Python installs on Ubuntu
   sudo apt-get install python-pip
   sudo apt-get install python3-pip

   ## For development you need NodeJS
   sudo apt-get install nodejs

First, clone the repo and checkout the dev branch:

git clone https://github.com/OSSHealth/ghdata/ && cd ghdata && git checkout dev

Install the Python and Node developer dependencies:

make install-dev

For further instructions on contributing to GHData, these guides walk through adding an endpoint to the full stack:

Dev Guide Part 1

Dev Guide Part 2
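As a rough picture of what a REST endpoint does, the sketch below uses only Python's standard WSGI interface to serve one JSON metric for an owner/repository pair. It is not GHData's server code: the /api/<owner>/<repo>/commits route and the commit_count function are hypothetical, and the metric returns a placeholder 0 where a real endpoint would query GHTorrent.

```python
import json


def commit_count(owner, repo):
    """Hypothetical metric; a real endpoint would query the database."""
    return {"owner": owner, "repo": repo, "commits": 0}


def app(environ, start_response):
    """Minimal WSGI app serving GET /api/<owner>/<repo>/commits as JSON."""
    parts = environ.get("PATH_INFO", "").strip("/").split("/")
    if (environ.get("REQUEST_METHOD") == "GET" and len(parts) == 4
            and parts[0] == "api" and parts[3] == "commits"):
        body = json.dumps(commit_count(parts[1], parts[2])).encode("utf-8")
        status = "200 OK"
    else:
        body = json.dumps({"error": "not found"}).encode("utf-8")
        status = "404 Not Found"
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]
```

To try it locally, run wsgiref.simple_server.make_server("localhost", 5000, app).serve_forever() and request http://localhost:5000/api/octocat/hello/commits.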

Frontend development guide coming soon!

You're good to go.

In one shell, run ghdata; in another, run cd frontend/ && brunch watch -s.

If you have GNU Screen installed, this can be done automatically using make dev-start.

The screen sessions can be killed with make dev-stop.
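Without GNU Screen, a small Python 3 launcher can start both processes from one terminal. This is a hypothetical helper, not part of GHData's Makefile: each entry is an (argv, working directory) pair, it waits for every process to exit, and Ctrl+C terminates any that are still running.

```python
import subprocess


def run_together(commands):
    """Start every (argv, cwd) pair at once and wait for all to exit.

    Returns the exit codes in the same order as `commands`.
    On Ctrl+C, terminates processes that are still running, then re-raises.
    """
    procs = [subprocess.Popen(argv, cwd=cwd) for argv, cwd in commands]
    try:
        return [proc.wait() for proc in procs]
    except KeyboardInterrupt:
        for proc in procs:
            if proc.poll() is None:  # still running
                proc.terminate()
        raise
```

From the repository root, run_together([(["ghdata"], None), (["brunch", "watch", "-s"], "frontend")]) mirrors the two shells described above, assuming ghdata and brunch are on your PATH.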

License and Copyright

Copyright © 2017 University of Nebraska at Omaha and the University of Missouri

GHData is free software: you can redistribute it and/or modify it under the terms of the MIT License as published by the Open Source Initiative. See the file LICENSE for more details.

(This work has been funded through the Alfred P. Sloan Foundation)

About

Python library and web service for GitHub Health and Sustainability metrics
