Frontend node service for data collected from palantiri
Installing this code base requires the `npm` command, which ships with Node.js. You will need to install Node.js locally before moving on with the installation.

```
$ git clone https://github.com/anidata/ht-archive.git
$ cd ht-archive
$ npm install
```
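Before running the steps above, it can help to confirm the prerequisites are actually on your `PATH`. A minimal sketch (the version flags are standard, but the exact versions you see will differ):

```shell
# Quick prerequisite check: confirm node and npm are available
# before cloning and installing.
for tool in node npm; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool found: $("$tool" --version)"
  else
    echo "$tool missing - install Node.js first"
  fi
done
```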
This guide will walk you through the process of setting up a local PostgreSQL server and importing the database backup so that you can use the ht-archive web application.

## Installation with Docker

Scroll down for command-line installation instructions without Docker.
Two Docker containers are used:

1. a PostgreSQL server, and
2. the web application (Node.js) that runs the web search.

The database files are mounted into the PostgreSQL container via an easily accessible folder (referred to below as `/SOME_FOLDER`).
- If you haven't already installed Docker, follow the directions on the Docker website: https://www.docker.com/products/overview. Installing via MacPorts or Homebrew is not suggested.
- Download the (~400 MB compressed, ~6.5 GB extracted) PostgreSQL backup from one of the following places (the destination folder is referred to below as `/THE_FOLDER_WITH_DOWNLOADED_FILE`):
- Extract the SQL file from the downloaded `crawler_er.tar.gz` to `/THE_FOLDER_WITH_DOWNLOADED_FILE` using the following command (run from `/THE_FOLDER_WITH_DOWNLOADED_FILE`, since `tar` extracts into the current directory) or an archive tool:

  ```
  $ tar xzf /THE_FOLDER_WITH_DOWNLOADED_FILE/crawler_er.tar.gz
  ```
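If you want to sanity-check the `tar` flags before touching the real 6.5 GB download, the same invocation can be exercised against a throwaway archive. All paths below are temporary stand-ins, not the real download location:

```shell
# Build a stand-in crawler_er.tar.gz in a temp directory, then extract it
# with the same flags used above: x=extract, z=gunzip, f=archive file.
TMP=$(mktemp -d)
printf 'CREATE TABLE demo (id int);\n' > "$TMP/crawler.sql"
tar czf "$TMP/crawler_er.tar.gz" -C "$TMP" crawler.sql
rm "$TMP/crawler.sql"
tar xzf "$TMP/crawler_er.tar.gz" -C "$TMP"
ls "$TMP"
```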
- Build/run the PostgreSQL server. Ensure `/SOME_FOLDER` is empty, and be sure to complete the next two steps before stopping the Docker container; otherwise the container will need to be fixed (removing it is simplest).

  ```
  $ sudo docker run -d -e POSTGRES_PASSWORD=1234 -e POSTGRES_USER=dbadmin -e POSTGRES_DB=sandbox -v /SOME_FOLDER:/var/lib/postgresql/data -p 5432:5432 --name postgres postgres
  ```

  The command downloads, partially configures, and runs the Docker image and container `postgres`:

  - `-d` runs the container as a daemon in the background, so the command returns immediately
  - `-e` sets environment variables
  - `-v` mounts the folder `/SOME_FOLDER` into the container
  - `-p` sets the network port
  - `--name` names the container `postgres`
  - `postgres` (the final argument) references the standard PostgreSQL Docker image
- Copy the `crawler.sql` file that was extracted from `crawler_er.tar.gz` to `/SOME_FOLDER`:

  ```
  $ cp /THE_FOLDER_WITH_DOWNLOADED_FILE/crawler.sql /SOME_FOLDER
  ```
- Load the crawler database into the PostgreSQL server. This step will take a few minutes and will return to the command line when finished. If it fails, see below for some troubleshooting suggestions.

  ```
  $ sudo docker exec postgres psql --username dbadmin -c "CREATE DATABASE crawler" sandbox
  ```

  - `docker exec` runs a command inside the previously created Docker container named `postgres`
  - `psql` runs the PostgreSQL client
  - `--username` runs the following psql command as the user `dbadmin`
  - `-c` runs the psql command `CREATE DATABASE crawler`
  - `sandbox` is the default database specified earlier

  ```
  $ sudo docker exec postgres psql --username dbadmin -f /var/lib/postgresql/data/crawler.sql crawler
  ```

  - `-f` runs the commands in the file `crawler.sql`, which we copied into `/SOME_FOLDER` (mounted in the container at `/var/lib/postgresql/data`)
  - `crawler` uses the crawler database created in the previous step
- Start the web application.

  a. Start the application with the following:

  ```
  $ sudo docker run -d -p 8080:8080 --link postgres:postgres --name ht-archive bmenn/ht-archive --db crawler --usr dbadmin --pwd 1234 --host postgres
  ```

  The command downloads, configures, and runs the Docker image and container named `ht-archive`:

  - `-d` runs the container as a daemon in the background
  - `-p` sets the network port
  - `--link` allows networking to the first container, named `postgres`
  - `--name` names the container `ht-archive`
  - `bmenn/ht-archive` uses the `bmenn/ht-archive` Docker image

  The remaining arguments are passed to the `app.js` web application and specify the database name, username, password, and host:

  - `--db` sets the database name
  - `--usr` sets the username
  - `--pwd` sets the database password
  - `--host` sets the name of the server hosting the database (`postgres`)

  b. Open a web browser and enter `localhost:8080` in the address bar.
- Stopping and starting the services.

  All running Docker containers are visible via:

  ```
  $ docker ps
  ```

  a. To stop the running containers `postgres` and `ht-archive`, enter:

  ```
  $ docker stop ht-archive
  $ docker stop postgres
  ```

  b. To restart the services, run the following. All of the port settings, environment variables, mounts, etc. are preserved in the Docker containers, so only `docker start <container name>` is needed:

  ```
  $ docker start postgres
  $ docker start ht-archive
  ```
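The stop/start sequence can be wrapped in a small helper. A sketch, shown in dry-run form (it echoes each command instead of executing it; remove the `echo`s to run for real), assuming the container names `postgres` and `ht-archive` used in the steps above:

```shell
# Start or stop both containers in a safe order:
# postgres first on start, ht-archive first on stop.
ht_services() {
  case "$1" in
    start) echo docker start postgres; echo docker start ht-archive ;;
    stop)  echo docker stop ht-archive; echo docker stop postgres ;;
    *)     echo "usage: ht_services start|stop" ;;
  esac
}
ht_services start
```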
- If you have made some changes and want to see the updated application, run:

  ```
  $ node app.js --db crawler --usr dbadmin --pwd 1234 --host REPLACE_ME_WITH_DOCKER_OR_POSTGRES_IP
  ```
If a `docker exec` command fails with `Error response from daemon: Container #### is not running`, a likely cause is that the database isn't reachable or running. The simplest fix is usually to remove the Docker container (named `postgres`). Keep in mind that all Docker containers are visible via `$ docker ps -a` and all Docker images via `$ docker images`.

a. Remove the Docker container `postgres`:

```
$ docker rm postgres
```

b. Ensure that `/SOME_FOLDER` is empty, then restart from the 'Build/run the PostgreSQL server' step. Be sure to complete all `docker exec` steps before exiting the `postgres` Docker container.
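A quick way to confirm the data folder really is empty before re-running the container. This is a sketch: `DATA_DIR` defaults to a fresh temporary directory so the snippet runs anywhere; substitute your actual `/SOME_FOLDER`:

```shell
# Warn if the mounted data directory still has files in it;
# PostgreSQL will treat a non-empty directory as an existing cluster.
DATA_DIR=${DATA_DIR:-$(mktemp -d)}
if [ -z "$(ls -A "$DATA_DIR")" ]; then
  echo "ok: $DATA_DIR is empty"
else
  echo "warning: $DATA_DIR is not empty - clear it before re-running postgres"
fi
```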
## Installation without Docker

Follow these steps if you already have a PostgreSQL server running locally and would rather not use Docker.
- Download the PostgreSQL backup from one of the following places:
- Extract the SQL file from the downloaded `crawler_er.tar.gz` using the following command on the command line or an archive tool:

  ```
  $ tar xzf /the_folder_with_backup/crawler_er.tar.gz
  ```
- Set up your local database.

  Log into your PostgreSQL server as root and create a new superuser named `dbadmin` with login permissions:

  ```
  postgres> CREATE ROLE dbadmin WITH SUPERUSER LOGIN PASSWORD '1234';
  ```
- Load the SQL into PostgreSQL.

  Create a database called `crawler`, exit psql, and run the following command. Be patient: the query can take 15 minutes or so to run.

  ```
  $ psql --host=localhost --dbname=crawler --username=dbadmin -f <path/to/.sql/file>
  ```
- Once the data has been loaded into PostgreSQL, start the web application and navigate to `localhost:8080` in your browser:

  ```
  $ node app.js --db crawler --usr dbadmin --pwd 1234 --host REPLACE_ME_WITH_DOCKER_OR_POSTGRES_IP
  ```

  More generally:

  ```
  $ node app.js --usr postgres-user --host hostname --db database  # the app will launch on port 8080
  ```