A set of tools and scripts that download and process blockchain and cryptocurrency course (price) data, generate a dataset, use it to train a deep neural network to make value predictions, and evaluate the result.
The project implements the theoretical and experimental setup of a paper, which is currently undergoing peer review.
The tools require the Parity client, Node.js, Python 3, Pipenv and, optionally, MongoDB.
The project includes optimized C++ code. The GCC compiler and the Pybind11 library are required to compile the C++ parts of the project.
Clone the git repository and install the node dependencies:
git clone https://github.com/Zvezdin/blockchain-predictor.git
cd blockchain-predictor
npm install
Install the required Python dependencies with the following command:
pipenv install
Run the script build.sh in the c++ folder.
Proceed to use the project after running pipenv shell.
All python tools implement a CLI with a help page. It can be displayed by running python something.py -h.
Run a parity instance with the --tracing on flag. A possible configuration could be:
parity -d /some/where --tracing on --mode active --cache-size 16384 --force-sealing --allow-ips public --min-peers 50 --max-peers 100 --jsonrpc-threads 10
The initial sync can take multiple hours. Wait for full sync before proceeding.
There are multiple options for a data store as a backend. Available options are defined in database/. By default, hdfs_store_database.py is used and hence no database instance needs to be started. The filepath to the h5 store file is defined in that database file (for now).
If instead you want to use arctic_store_database.py, you have to first run an instance of MongoDB with:
mongod --dbpath /path/to/your/db
The blockchain information needs to be downloaded from the running parity client to the database. This is done using:
python arcticdb.py --course
python arcticdb.py --blockchain
It may take a while depending on which database is used.
Data properties are an extraction of the most important moments from the bulk raw data. They are generated for each course tick (time interval for which we have course data).
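As a rough illustration of what a per-tick property is, the sketch below groups raw (timestamp, price) pairs into fixed-length ticks and derives openPrice and closePrice for each one. It is not the project's implementation: the function name tick_properties, the hourly tick length and the input layout are assumptions made only for this example.

```python
def tick_properties(trades, tick_seconds=3600):
    """Group (timestamp, price) pairs into ticks and derive simple properties.

    Hypothetical helper: the input layout and the hourly tick length are
    assumptions for illustration, not the project's actual data format.
    """
    ticks = {}
    # Sort by timestamp so the first and last price of each tick are
    # the opening and closing price.
    for timestamp, price in sorted(trades, key=lambda trade: trade[0]):
        tick_start = timestamp - timestamp % tick_seconds
        ticks.setdefault(tick_start, []).append(price)

    return [
        {"tick": start, "openPrice": prices[0], "closePrice": prices[-1]}
        for start, prices in ticks.items()
    ]


if __name__ == "__main__":
    sample = [(3600, 10.0), (0, 1.0), (1800, 2.0), (7199, 12.0)]
    for row in tick_properties(sample):
        print(row)
```

Each returned dictionary corresponds to one tick, so the list has one entry per time interval that contains data.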
To generate all of the available properties for all downloaded data, run the following command:
python property-generator.py --action generate
To generate one or more properties for all downloaded data, run the following command:
python property-generator.py --action generate --properties openPrice,closePrice
After the needed data properties are generated, you can proceed with generating the actual dataset. The dataset is generated using a certain dataset model. There are multiple dataset models that "compile" the properties and structure the dataset in a different way. The default is matrix, which generates matrices from a moving window over all of the properties.
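The moving-window idea behind the matrix model can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's matrix model: the function name moving_windows, the window size and the sample values are hypothetical, and the real model has many more hyperparameters.

```python
def moving_windows(properties, window):
    """Build one matrix per window position.

    properties: dict mapping a property name to its list of per-tick values
    (all lists must have the same length).
    Returns a list of matrices; each matrix has `window` rows (ticks) and
    one column per property.
    """
    names = list(properties)
    if not names:
        return []
    length = len(properties[names[0]])
    if any(len(properties[name]) != length for name in names):
        raise ValueError("all properties must cover the same number of ticks")

    # One row per tick, one column per property.
    rows = [[properties[name][i] for name in names] for i in range(length)]
    # Slide the window one tick at a time.
    return [rows[i:i + window] for i in range(length - window + 1)]


if __name__ == "__main__":
    sample = {"openPrice": [1, 2, 3, 4], "closePrice": [2, 3, 4, 5]}
    for matrix in moving_windows(sample, window=2):
        print(matrix)
```

With four ticks and a window of two, the sketch yields three overlapping matrices, each shifted by one tick.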
Dataset generation requires a comma-separated list of properties to include in the body and a comma-separated list (or a single item) of properties to use as the target / expected output.
Example:
python dataset_generator.py openPrice,closePrice stickPrice --filename some/where/dataset.pickle
Arguments --start and --end can be used as trimmers for the dataset:
python dataset_generator.py openPrice,closePrice stickPrice --start 2017-03-14-03 --end 2017-07-03-21
In most cases when training neural networks, we need two or three datasets: a train, a validation (optional) and a test dataset. These datasets can be generated using separate calls to dataset_generator for different dates, but we recommend using one date interval that covers all of the data and then splitting the resulting dataset into the needed parts. In our tool, this is done the following way:
python dataset_generator.py openPrice,closePrice stickPrice --ratio 6:2:2
Please keep in mind that the matrix model has dozens of hyperparameters that have been tuned for most cases. If your case differs, you need to change them in the source code of the matrix model.
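To make the 6:2:2 ratio concrete, the sketch below splits an ordered list of samples into consecutive parts proportional to a ratio string. It is only an illustration: the function name split_by_ratio is hypothetical, and whether the tool splits in order or shuffles first is an assumption not confirmed by this text.

```python
def split_by_ratio(samples, ratio="6:2:2"):
    """Split samples into consecutive parts proportional to the ratio.

    Hypothetical helper; keeps the original order, which is an assumption
    about how a time-series dataset would usually be split.
    """
    parts = [int(part) for part in ratio.split(":")]
    total = sum(parts)
    if total <= 0 or any(part < 0 for part in parts):
        raise ValueError("ratio must contain non-negative numbers with a positive sum")

    result = []
    start = 0
    cumulative = 0
    for part in parts:
        cumulative += part
        # Integer arithmetic on cumulative shares avoids rounding drift,
        # and the final boundary always equals len(samples).
        end = len(samples) * cumulative // total
        result.append(samples[start:end])
        start = end
    return result


if __name__ == "__main__":
    train, validation, test = split_by_ratio(list(range(10)), "6:2:2")
    print(len(train), len(validation), len(test))
```

For ten samples this gives six train, two validation and two test samples, with no sample appearing in more than one part.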
The generated dataset can be used to train neural networks. The supported networks depend on the chosen dataset model. The matrix model supports all networks.
To train our convolutional network on an already generated dataset and also shuffle the train dataset, we can do the following:
python neural_trainer.py path/to/your/dataset.pickle --models CONV --shuffle
Training a neural network can't be that simple, right? Right! You can, and should, override the default network hyperparameters to suit your dataset and problem. This can be done via:
python neural_trainer.py data/test.pickle --models CONV --args epoch=5,batch=1,lr=0.0001,kernel=3
This example sets the number of training epochs, the batch size, learning rate and kernel size for the whole convolutional network. Each network architecture has its own set of hyperparameters and they are defined with the network specification itself.
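For readers scripting many training runs, the sketch below shows one way a comma-separated key=value string such as the one passed to --args could be turned into a dictionary of typed values. It is a hypothetical helper (parse_overrides), not the parser used by neural_trainer.py, whose actual behavior may differ.

```python
def _convert(raw):
    """Try int, then float; fall back to the original string."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    return raw


def parse_overrides(text):
    """Parse 'key=value,key=value' into a dict (hypothetical helper)."""
    overrides = {}
    if not text.strip():
        return overrides
    for pair in text.split(","):
        key, separator, raw = pair.partition("=")
        if not separator or not key.strip():
            raise ValueError("expected key=value, got %r" % pair)
        overrides[key.strip()] = _convert(raw.strip())
    return overrides


if __name__ == "__main__":
    print(parse_overrides("epoch=5,batch=1,lr=0.0001,kernel=3"))
```

Converting values to int or float up front keeps later code from having to guess whether lr is a string or a number.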
After training, the network's performance will be evaluated with the test dataset and measured by 4+ different accuracy/error scores. The performance on the train and test datasets will also be visualized on a graph by opening a new window. If you do not wish training to be blocked by a graph window, you can save the graph to a file instead, by passing the --quiet parameter. This is useful for automated training of multiple networks, as it allows you to review the results afterwards.
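The specific scores are not listed here, so the following sketch only shows two error measures that are commonly used for value prediction, mean absolute error and root mean squared error. Treat the choice of metrics and the function names as assumptions; the project may compute different or additional scores.

```python
import math


def _check_lengths(expected, predicted):
    if len(expected) != len(predicted) or not expected:
        raise ValueError("inputs must be non-empty and of equal length")


def mean_absolute_error(expected, predicted):
    """Average absolute difference between expected and predicted values."""
    _check_lengths(expected, predicted)
    return sum(abs(e - p) for e, p in zip(expected, predicted)) / len(expected)


def root_mean_squared_error(expected, predicted):
    """Square root of the average squared difference; penalizes large misses."""
    _check_lengths(expected, predicted)
    squared = sum((e - p) ** 2 for e, p in zip(expected, predicted))
    return math.sqrt(squared / len(expected))


if __name__ == "__main__":
    expected = [1.0, 2.0, 3.0]
    predicted = [1.0, 2.0, 5.0]
    print(mean_absolute_error(expected, predicted))
    print(root_mean_squared_error(expected, predicted))
```

Comparing the same score on the train and test datasets is a quick check for overfitting: a much lower error on the train dataset suggests the network has memorized it.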
Our other neural models include CustomDeep, LSTM and more to come.
If needed, this project also provides a low-level tool that can download data from a crypto exchange / a blockchain node and save it as a .json in a given directory (by the --filename some/where argument).
To download and save course data for the whole history of the cryptocurrency, run:
node data-downloader.js --course
To download blocks 10 through 100, use:
node data-downloader.js --blockchain 10 100