NASA Space Apps Challenge: Develop the Oracle of DSCOVR
- python >= 3.10
- npm >= 9.2.0
- Create a virtual environment:
python3 -m venv ./venv
- Activate the environment:
- for Linux:
source ./venv/bin/activate
- for Windows:
.\venv\Scripts\activate
- Install the dependencies:
- python:
pip install -r requirements.txt
- npm (open the second terminal):
cd ./client && npm i
- Download the data and place it inside the /data directory (create it beforehand) in the project root.
- Download the model and place it inside the /models directory (create it beforehand) in the project root.
- Run the server:
python run.py
- Run the client in the second terminal:
npm start
- To train, specify the data in /nn/conf/config.yaml and run:
python ./nn/train.py
- To validate, specify the data in /nn/main.py and run:
python ./nn/main.py
This project incorporates:
- /app - backend
- /nn - deep learning model
The backend is written in Flask, and the deep learning model, which we call DSCOVR(Y), was developed in PyTorch.
Our deep learning model was trained on data from two datasets that we merged:
- raw data from the satellite - German Research Center for Geosciences
- planetary k-index - NASA
Short names for the datasets:
- RDS_D - raw satellite data dataset
- KP_D - planetary K-index dataset
RDS_D has one entry per minute over a year, whereas KP_D has one entry per 3-hour period. Moreover, a KP_D entry does not correspond to a single discrete hour; it covers a range of 1 hour and 50 minutes.
This gives us:
- RDS_D - data for each minute
- KP_D - data (possibly aggregated) for 1 hour and 50 minutes of each 3-hour period, leaving the remaining 1 hour and 10 minutes as a dark spot
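As a quick check of what the dark spot costs, the arithmetic below computes how many minutes of RDS_D can be paired with a Kp value. It assumes a 365-day year and a complete, gap-free minute series; both are illustrative assumptions, not properties confirmed for the dataset.

```python
# Worked arithmetic for the RDS_D / KP_D overlap.
# Assumptions (hypothetical): a 365-day year and no gaps in the minute data.
MINUTES_PER_PERIOD = 3 * 60        # length of one KP_D period
MINUTES_OBSERVED = 60 + 50         # part of the period covered by a KP_D entry
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600 minutes in a non-leap year

dark_minutes = MINUTES_PER_PERIOD - MINUTES_OBSERVED        # minutes per period with no Kp value
coverage = MINUTES_OBSERVED / MINUTES_PER_PERIOD            # covered share of each period
periods_per_year = MINUTES_PER_YEAR // MINUTES_PER_PERIOD   # number of KP_D periods in the year
labelled_minutes = periods_per_year * MINUTES_OBSERVED      # RDS_D minutes that fall in a window

print(dark_minutes, round(coverage, 3), periods_per_year, labelled_minutes)
# prints: 70 0.611 2920 321200
```

Under these assumptions, roughly 61% of the minute-level rows (321,200 of 525,600) fall inside a KP_D window; the remaining rows have no Kp value to be paired with.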
In short, we checked whether each RDS_D entry falls within the 1-hour-50-minute observation window of any KP_D period.
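The check described above can be sketched in Python. This is an illustrative variant, not the project's own merge code: instead of scanning every KP_D period for each RDS_D entry, it keeps the window start times sorted and binary-searches them with the standard bisect module, so each lookup costs O(log n) rather than O(n). The names kp_for_timestamp and label_rows, the dict-based row layout, the half-open window, the assumption that a window begins at the start of its period, and the example timestamps are all hypothetical.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

# Assumed length of a KP_D observation window (1 hour 50 minutes, as stated above).
KP_WINDOW = timedelta(hours=1, minutes=50)


def kp_for_timestamp(ts, window_starts, kp_values, window=KP_WINDOW):
    """Return the Kp value whose window contains ts, or None if ts is in a dark spot.

    window_starts must be sorted ascending; kp_values[i] belongs to window_starts[i].
    Windows are treated as half-open: [start, start + window).
    """
    i = bisect_right(window_starts, ts) - 1  # last window starting at or before ts
    if i >= 0 and ts < window_starts[i] + window:
        return kp_values[i]
    return None


def label_rows(rows, window_starts, kp_values):
    """Copy each row (a dict with a 'time' key) with an added 'kp' key; skip unmatched rows."""
    labelled = []
    for row in rows:
        kp = kp_for_timestamp(row["time"], window_starts, kp_values)
        if kp is not None:
            labelled.append({**row, "kp": kp})  # new dict, input row is left unchanged
    return labelled


# Hypothetical example: two KP_D periods starting at 00:00 and 03:00.
starts = [datetime(2016, 1, 1, 0, 0), datetime(2016, 1, 1, 3, 0)]
kps = [2.0, 3.3]
rows = [
    {"time": datetime(2016, 1, 1, 0, 30), "bx": 1.0},  # inside the first window
    {"time": datetime(2016, 1, 1, 2, 0), "bx": 2.0},   # dark spot (first window ended at 01:50)
    {"time": datetime(2016, 1, 1, 3, 10), "bx": 3.0},  # inside the second window
]
print([r["kp"] for r in label_rows(rows, starts, kps)])  # prints [2.0, 3.3]
```

Binary search relies on the windows not overlapping, which holds here because each 1 h 50 min window sits inside its own 3-hour period.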
We used two scripts, located in the /scripts directory, to clean and merge the data:
- kp_dataset_clean.py cleans the Kp index data, which originally contained a lot of data irrelevant to the problem. It also produces a well-structured file with delimiters, making it possible for pandas to open it.
- datasets_merge.py iterates over the raw satellite data; for each entry, it iterates over the Kp index data and, if the entry falls within the measurement time range of any Kp index, adds that Kp index to the entry as a new column.
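A cleaning step of the kind described for kp_dataset_clean.py might look like the sketch below. The input layout is an assumption: a whitespace-separated text listing with '#' comment lines, whose columns are passed in by name; the Kp file the project actually uses may be laid out differently. The function name clean_kp_text, the column names and the sample rows are hypothetical.

```python
import csv
import io


def clean_kp_text(raw_text, columns, keep=None):
    """Turn a whitespace-separated Kp listing into comma-delimited CSV text.

    Assumed input layout (hypothetical): '#' lines are comments, blank lines
    are ignored, and each data line has exactly len(columns) fields.
    Only the columns named in `keep` (default: all) are written out.
    """
    keep = list(columns) if keep is None else list(keep)
    indices = [columns.index(name) for name in keep]  # raises ValueError for unknown names

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(keep)  # header row
    for line in raw_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip comments, headers and blank lines
        fields = stripped.split()
        if len(fields) != len(columns):
            continue  # skip rows that do not match the expected layout
        writer.writerow([fields[i] for i in indices])
    return out.getvalue()


sample = """# Kp listing (illustrative sample, not real data)
2016 01 01 00.0 2.000 7
2016 01 01 03.0 3.333 18
"""
csv_text = clean_kp_text(
    sample,
    columns=["year", "month", "day", "hour", "kp", "ap"],
    keep=["year", "month", "day", "hour", "kp"],
)
print(csv_text)
```

Because the output is comma-delimited, it can then be loaded with pandas.read_csv(io.StringIO(csv_text)).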
The DSCOVR(Y) model has 53 inputs, 1 output and 1,175,681 parameters.
The architecture:
- linear (53, 128)
- batch_norm (128)
- linear1 (128, 256)
- batch_norm (256)
- linear2 (256, 256)
- batch_norm (256)
- lstm (256, 256, 2 layers)
- linear3 (256, 64)
- linear4 (64, 1)
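The parameter count can be estimated from the list above. This sketch assumes PyTorch's default layer definitions: nn.Linear with a bias, nn.BatchNorm1d with a learnable scale and shift (running statistics are buffers, not parameters), and nn.LSTM with both bias_ih and bias_hh. Activations, dropout and the forward pass are not listed and contribute no parameters here.

```python
def linear_params(n_in, n_out):
    """nn.Linear: weight matrix (n_out x n_in) plus a bias vector."""
    return n_in * n_out + n_out


def batch_norm_params(n):
    """nn.BatchNorm1d (affine=True): learnable gamma and beta, one of each per feature."""
    return 2 * n


def lstm_params(n_in, hidden, num_layers):
    """nn.LSTM: per layer, 4 gates x (input weights + recurrent weights + bias_ih + bias_hh)."""
    total = 0
    for layer in range(num_layers):
        layer_in = n_in if layer == 0 else hidden  # deeper layers consume the previous layer's output
        total += 4 * (hidden * layer_in + hidden * hidden + 2 * hidden)
    return total


layers = {
    "linear (53, 128)": linear_params(53, 128),
    "batch_norm (128)": batch_norm_params(128),
    "linear1 (128, 256)": linear_params(128, 256),
    "batch_norm (256) #1": batch_norm_params(256),
    "linear2 (256, 256)": linear_params(256, 256),
    "batch_norm (256) #2": batch_norm_params(256),
    "lstm (256, 256, 2 layers)": lstm_params(256, 256, 2),
    "linear3 (256, 64)": linear_params(256, 64),
    "linear4 (64, 1)": linear_params(64, 1),
}
for name, count in layers.items():
    print(f"{name:<28}{count:>10,}")
total = sum(layers.values())
print(f"{'total':<28}{total:>10,}")  # total: 1,176,193
```

Under these assumptions the total is 1,176,193, which is 512 more than the reported 1,175,681. 512 happens to be the size of one batch-norm layer over 256 features, so the implemented model probably differs from this list in some small detail; the sketch cannot tell which.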
Planetary K-index data is subject to the CC BY 4.0 license.
The license of our project is Apache 2.0.