This repo contains all the input data, code, and output required for our research. If you wish to execute the code, the Python scripts are likely the easiest place to start; the Colab notebooks require a copy of the original data in your Google Drive.
- `code` — all the code required to perform the research.
  - The script `analysis.py` and the notebook `analysis.ipynb` both serve as the primary point of interaction and execution for training models; the latter is for Colab. The notebook `transfer.ipynb` similarly manages working with pre-trained models.
  - `data_mgmt.py` manages data on disk. The first time you train a model, this script organizes the data into the appropriate test/train structure. In the case of 3D data, it writes 100 GB of numpy arrays.
  - `models.py` and `models3D.py` contain functions that, when called, create, train, and save models. They also contain a wrapper class definition for convenience.
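To make the first-run data organization concrete, here is a minimal sketch of the kind of test/train split `data_mgmt.py` performs. Nothing below is taken from the actual script: the directory names, array shapes, split ratio, and function name `organize_split` are all illustrative assumptions.

```python
"""Hypothetical sketch of a first-run test/train organization step.
Directory layout, shapes, and split ratio are assumptions, not the
project's actual implementation."""
import numpy as np
from pathlib import Path

def organize_split(root, clips, train_frac=0.8, seed=0):
    """Shuffle clips and write each as a .npy array under train/ or test/."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(clips))
    n_train = int(len(clips) * train_frac)
    for split, idx in (("train", order[:n_train]), ("test", order[n_train:])):
        out = Path(root) / split
        out.mkdir(parents=True, exist_ok=True)
        for i in idx:
            np.save(out / f"clip_{i}.npy", clips[i])

# Tiny stand-in arrays; the real 3D data totals roughly 100 GB on disk.
clips = [np.zeros((4, 8, 8, 3), dtype=np.float32) for _ in range(10)]
organize_split("data_demo", clips)
```

Writing the split to disk once, rather than re-deriving it per run, keeps later training runs consistent at the cost of the one-time storage hit noted above.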
- `data/original` — untouched DFDC data. Functions in `data_mgmt.py` will write to the `data` directory when organizing your local environment.
- `report` — written report, slides, and a recording of the presentation.
- `saved_models` — all the final models, saved in TensorFlow format.
- `small_local` — point functions here for small local testing.
- `visualization`
  - `visuals.ipynb` — notebook for making the training diagram.
  - `model_diagrams` — images of the different network architectures.
  - `plotting_data` — all the train/test logs used for graphics.
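As an illustration of how the logs in `plotting_data` might feed the training diagram, here is a small sketch that parses a log into arrays ready for plotting. The CSV column names and the `read_log` helper are assumptions for illustration, not the project's actual log schema.

```python
"""Hypothetical sketch: parse a train/test log into arrays for plotting.
The column names (epoch, loss, val_loss) are assumed, not taken from the
repo's actual log files."""
import csv
import io

LOG = """epoch,loss,val_loss
1,0.693,0.690
2,0.541,0.575
3,0.462,0.548
"""

def read_log(text):
    """Return per-epoch metrics as parallel lists."""
    rows = list(csv.DictReader(io.StringIO(text)))
    epochs = [int(r["epoch"]) for r in rows]
    loss = [float(r["loss"]) for r in rows]
    val_loss = [float(r["val_loss"]) for r in rows]
    return epochs, loss, val_loss

epochs, loss, val_loss = read_log(LOG)
```

The resulting lists can be handed directly to a plotting library (e.g. `matplotlib.pyplot.plot(epochs, loss)`) to reproduce a training curve.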