DeepCodeProbe is a tool for probing small ML models trained on syntactic representations of code. It provides interpretability into their syntax-learning capabilities and the representations they learn. The tool is designed to be model-agnostic and can be used with any model that takes an AST or CFG as input.
The experiments were carried out using Python 3.10. Install the dependencies for DeepCodeProbe with:
pip install -r requirements.txt
Additionally, each of the models under study has its own dependencies. To train the models and replicate the results, you need to install the dependencies for each model.
For AST-NN, first create a virtual environment and activate it:
python -m venv astnn
source astnn/bin/activate
Then, install the dependencies for AST-NN:
pip install -r src/astnn/requirements.txt
For FuncGNN, first create a virtual environment and activate it:
python -m venv funcgnn
source funcgnn/bin/activate
Then, install the dependencies for FuncGNN:
pip install -r src/funcgnn/requirements.txt
For SummarizationTF, first create a virtual environment and activate it:
python -m venv summarizationtf
source summarizationtf/bin/activate
Then, install the dependencies for SummarizationTF:
pip install -r src/summarization_tf/requirements.txt
For CodeSumDRL, first create a virtual environment and activate it:
python -m venv code_sum_drl
source code_sum_drl/bin/activate
Then, install the dependencies for CodeSumDRL:
pip install -r src/code_sum_drl/requirements.txt
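The four setup sequences above follow the same pattern, so they can be scripted. The following is a minimal sketch, not part of the repository: it only builds and prints the commands, using the environment names and requirements paths listed above. It assumes a POSIX layout (`<env>/bin/python`) and calls each environment's own interpreter instead of sourcing `activate`. The names `MODEL_ENVS`, `setup_commands` and `all_setup_commands` are hypothetical.

```python
# Environment names and requirements paths as listed in the steps above.
MODEL_ENVS = {
    "astnn": "src/astnn/requirements.txt",
    "funcgnn": "src/funcgnn/requirements.txt",
    "summarizationtf": "src/summarization_tf/requirements.txt",
    "code_sum_drl": "src/code_sum_drl/requirements.txt",
}


def setup_commands(env_name, requirements):
    """Return the commands that create one environment and install into it."""
    # Using the environment's own interpreter stands in for
    # `source <env>/bin/activate` (POSIX layout assumed).
    env_python = f"{env_name}/bin/python"
    return [
        ["python", "-m", "venv", env_name],
        [env_python, "-m", "pip", "install", "-r", requirements],
    ]


def all_setup_commands():
    """Collect the setup commands for every model, in the order listed."""
    commands = []
    for env_name, requirements in MODEL_ENVS.items():
        commands.extend(setup_commands(env_name, requirements))
    return commands


# Dry run: print the commands without executing anything.
for cmd in all_setup_commands():
    print(" ".join(cmd))
```

To actually run the commands, each list could be passed to `subprocess.run(cmd, check=True)` from the repository root.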
To train the models, you need to download the datasets. Each model has its own dataset, which can be downloaded from the following links:
- AST-NN: AST-NN Dataset
- FuncGNN: FuncGNN Dataset
- SummarizationTF: SummarizationTF Dataset
- CodeSumDRL: CodeSumDRL Dataset
After downloading the datasets, put them in the dataset directory at the root of each model's source directory.
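Before training, it may help to confirm that every dataset is in place. The sketch below is an illustration only: it assumes each model's source directory is the one that holds its requirements.txt (for example src/astnn; adjust the path if your checkout uses src/ast_nn) and that the dataset directory is literally named `dataset`. `MODEL_SOURCE_DIRS` and `missing_datasets` are hypothetical names.

```python
from pathlib import Path

# Assumed source directories, taken from the requirements.txt paths.
MODEL_SOURCE_DIRS = {
    "AST-NN": "src/astnn",
    "FuncGNN": "src/funcgnn",
    "SummarizationTF": "src/summarization_tf",
    "CodeSumDRL": "src/code_sum_drl",
}


def missing_datasets(repo_root="."):
    """Return the models whose dataset directory is absent or empty."""
    missing = []
    for model, source_dir in MODEL_SOURCE_DIRS.items():
        dataset_dir = Path(repo_root) / source_dir / "dataset"
        # A directory that exists but contains nothing counts as missing.
        if not dataset_dir.is_dir() or not any(dataset_dir.iterdir()):
            missing.append(model)
    return missing


# Report which models still need their dataset, relative to the current directory.
print(missing_datasets())
```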
Afterwards, you can train the models by following the instructions in each model's README file:
- AST-NN: src/ast_nn/README.md
- FuncGNN: src/funcgnn/README.md
- SummarizationTF: src/summarization_tf/README.md
- CodeSumDRL: src/code_sum_drl/README.md
After training the models, you can train the probes by running the following command:
python src/probe_model.py --model {model_name}
where {model_name} is the name of the model you want to train the probe for. Note that each model requires a different probe configuration. The configurations for each model are outlined in probe_model.py.
After training the probes and the models, you can reproduce the validation results by running the following command:
python src/validate_probe.py --model {model_name}
where {model_name} is the name of the model you want to evaluate the probe for. As with probe training, each model requires a different probe configuration. The configurations for each model are outlined in validate_probe.py.
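The two commands above can be chained per model. The following is a minimal sketch under stated assumptions: the values accepted by `--model` are not listed in this README, so the model name is left for the caller to supply (check probe_model.py for the valid values). `probe_commands` and `run_probe_pipeline` are hypothetical helpers, not part of the repository.

```python
import subprocess


def probe_commands(model_name):
    """Return the probe training command followed by the validation command."""
    return [
        ["python", "src/probe_model.py", "--model", model_name],
        ["python", "src/validate_probe.py", "--model", model_name],
    ]


def run_probe_pipeline(model_name, dry_run=True):
    """Print both commands for one model, running them unless dry_run is set."""
    for cmd in probe_commands(model_name):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so validation never runs after a failed probe training.
            subprocess.run(cmd, check=True)
```

Calling `run_probe_pipeline(name)` with a valid model name prints the two commands without running them. Passing `dry_run=False` executes them from the repository root, inside the virtual environment of the corresponding model.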