Ubuntu 18.04
python==3.8
torch==1.7.1
transformers==4.6.1
tqdm==4.64.0
numpy==1.22.3
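
If you want a quick sanity check of the environment, the following minimal sketch (an optional helper, not part of the repository) compares installed versions with the pins above:

# Optional sanity check: compare installed versions with the pins above.
import numpy
import torch
import tqdm
import transformers

expected = {"torch": "1.7.1", "transformers": "4.6.1", "tqdm": "4.64.0", "numpy": "1.22.3"}
installed = {"torch": torch.__version__, "transformers": transformers.__version__,
             "tqdm": tqdm.__version__, "numpy": numpy.__version__}
for name, want in expected.items():
    got = installed[name]
    print(f"{name}: {got}" + ("" if got.startswith(want) else f" (expected {want})"))
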
We fetch the JCSD and PCSD datasets from https://github.com/gingasan/sit3.
To get the top N action words of each dataset, you can run the code as follows:
python utils/split.py \
--dataset_name JCSD \
--aw_cls 40
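
Here the --aw_cls flag sets the number of action-word classes to keep (40). As a rough sketch of what this step computes (assuming the action word is the first token of a summary; utils/split.py may differ in detail):

# Sketch: collect the top-N action words, assumed to be the first token of each summary.
from collections import Counter

def top_action_words(summaries, n=40):
    counts = Counter(s.split()[0].lower() for s in summaries if s.strip())
    return [word for word, _ in counts.most_common(n)]
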
To get the deduplicated dataset, you can run the code as follows:

python dataset/build_JCSD_PCSD.py
python dataset/build_SiT.py
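
These two scripts build the deduplicated datasets. The underlying idea, sketched under the assumption of exact-match deduplication on the code field (the actual build scripts may use a different criterion):

# Sketch: drop examples whose (whitespace-normalized) code has already been seen.
def deduplicate(examples):
    seen, unique = set(), []
    for ex in examples:  # ex is assumed to look like {"code": ..., "summary": ...}
        key = " ".join(ex["code"].split())
        if key not in seen:
            seen.add(key)
            unique.append(ex)
    return unique
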
To fine-tune the unified encoder, you can run the code as follows:

python encoder_finetune.py \
--output_dir outputdir/ESALE \
--dataset_name JCSD \
--model_name_or_path microsoft/unixcoder-base \
--with_test \
--with_mlm \
--with_ulm \
--with_awp \
--with_cuda \
--epochs 50

Since it takes too much time to generate summaries for the full test set, we randomly choose 10% of it as test_demo when training the decoder.
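
A minimal sketch of that sampling step (the file paths and line-per-example format are assumptions, not necessarily what the scripts use):

# Sketch: keep a random 10% of the test split as test_demo.
import random

def make_test_demo(src, dst, ratio=0.1, seed=42):
    with open(src, encoding="utf-8") as f:
        lines = f.readlines()
    random.seed(seed)
    demo = random.sample(lines, max(1, int(len(lines) * ratio)))
    with open(dst, "w", encoding="utf-8") as f:
        f.writelines(demo)

# e.g. make_test_demo("dataset/JCSD/test.jsonl", "dataset/JCSD/test_demo.jsonl")  # hypothetical paths
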
To fine-tune the decoder, you can run the code as follows:

python decoder_finetune.py \
--output_dir outputdir/ESALE \
--dataset_name JCSD \
--model_name_or_path microsoft/unixcoder-base \
--unified_encoder_path outputdir/ESALE/unified_encoder_model/model.pth \
--do_train \
--do_eval \
--do_pred \
--with_cuda \
--eval_steps 5000 \
--train_steps 100000

To generate summaries with the trained model, you can run the code as follows:

python predict.py \
--output_dir outputdir/ESALE \
--dataset_name JCSD \
--model_name_or_path microsoft/unixcoder-base \
--unified_encoder_path outputdir/ESALE/unified_encoder_model/model.pth \
--load_model_path outputdir/ESALE/checkpoint-best-bleu/model.bin \
--with_cuda
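
To score the generated summaries against the references, one option is smoothed sentence-level BLEU (a sketch only; nltk is not among the pinned requirements above, and the repository's own evaluation may differ):

# Sketch: average smoothed sentence-level BLEU over (reference, hypothesis) pairs.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def avg_bleu(references, hypotheses):
    smooth = SmoothingFunction().method4
    scores = [sentence_bleu([ref.split()], hyp.split(), smoothing_function=smooth)
              for ref, hyp in zip(references, hypotheses)]
    return sum(scores) / len(scores) if scores else 0.0
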