This repository contains the Python scripts and notebooks for NLP Assignment 3.
The assignment is divided into three main sections:
- Pretraining the model (bert-base-uncased) on the given dataset (wikitext_raw_2_v1)
- Fine-tuning the pretrained model for specific tasks (Classification and Question Answering)
- Evaluating the fine-tuned models with the specified metrics (Classification: Accuracy, Precision, Recall, F1; Question Answering: squad_v2, F1, METEOR, BLEU, ROUGE, exact match)
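For reference, the classification metrics listed above can be tallied directly from predictions. This is a minimal pure-Python sketch; the actual notebooks likely use `sklearn.metrics` or the Hugging Face `evaluate` library, and the function names and binary-label setting here are illustrative assumptions:

```python
# Illustrative implementations of the classification metrics used for
# evaluation (binary case); names and example labels are made up.

def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fp - fn
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_recall_f1(y_true, y_pred, positive=1):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = [1, 0, 1, 1, 0, 1]  # hypothetical gold labels
y_pred = [1, 0, 0, 1, 1, 1]  # hypothetical model predictions
print(accuracy(y_true, y_pred))             # 0.666...
print(precision_recall_f1(y_true, y_pred))  # (0.75, 0.75, 0.75)
```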
The respective code files are organized as follows:
- Pre-training Dataset: wikitext_raw_2_v1
- Pre-training Dataset Eval: wikitext_raw_2_v1_test
- Pre-training Code File: pretraining
- Pre-training Perplexities on Test Dataset: pretrained_perplexities
- Classification Task Fine-tuning: finetuning_classification
- Question Answering Task Fine-tuning: finetuning_question_answering
- Evaluation of the Pre-trained Model: eval_pretrain
- Evaluation of the Classification Fine-tuned Model: eval_classification
- Evaluation of the Question Answering Fine-tuned Model: eval_question_answering
- Parameter Calculation for the Models: parameter_calculation
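The parameter count for bert-base-uncased can be sanity-checked against a closed-form tally from its published configuration. This sketch covers the bare BertModel (no task head); the grouping of terms is one way to organize the arithmetic, not the repo's actual code:

```python
# Closed-form parameter tally for bert-base-uncased (BertModel, no task head).
# Config values are the published bert-base ones.
V, H, L, I = 30522, 768, 12, 3072   # vocab size, hidden size, layers, FFN size
P, T = 512, 2                       # max positions, token-type vocab size

embeddings = (V + P + T) * H + 2 * H            # word/position/type tables + LayerNorm
attention  = 4 * (H * H + H) + 2 * H            # Q, K, V, output projections + LayerNorm
ffn        = (H * I + I) + (I * H + H) + 2 * H  # up/down projections + LayerNorm
pooler     = H * H + H                          # dense pooler over [CLS]

total = embeddings + L * (attention + ffn) + pooler
print(total)  # 109482240
```

This matches the commonly cited ~110M figure for bert-base-uncased.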
Our pre-trained model is published on the Hugging Face Hub as: Skratch99/bert-pretrained
Our fine-tuned classification model is published as: Nokzendi/bert_sst2_finetuned
Our fine-tuned question-answering model is published as: Skratch99/finetuned-bert-squadv2
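The pretraining perplexities reported in the repo follow directly from the mean cross-entropy (masked-LM) loss on the test split. A minimal sketch of the conversion; the loss value below is made up for illustration:

```python
import math

def perplexity(mean_nll: float) -> float:
    """Perplexity is the exponential of the mean negative log-likelihood,
    i.e. the cross-entropy loss reported at evaluation time."""
    return math.exp(mean_nll)

eval_loss = 2.0  # hypothetical mean MLM loss on the test dataset
print(round(perplexity(eval_loss), 3))  # e^2 ≈ 7.389
```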
The assignment report can be found here.