SentenceRepresentationsCodebase

My own codebase for experimenting with sentence representations for Advanced Topics in Computational Semantics.

0. Organization of repository:

encoders/ 		# sentence encoders, including LSTM variants and mean of embeddings
heads/ 			# classification heads that take embeddings for specific datasets (e.g. SNLI)
utils/ 			# various auxiliary functions with mixed use
train_snli.py 	# main script for training (see section 3)
eval.py 		# main script for evaluation (see section 4)

Extra directories which are created during setup:

runs/ 			# directory with all models and tensorboard runs
pretrained/		# directory with glove embeddings
tokenized/		# directory where tokenized data is stored for efficiency
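
The idea behind the tokenized/ cache can be pictured with a minimal sketch. This is an illustration only, not the repository's code: the helper name `load_or_tokenize`, the whitespace tokenizer, and the pickle layout are all assumptions.

```python
import pickle
from pathlib import Path


def load_or_tokenize(sentences, cache_path, tokenize=str.split):
    """Return tokenized sentences, reusing a cached copy when one exists.

    Hypothetical helper: the name, the default whitespace tokenizer, and the
    pickle format are assumptions, not the repository's actual code.
    """
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Cache hit: skip tokenization entirely and load the stored result.
        with cache_path.open("rb") as f:
            return pickle.load(f)

    # Cache miss: tokenize once, then store the result for later runs.
    tokenized = [tokenize(s) for s in sentences]
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("wb") as f:
        pickle.dump(tokenized, f)
    return tokenized
```

A cache like this returns the stored result whenever the file exists, so it has to be deleted by hand if the data or the tokenizer changes.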

You can download my runs here

1. Setup

1.1 Install environment

conda env create --name acts_gpu --file=acts_gpu.yml
source activate acts_gpu

1.2 Download glove embeddings

bash download_glove.sh
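
GloVe vectors are distributed as plain text, one token per line followed by its space-separated vector components. A minimal sketch of reading that format is shown below. The function name `load_glove` and its parameters are hypothetical, and the repository may load the embeddings differently.

```python
def load_glove(lines, dim, vocab=None):
    """Parse GloVe text-format lines into a {token: vector} dict.

    Hypothetical helper, not the repository's loader.
    lines: iterable of strings (an open file handle works).
    dim:   vector dimensionality; the last `dim` fields are the vector.
    vocab: optional set of tokens to keep, to save memory.
    """
    vectors = {}
    for line in lines:
        parts = line.rstrip("\n").split(" ")
        if len(parts) <= dim:
            continue  # skip blank or malformed lines
        # Split from the right: a few tokens in large GloVe files
        # contain spaces themselves.
        word = " ".join(parts[:-dim])
        if vocab is not None and word not in vocab:
            continue
        vectors[word] = [float(x) for x in parts[-dim:]]
    return vectors
```

Passing an open file handle works because a file iterates over its lines. The file name and dimensionality depend on what download_glove.sh fetches.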

1.3 Install SentEval

Clone the SentEval repo from the FAIR GitHub into this directory

git clone https://github.com/facebookresearch/SentEval.git
cd SentEval/

Install SentEval

python setup.py install

Download datasets

cd data/downstream/
./get_transfer_data.bash

2. Running on Lisa / Snellius

Install environment using

sbatch install_environment.job

Run interactive session

srun --partition=gpu --gpus=1 --ntasks=1 --cpus-per-task=18 --time=04:00:00 --pty bash -i

Then, inside the interactive session, load the modules and activate the environment

module purge
module load 2022
module load Anaconda3/2022.05

source activate acts_gpu

3. Training Models

Example for LSTM:

python3 train_snli.py --encoder lstm --max_epochs 50 --batch_size 64 --optimizer_lr 0.1 --encoding_dim 2048 --lr_decay 0.99

Currently supported encoders: mean_embeddings, lstm, bilstm, bilstm_max
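
The encoder names hint at how token-level vectors are reduced to a single sentence vector. As an illustration only, assuming that mean_embeddings averages word vectors and that bilstm_max takes an element-wise maximum over the BiLSTM hidden states (the usual reading of these names, not confirmed from the repository's code), the two pooling steps can be sketched in plain Python:

```python
def mean_pool(token_vectors):
    """Average token vectors dimension-wise into one sentence vector."""
    n = len(token_vectors)
    return [sum(dim_values) / n for dim_values in zip(*token_vectors)]


def max_pool(token_vectors):
    """Take the element-wise maximum over tokens (max pooling over time)."""
    return [max(dim_values) for dim_values in zip(*token_vectors)]


# Toy input: three tokens, each a 2-dimensional vector.
tokens = [[1.0, 4.0], [3.0, 0.0], [2.0, 2.0]]
sentence_mean = mean_pool(tokens)  # [2.0, 2.0]
sentence_max = max_pool(tokens)    # [3.0, 4.0]
```

In a real model the same reductions would run over batched tensors. The lists here only show what each pooling operation computes.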

4. Evaluating Models

Example for LSTM:

python3 eval.py --transfer --snli --path runs/exp_20240418_145108_lstm_2048/model_13_checkpoint.pickle
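
The path above suggests that a run directory holds checkpoints named model_<number>_checkpoint.pickle. Assuming that pattern holds (it is inferred from this single example), a small hypothetical helper such as `latest_checkpoint` could pick the highest-numbered checkpoint to pass to --path:

```python
import re
from pathlib import Path

# Assumed naming scheme, inferred from the example path above.
CHECKPOINT_RE = re.compile(r"model_(\d+)_checkpoint\.pickle$")


def latest_checkpoint(run_dir):
    """Return the checkpoint file with the highest number, or None.

    Hypothetical helper, not part of the repository.
    """
    best_path, best_number = None, -1
    for path in Path(run_dir).iterdir():
        match = CHECKPOINT_RE.match(path.name)
        # Compare numerically so that 13 ranks above 2.
        if match and int(match.group(1)) > best_number:
            best_path, best_number = path, int(match.group(1))
    return best_path
```

The highest-numbered checkpoint is not necessarily the best one on the validation set, so this is only a convenience for locating the most recent file.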

jakub-podolak/SentenceRepresentationsCodebase
