4 changes: 4 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -105,3 +105,7 @@ venv.bak/

.DS_Store
.idea/

#sean_fork
sean_notes.txt
results/*
2 changes: 1 addition & 1 deletion CO_ATTN/utils.py
@@ -75,7 +75,7 @@ def load_word_embedding_dict(embedding, embedding_path, word_alphabet, logger, e
logger.info("Loading GloVe ...")
embedd_dim = -1
embedd_dict = dict()
with open(embedding_path, 'r') as file:
with open(embedding_path, 'r', encoding='utf-8') as file:
for line in file:
line = line.strip()
if len(line) == 0:
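The remainder of the loading loop is collapsed in the diff. As a minimal sketch (not the repository's actual code), each non-empty line of a GloVe text file presumably splits into a token followed by its vector components:

```python
def parse_glove_line(line):
    # One line of glove.6B.50d.txt: a token followed by 50
    # space-separated floats (only 3 shown below for brevity).
    parts = line.rstrip().split(" ")
    word, vector = parts[0], [float(x) for x in parts[1:]]
    return word, vector

word, vector = parse_glove_line("the 0.418 0.24968 -0.41242")
```

The UTF-8 encoding added in this diff matters here: GloVe tokens include non-ASCII characters, so reading the file with the platform default encoding can fail on Windows.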
46 changes: 39 additions & 7 deletions README.md
@@ -1,22 +1,54 @@
# co-attention
# Co-Attention
Code for BEA 13 paper "Co-Attention Based Neural Network for Source-Dependent Essay Scoring"

## Dependencies

python 2 for data/preprocess_asap.py (will be upgraded to python 3)
Python 2 for data/preprocess_asap.py (will be upgraded to Python 3).

python 3 for the rest
* I recommend that, on installation, you *do not add Python 2 to your PATH variable*. This way, it doesn't interfere with your current Python workflow.
* Then, when you need to run the preprocessing script, invoke it with the full interpreter path, something like:
* *c:/Python27/python.exe preprocess_asap.py*

Python 3 for the rest

* tensorflow 2.0.0 beta
* gensim
* gensim may have additional build dependencies, such as Visual Studio (VS) build tools on Windows
* nltk
* sklearn

run python2 data/preprocess_asap.py for data splitting.
Download Glove pretrained embedding from https://nlp.stanford.edu/projects/glove
Extract glove.6B.50d.txt to the glove folder
run python3 attn_network.py [options] for training and evaluation
## Running on Linux, MacOS

1. Run *python2 data/preprocess_asap.py* for data splitting.
2. Download the GloVe pretrained embeddings from *https://nlp.stanford.edu/projects/glove*.
3. Extract *glove.6B.50d.txt* to the *glove* folder.
4. Run *python3 attn_network.py [options]* for training and evaluation.

## Running on Windows

To run on Windows, follow all of the Linux/MacOS steps. Then you'll need to remove two *\n* sequences from the preprocessing script:

1. Open *data/preprocess_asap.py* in your preferred text editor.
2. On lines 28 and 31 of the preprocessing script, you'll find *f_write.write("\r\n")*.
3. Remove the *\n* from both lines, leaving *f_write.write("\r")*.
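The same edit can be made programmatically. This is a sketch added here, not part of the repository; it rewrites the literal source text of the script (the backslash escapes, not real newlines), presumably needed because Python 2's text-mode *open()* on Windows already translates *\n*, doubling the line breaks:

```python
def patch_line_endings(src):
    # Replace the two CRLF write calls in data/preprocess_asap.py
    # with CR-only writes. The targets are the literal characters
    # f_write.write("\r\n") as they appear in the script's source.
    return src.replace('f_write.write("\\r\\n")', 'f_write.write("\\r")')

patched = patch_line_endings('f_write.write("\\r\\n")')
```

To apply it, read *data/preprocess_asap.py*, pass its contents through *patch_line_endings*, and write the result back.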

## Results

After preprocessing the data, the program starts the training process. At the end of each epoch, the logger
outputs the development and test set scores; the highest scores are kept and reported after all epochs are complete. You can change which
task is used (the default is ASAP3), the number of epochs (the default is 50), and more via the arguments in lines 21-51 of *attn_network.py*.

Additionally, if you want to look at specific essays with their predicted and actual scores:

1. Go to the *checkpoints* folder.
2. After training, there should be a text file with one number per line; the line number corresponds to the essay number in the test data.
3. In the various fold directories, open the *test.tsv* file and compare its gold scores with the predicted scores from step 2.
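The comparison above can be scripted. A sketch under the assumption that the predictions file holds one score per line and *test.tsv* is tab-separated (the exact column layout of *test.tsv* is not verified here, so each row is returned whole):

```python
import csv
import io

def pair_scores(pred_text, test_tsv_text):
    # Line i of the predictions file corresponds to row i of test.tsv,
    # so zipping the two aligns each predicted score with its essay row.
    preds = [float(line) for line in pred_text.splitlines() if line.strip()]
    rows = csv.reader(io.StringIO(test_tsv_text), delimiter="\t")
    return list(zip(preds, rows))

pairs = pair_scores("2.0\n3.0\n", "1\tfirst essay\t2\n2\tsecond essay\t3\n")
```

In practice you would read the checkpoints text file and a fold's *test.tsv* from disk and pass their contents to *pair_scores*, then print each pair side by side.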

## Coming Soon

1. Making specific essays, along with their predicted and actual scores, more accessible.
- likely via a Python script
2. Updating *preprocess_asap.py* to be compatible with Python 3.

## Cite
If you use the code, please cite the following paper:
46 changes: 46 additions & 0 deletions requirements.txt
@@ -0,0 +1,46 @@
absl-py==0.11.0
astunparse==1.6.3
cachetools==4.2.0
certifi==2020.12.5
chardet==4.0.0
click==7.1.2
Cython==0.29.14
flatbuffers==1.12
gast==0.3.3
gensim==3.8.3
google-auth==1.24.0
google-auth-oauthlib==0.4.2
google-pasta==0.2.0
grpcio==1.32.0
h5py==2.10.0
idna==2.10
joblib==1.0.0
Keras-Preprocessing==1.1.2
Markdown==3.3.3
nltk==3.5
numpy==1.19.5
oauthlib==3.1.0
opt-einsum==3.3.0
protobuf==3.14.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
regex==2020.11.13
requests==2.25.1
requests-oauthlib==1.3.0
rsa==4.7
scikit-learn==0.24.1
scipy==1.6.0
six==1.15.0
sklearn==0.0
smart-open==4.1.2
tensorboard==2.4.1
tensorboard-plugin-wit==1.8.0
tensorflow==2.4.0
tensorflow-estimator==2.4.0
termcolor==1.1.0
threadpoolctl==2.1.0
tqdm==4.56.0
typing-extensions==3.7.4.3
urllib3==1.26.2
Werkzeug==1.0.1
wrapt==1.12.1