GoreLab/nlp_trait_preferences

nlp_trait_preferences

All scripts are written in Python and are run in the order outlined here:

1_nlp_extract_labels.py: pre-processes the data, including lemmatizing, converting all strings to lowercase, and removing stop words. Spelling discrepancies are resolved using “spelling_corrections.xlsx”, and synonyms are replaced using “synonyms.xlsx”. The “string_concatenation.xlsx” file was generated by iteratively searching bigrams and trigrams to extract labels.
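The pre-processing steps can be sketched roughly as follows. This is a minimal illustration, not the code in 1_nlp_extract_labels.py or nlp_functions.py: the stop-word set, the two dictionaries (stand-ins for the contents of the .xlsx files), the function names, and the example responses are all hypothetical, and lemmatizing is left out to keep the sketch free of dependencies.

```python
from collections import Counter

# Hypothetical stand-ins for a stop-word list and the two spreadsheets.
STOP_WORDS = {"the", "a", "an", "and", "is", "are", "of", "with"}
SPELLING_CORRECTIONS = {"flavour": "flavor", "colour": "color"}
SYNONYMS = {"taste": "flavor"}


def preprocess(text):
    """Lowercase, strip punctuation, drop stop words, fix spelling, map synonyms.

    Lemmatizing is deliberately omitted from this sketch.
    """
    tokens = [t.strip(".,;:!?") for t in text.lower().split()]
    tokens = [t for t in tokens if t and t not in STOP_WORDS]
    tokens = [SPELLING_CORRECTIONS.get(t, t) for t in tokens]
    return [SYNONYMS.get(t, t) for t in tokens]


def count_bigrams(token_lists):
    """Count adjacent token pairs across all pre-processed responses."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Made-up example responses, for illustration only.
responses = [
    "The flavour is sweet and the colour is deep red.",
    "Sweet taste with a deep red skin",
]
processed = [preprocess(r) for r in responses]
bigrams = count_bigrams(processed)
```

Frequent bigrams found this way (and trigrams, found analogously) are the kind of candidates one might review when deciding which multi-word strings to treat as a single label.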

2_nlp_manual_tagging.py: samples the data to generate the example sets used for one-shot learning in multi-label text classification by the GPT model. These example sets need to be manually labeled.
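A reproducible random sample for manual labeling might look like the sketch below. The function name, the seed, and the sample size are assumptions made for illustration; the sampling scheme actually used in 2_nlp_manual_tagging.py is not described here.

```python
import random


def sample_for_tagging(responses, n, seed=0):
    """Draw up to n responses without replacement, reproducibly for a given seed."""
    rng = random.Random(seed)  # local generator, leaves global random state untouched
    return rng.sample(responses, min(n, len(responses)))


# Made-up data, for illustration only.
all_responses = [f"response {i}" for i in range(100)]
to_label = sample_for_tagging(all_responses, n=10, seed=42)
```

Fixing the seed means the same rows are drawn on every run, so the manually assigned labels stay aligned with the sampled text.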

3_nlp_textcat_llm.py: performs multi-label text classification using the GPT model and labels specified in the config files, “zeroshot_all.cfg” and “oneshot_all.cfg”.
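The difference between the zero-shot and one-shot settings can be illustrated with a generic prompt builder. This is only a conceptual sketch: the prompt wording, function names, and labels are hypothetical, no model is called, and it does not reproduce the contents or format of the .cfg files.

```python
def build_prompt(text, labels, examples=None):
    """Build a multi-label classification prompt.

    With no examples this is a zero-shot prompt; passing one labeled
    (text, labels) pair turns it into a one-shot prompt.
    """
    lines = [
        "Assign every label that applies to the text.",
        "Allowed labels: " + ", ".join(labels),
    ]
    for ex_text, ex_labels in examples or []:
        lines.append("Text: " + ex_text)
        lines.append("Labels: " + ", ".join(ex_labels))
    lines.append("Text: " + text)
    lines.append("Labels:")
    return "\n".join(lines)


def parse_labels(reply, labels):
    """Keep only allowed labels from a comma-separated model reply."""
    allowed = {label.lower(): label for label in labels}
    found = []
    for part in reply.split(","):
        key = part.strip().lower()
        if key in allowed and allowed[key] not in found:
            found.append(allowed[key])
    return found


# Hypothetical labels and texts, for illustration only.
labels = ["flavor", "color", "yield"]
zero_shot = build_prompt("sweet and deep red", labels)
one_shot = build_prompt(
    "sweet and deep red",
    labels,
    examples=[("high yielding", ["yield"])],
)
```

Filtering the reply against the allowed label set guards against the model returning labels that were never specified.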

4_nlp_evaluate_models.py: evaluates the GPT model's performance by comparing the GPT-labeled data to the manually labeled data, available in “manual_tagging.xlsx”. Micro-precision, micro-recall, and micro-F1 scores are calculated.
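Micro-averaged metrics pool true positives, false positives, and false negatives over all labels and samples before computing the ratios. A minimal sketch, assuming each sample's labels are held as a Python set (the function name and example labels are hypothetical, and this is not the code in 4_nlp_evaluate_models.py):

```python
def micro_scores(gold, predicted):
    """Return (micro-precision, micro-recall, micro-F1) for multi-label data.

    gold and predicted are parallel lists of label sets, one set per sample.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)  # labels both assigned and correct
        fp += len(p - g)  # labels assigned but not in the manual tags
        fn += len(g - p)  # manual tags the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Made-up labels, for illustration only.
manual = [{"flavor", "color"}, {"yield"}, {"color"}]
model = [{"flavor"}, {"yield", "color"}, {"color"}]
precision, recall, f1 = micro_scores(manual, model)
```

In this example there are 3 true positives, 1 false positive, and 1 false negative, so all three scores equal 0.75.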

5_nlp_figures.py: generates all figures and tables included in the manuscript.

nlp_functions.py: includes the functions necessary for 1_nlp_extract_labels.py.
