Traven_stylometry

Scripts and corpora for the authorship attribution of B. Traven's works, using the "impostors" method.

Structure

The main branch of the repository contains materials for the paper "Are Ret Marut and B. Traven the same person? Fine tuning the impostors method", presented at the DH2023 Conference (Paper | Slides).
The DH2022 branch contains materials for the paper "Traven between the impostors. Preliminary considerations on an authorship verification case", presented at the DH2022 Conference (Paper).

Instructions

Run the bash_imposters_parallel.sh file (via: bash bash_imposters_parallel.sh), which runs all the R scripts in sequence. The scripts are designed for parallel processing, to speed up computation.
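As a rough illustration of what running the scripts in sequence amounts to, the sketch below calls the five R scripts one after another and stops at the first failure. It is not the content of bash_imposters_parallel.sh, which also takes care of the parallel processing; the run_pipeline function and the RUNNER variable are hypothetical names introduced only for this example.

```bash
#!/usr/bin/env bash
# Illustrative sketch only: NOT the repository's bash_imposters_parallel.sh.
# RUNNER and run_pipeline are hypothetical names; the script names are the
# ones listed in the Scripts section.
RUNNER="${RUNNER:-Rscript}"

run_pipeline() {
  local script
  for script in \
    01_prepare_imposters_corpora.R \
    02_evaluate_parallel_processing.R \
    03_prepare_analysis_tables.R \
    04_imposters_analysis.R \
    05_process_results.R
  do
    # Stop at the first script that fails, so that later steps never run
    # on incomplete inputs.
    "$RUNNER" "$script" || { echo "failed: $script" >&2; return 1; }
  done
}
```

Calling run_pipeline from the repository root would then run the scripts in order. Setting RUNNER=echo beforehand only prints the script names, which can serve as a dry run.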

Features

Analysis features are defined in the analysis_features.csv file. You can modify them to run different analyses:

n_cores defines the number of cores for parallel processing (the script currently supports from two to six cores)
n_best_imposters defines the number(s) of best impostors on which to run tests (you should separate the numbers with a space)
n_development_authors defines the number of authors that constitute the development set
unit defines the unit of analysis. You can choose between "1_words", "2_characters", "3_characters", etc. (currently, the script does not support word ngrams and runs with just one unit at a time)
MFU_series defines the number(s) of most frequent units (words or characters) on which to run tests (you should separate the numbers with a space)
culling defines the level(s) of culling with which to run tests (you should separate the numbers with a space)
validation_rounds defines the number of repetitions for each configuration
distances defines the stylometric distances to be used (you should separate the names with a space)
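To make the value formats concrete, the sketch below splits a unit string such as "3_characters" into its n-gram size and unit type, and counts how many runs a set of space-separated values would produce. It assumes that every combination of values is tested and that each configuration is repeated validation_rounds times. All values are made up for illustration and are not the defaults in analysis_features.csv; the distance names are placeholders, and count_values is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Illustrative values only; not the defaults shipped in analysis_features.csv.
unit="3_characters"
n_best_imposters="5 10"
MFU_series="100 500 1000"
culling="0 50"
distances="delta cosine"   # placeholder names
validation_rounds=10

# "3_characters" -> n-gram size 3, unit type "characters"
ngram_size="${unit%%_*}"
unit_type="${unit#*_}"

# Hypothetical helper: count the space-separated values in a field.
count_values() {
  set -- $1          # deliberate word splitting on whitespace
  echo "$#"
}

n_imp=$(count_values "$n_best_imposters")
n_mfu=$(count_values "$MFU_series")
n_cul=$(count_values "$culling")
n_dist=$(count_values "$distances")

# Assumption: every combination of values is tested, and each configuration
# is repeated validation_rounds times.
configurations=$(( n_imp * n_mfu * n_cul * n_dist ))
total_runs=$(( configurations * validation_rounds ))

echo "unit: ${ngram_size}-grams of ${unit_type}"
echo "configurations: ${configurations}, total runs: ${total_runs}"
```

Under these assumptions, the number of runs is the product of the value counts, so each extra value in any field increases the computation time accordingly.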

Scripts

01_prepare_imposters_corpora.R prepares corpora by running a first stylometric analysis and selecting the authors closest to the test set
02_evaluate_parallel_processing.R reads analysis features from the analysis_features.csv file and prepares instructions for parallel processing
03_prepare_analysis_tables.R prepares datasets for the actual analysis, by creating Term-Document-Frequency tables for each combination of texts
04_imposters_analysis.R performs the impostors analysis
05_process_results.R combines the results and saves them to a Results.txt file

Corpora

Texts to be analysed are in the corpus folder:

Traven_Marut_corpus.RData contains the four novels by Traven and Marut on which to perform the analysis. Novels have been split into tokens, which have been reordered alphabetically (thus not allowing reconstruction of the original texts, which are still copyright protected)
Kolimo_metadata.csv contains the metadata of the Kolimo corpus, from which the development set and the impostors will be extracted. The corpus itself will be downloaded by the R scripts
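The alphabetical reordering of tokens described above removes word order but leaves token frequencies intact, which is presumably why frequency-based analyses remain possible on the protected novels. The sketch below checks this on a toy sentence that is not taken from the corpus; count_tokens is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Toy sentence, not taken from the corpus.
text="the cat saw the dog and the bird saw the cat"

# One token per line, in the original order (deliberate word splitting).
tokens=$(printf '%s\n' $text)

# Alphabetical reordering: the original word order is lost.
reordered=$(printf '%s\n' "$tokens" | LC_ALL=C sort)

# Hypothetical helper: frequency table of the tokens read from stdin,
# sorted only so that two tables can be compared as strings.
count_tokens() {
  awk '{ c[$0]++ } END { for (w in c) print w, c[w] }' | LC_ALL=C sort
}

freq_original=$(printf '%s\n' "$tokens" | count_tokens)
freq_reordered=$(printf '%s\n' "$reordered" | count_tokens)

if [ "$freq_original" = "$freq_reordered" ]; then
  echo "token frequencies are unchanged by the reordering"
fi
```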

Requirements

R packages: stylo, tidyverse, and class. Run the Requirements.R script to install them.
The bash script should run via command line on Unix-like systems.
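Before launching the analysis, it may help to check that the required commands are on the PATH. The helper below is a hypothetical convenience and not part of the repository; it assumes only that R is started through the Rscript command.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper; not part of the repository.
# Returns 0 if the command exists on the PATH, 1 otherwise.
require_cmd() {
  command -v "$1" >/dev/null 2>&1 || { echo "missing: $1" >&2; return 1; }
}
```

A possible use is require_cmd bash && require_cmd Rscript && Rscript Requirements.R, which installs the packages only when both commands are available.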

Repository: SimoneRebora/Traven_stylometry

Folders and files

Latest commit

History

Repository files navigation

Traven_stylometry

Structure

Instructions

Features

Scripts

Corpora

Requirements

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages