Code for the article "From scapegoating to authoritarianism – topic modelling of governmental parliamentary discourses about migration in Hungary between 2015/2016 and 2022/2023"
You can download the raw data from parlament.hu with an API key (see for further details). The data was downloaded with the code written for K-monitor.
Repo structure:
- src folder: code files
  - 00_make_xlsx.py: parsing the JSON files collected with the apicollector
  - 01_filter.py: filtering speeches to relevant speech types and keywords
  - 02_parl_spec_preproc.py: removing text segments added by the notary
  - 03_preproc_spacy_parl.py:
    - character normalization
    - NER based on the spaCy model hu_core_news_trf, and snake_casing entities
  - 04_preproc2_parl.py: (interactive run)
    - removing formalized greetings etc.
    - removing stopwords based on frequency
    - removing keywords used for corpus filtering
    - removing numbers except frequently mentioned years
    - removing hyphens
    - snake_casing frequent n-grams
    - removing outliers by length
  - STM_parl_202505.r: STM model fitting and analysis script
- data: filtered and preprocessed data files
- resources: helper files created during preprocessing
- models: STM model files and results
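
To give a rough idea of what some of the cleaning steps listed for 04_preproc2_parl.py do, here is a minimal Python sketch of four of them: removing hyphens, dropping numbers except selected years, snake_casing frequent bigrams, and removing outliers by length. It is not the code in the src folder. All function names, the sample speeches, the kept years and the thresholds are hypothetical, and it assumes that hyphens are replaced by spaces and that length is measured in tokens.

```python
from collections import Counter


def tokenize(text):
    # Assumption: hyphens are replaced by a space, then text is lowercased
    # and split on whitespace.
    return text.replace("-", " ").lower().split()


def drop_numbers(tokens, keep_years):
    # Remove purely numeric tokens unless they are in the set of kept years.
    return [t for t in tokens if not t.isdigit() or t in keep_years]


def frequent_bigrams(docs, min_count):
    # Count adjacent token pairs over the whole corpus and keep the frequent ones.
    counts = Counter()
    for tokens in docs:
        counts.update(zip(tokens, tokens[1:]))
    return {bigram for bigram, n in counts.items() if n >= min_count}


def join_bigrams(tokens, bigrams):
    # Merge each frequent bigram into a single snake_cased token.
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in bigrams:
            out.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def drop_length_outliers(docs, min_len, max_len):
    # Keep only documents whose token count lies within [min_len, max_len].
    return [d for d in docs if min_len <= len(d) <= max_len]


# Hypothetical sample speeches (not taken from the corpus).
speeches = [
    "The so-called migration crisis began in 2015 and lasted 3 years",
    "The migration crisis has continued since 2015",
    "Thank you",
]

docs = [drop_numbers(tokenize(s), keep_years={"2015"}) for s in speeches]
bigrams = frequent_bigrams(docs, min_count=2)
docs = [join_bigrams(d, bigrams) for d in docs]
cleaned = drop_length_outliers(docs, min_len=3, max_len=1000)

for doc in cleaned:
    print(" ".join(doc))
```

In this toy run the two-token speech is dropped as a length outlier, "3" is removed while "2015" is kept, and "migration crisis" becomes the single token `migration_crisis` because it occurs in two speeches.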