This repository was archived by the owner on Feb 16, 2025. It is now read-only.

TeriyakiThames/Data-Science-Project


How do the affiliations of researchers influence the diversity of engineering research topics?

Data Preparation:

  1. main.py
    • Running this file will automatically run all the other files in Data Prep.
    • Input the starting folder (the root folder of the dataset), the folder to extract the data to, and the folder to store the imputed data.

  2. change_extension.py
    • This file turns the given Scopus dataset into .json files.

  3. data_extraction.py
    • This file loops through each year of the Scopus dataset and combines it into a single file while removing unnecessary data.

  4. impute_missing_value.py
    • This file imputes any missing values in the dataset.

  5. remove_duplicates.py
    • This file drops duplicate papers from the file.
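The Data Prep steps above can be sketched roughly as follows. This is a minimal sketch, not the repository's actual code: the field names (`title`, `date`), the one-JSON-file-per-year layout, and the mode-based imputation are all assumptions.

```python
import json
from pathlib import Path

import pandas as pd


def combine_years(root: Path) -> pd.DataFrame:
    """Loop over the per-year JSON files and combine them into one table
    (roughly the data_extraction.py step)."""
    records = []
    for path in sorted(root.rglob("*.json")):
        with open(path, encoding="utf-8") as f:
            records.extend(json.load(f))  # each file holds a list of papers
    return pd.DataFrame(records)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate papers, then fill missing dates with the most common
    value (a simple stand-in for impute_missing_value.py)."""
    df = df.drop_duplicates(subset="title").copy()
    df["date"] = df["date"].fillna(df["date"].mode()[0])
    return df
```

In this sketch, deduplication runs before imputation so that duplicate rows do not skew the imputed value.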

Web Scraping:

  1. main.py
    • Use this file to run web_scraping.py

  2. web_scraping.py
    • Run main.py to start scraping from IEEE Xplore.
    • Note that the script will sometimes leave the date field null, as the date cannot be found in some documents.
    • If any other field is missing, the script will not save the record.

  3. join_json.py
    • This file is used to join the JSON files obtained from web scraping into one file.
    • You can then use the result of this function to impute the missing values using impute_missing_value.py.
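The save policy and join step described above might look like the sketch below. The field names in `REQUIRED` are hypothetical, not the scraper's actual schema; the only behaviour taken from the README is that a null date is tolerated while any other missing field skips the record.

```python
import json
from pathlib import Path

REQUIRED = ("title", "authors", "affiliations")  # hypothetical field names


def save_record(record: dict, out_dir: Path) -> bool:
    """Save a scraped paper as its own JSON file. The date field may be
    null, but a record missing any required field is not saved at all."""
    if any(not record.get(field) for field in REQUIRED):
        return False  # skip incomplete records entirely
    record.setdefault("date", None)  # date is allowed to be missing
    out_path = out_dir / f"{record['title'][:50]}.json"
    out_path.write_text(json.dumps(record), encoding="utf-8")
    return True


def join_json(in_dir: Path, out_file: Path) -> int:
    """Merge the per-paper JSON files into one list (the join_json.py step).
    Write the output outside in_dir so it is not swept up by the glob."""
    papers = [json.loads(p.read_text(encoding="utf-8"))
              for p in sorted(in_dir.glob("*.json"))]
    out_file.write_text(json.dumps(papers), encoding="utf-8")
    return len(papers)
```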

Model Training:

  1. Model.ipynb
    • This file trains the model from the data collected from Scopus and from web scraping.
    • We used Latent Dirichlet Allocation (LDA) and K-Means.

  2. combine_csv.py
    • This file is used to combine all the CSV files into a single file.
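A rough sketch of the kind of pipeline Model.ipynb describes, using scikit-learn: LDA on a bag-of-words matrix, then K-Means on the resulting topic distributions. The topic and cluster counts here are placeholders, not the values used in the notebook.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer


def topic_clusters(abstracts, n_topics=3, n_clusters=2):
    """Fit LDA topics on the texts, then cluster papers by topic mixture."""
    bow = CountVectorizer(stop_words="english").fit_transform(abstracts)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    doc_topics = lda.fit_transform(bow)  # shape: (n_docs, n_topics)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    return doc_topics, km.fit_predict(doc_topics)
```

Clustering on the topic distributions rather than the raw bag-of-words keeps K-Means in a low-dimensional, interpretable space.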

Data Visualisation:

  1. data_visualisation.py
    • This file visualises the data from Data Prep as well as the model's fitted data.
    • The data is visualised using Streamlit.
    • There are 3 files which are used here (main_data.csv, cluster_data.csv, calculated_map_data.csv).

  2. calculate_map.py
    • This logic was originally part of data_visualisation.py, but it was too computationally heavy.
    • It was therefore split off, and its output is stored in a file to be loaded instead.
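The precompute-then-load split described above might look like the sketch below. The `country` column and the per-country aggregation are assumptions about what the map data contains, not the script's actual computation.

```python
from pathlib import Path

import pandas as pd


def precompute_map_data(papers: pd.DataFrame, out_csv: Path) -> None:
    """The heavy aggregation split out of data_visualisation.py: count
    papers per affiliation country once, offline, and save to CSV."""
    counts = (papers.groupby("country", as_index=False)
                    .size()
                    .rename(columns={"size": "paper_count"}))
    counts.to_csv(out_csv, index=False)


def load_map_data(csv_path: Path) -> pd.DataFrame:
    """Cheap load at app start-up; in the Streamlit app this call could
    sit behind @st.cache_data so it runs only once per session."""
    return pd.read_csv(csv_path)
```

This keeps the expensive step out of the interactive app: the dashboard only ever reads the small precomputed CSV.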
