- main.py
  • Running this file will automatically run all the other files in Data Prep (the orchestration is sketched after this list).
  • Input the starting folder (the root folder of the dataset), the folder you want to extract the data to, and the folder where you want to store the imputed data.
- change_extension.py
  • This file converts the given Scopus dataset into .json files (sketched below).
- data_extraction.py
  • This file loops through each year of the Scopus dataset and combines it into a single file while removing unnecessary data (sketched below).
- impute_missing_value.py
  • This file imputes any missing values in the dataset (sketched below, together with remove_duplicates.py).
- remove_duplicates.py
  • This file drops duplicated papers from the file.
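The individual steps live in the scripts above; as a minimal sketch of how main.py might chain them, assuming each step can be invoked as a script (the exact arguments each script accepts are hypothetical):

```python
import subprocess
import sys

def run_step(script: str, *args: str) -> None:
    """Run one Data Prep script and stop the pipeline if it fails."""
    subprocess.run([sys.executable, script, *args], check=True)

# The three folders described above, entered by the user.
root = input("Root folder of the dataset: ")
extracted = input("Folder to extract the data to: ")
imputed = input("Folder to store the imputed data: ")

run_step("change_extension.py", root)            # raw Scopus files -> .json
run_step("data_extraction.py", root, extracted)  # per-year files -> one file
run_step("impute_missing_value.py", extracted, imputed)
run_step("remove_duplicates.py", imputed)
```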
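change_extension.py's exact logic isn't reproduced here; if the raw Scopus export is already JSON-formatted text whose files simply lack the .json extension (an assumption), the core of the step can be as small as a rename pass:

```python
from pathlib import Path

def change_extensions(root_folder: str) -> None:
    """Give every non-.json file under the root a .json extension."""
    for path in Path(root_folder).rglob("*"):
        if path.is_file() and path.suffix != ".json":
            path.rename(path.with_suffix(".json"))

change_extensions("scopus_dataset")  # hypothetical folder name
```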
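As an illustration of data_extraction.py's combine-and-prune step (the field whitelist, folder layout, and paths are assumptions, not the script's actual schema):

```python
import json
from pathlib import Path

# Fields to keep; everything else is treated as unnecessary data.
# These names are illustrative, not the script's real whitelist.
KEEP = {"title", "abstract", "year", "authors", "doi"}

def extract(root_folder: str, out_file: str) -> None:
    combined = []
    # Assumes one or more .json files per year under the root folder.
    for path in sorted(Path(root_folder).rglob("*.json")):
        records = json.loads(path.read_text(encoding="utf-8"))
        for record in records:
            combined.append({k: v for k, v in record.items() if k in KEEP})
    Path(out_file).write_text(json.dumps(combined, indent=2), encoding="utf-8")

extract("scopus_dataset", "extracted/combined.json")  # hypothetical paths
```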
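impute_missing_value.py and remove_duplicates.py aren't reproduced either; one common way to express both steps with pandas (the column names, fill strategy, and duplicate keys are assumptions):

```python
import pandas as pd

df = pd.read_json("extracted/combined.json")  # hypothetical path

# Impute: fill missing text fields with a placeholder and missing
# years with the column median. The scripts' real strategy may differ.
for col in ("title", "abstract"):
    df[col] = df[col].fillna("")
df["year"] = df["year"].fillna(df["year"].median())

# Deduplicate: treat papers sharing a DOI (or a title) as duplicates.
df = df.drop_duplicates(subset="doi").drop_duplicates(subset="title")

df.to_json("imputed/combined.json", orient="records")
```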
- main.py
  • Use this file to run web_scraping.py.
- web_scraping.py
  • Run main.py to start scraping from IEEE Xplore.
  • Do note that the script will sometimes have null in the date field, as it cannot be found in some documents.
  • For other missing values, the script will not save the file (this save/skip rule is sketched after this list).
- join_json.py
  • This file is used to join the JSON files obtained from web scraping into one file (sketched below).
  • You can then use the result of this function to impute the missing values using impute_missing_value.py.
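The scraper itself isn't reproduced here; the save/skip rule described above — a missing date is stored as null, while any other missing field means the record is not saved at all — could look like this (the field names are assumptions):

```python
import json
from pathlib import Path

REQUIRED = ("title", "abstract", "authors")  # illustrative required fields

def save_record(record: dict, out_dir: str, name: str) -> bool:
    """Save a scraped record, or skip it if a required field is missing."""
    if any(not record.get(field) for field in REQUIRED):
        return False  # other missing values: do not save the file
    record.setdefault("date", None)  # the date may legitimately be null
    out = Path(out_dir) / f"{name}.json"
    out.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return True
```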
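join_json.py's merge can be sketched as follows (the glob pattern and paths are assumptions):

```python
import json
from pathlib import Path

def join_json(in_dir: str, out_file: str) -> None:
    """Concatenate every per-paper JSON file into one combined list."""
    papers = [
        json.loads(p.read_text(encoding="utf-8"))
        for p in sorted(Path(in_dir).glob("*.json"))
    ]
    Path(out_file).write_text(json.dumps(papers, indent=2), encoding="utf-8")

join_json("scraped", "scraped_combined.json")  # hypothetical paths
```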
- Model.ipynb
  • This file trains the model on the data collected from Scopus and from web scraping.
  • We used Latent Dirichlet Allocation (LDA) and K-Means (sketched after this list).
- combine_csv.py
  • This file is used for combining all the CSV files into one file (sketched below).
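Model.ipynb's exact pipeline and hyperparameters aren't shown here; a minimal scikit-learn sketch of the LDA + K-Means combination (the topic count, cluster count, input path, and input column are assumptions):

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.cluster import KMeans

df = pd.read_csv("main_data.csv")  # combined Scopus + scraped data (assumed path)

# Bag-of-words counts over the abstracts (assumed input column).
vectorizer = CountVectorizer(stop_words="english", max_features=5000)
counts = vectorizer.fit_transform(df["abstract"].fillna(""))

# LDA turns each paper into a topic distribution...
lda = LatentDirichletAllocation(n_components=10, random_state=0)
topics = lda.fit_transform(counts)

# ...and K-Means groups papers with similar topic mixes into clusters.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
df["cluster"] = kmeans.fit_predict(topics)
```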
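combine_csv.py amounts to a concatenation; a sketch with pandas (the glob pattern and output name are assumptions):

```python
import glob
import pandas as pd

# Read every CSV in the folder and stack them into one frame.
frames = [pd.read_csv(path) for path in sorted(glob.glob("csv_files/*.csv"))]
pd.concat(frames, ignore_index=True).to_csv("main_data.csv", index=False)
```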
- data_visualisation.py
  • This file visualises the data from Data Prep as well as the model's fitted data.
  • The data is visualised using Streamlit (a minimal app sketch follows this list).
  • There are 3 files which are used here (main_data.csv, cluster_data.csv, calculated_map_data.csv).
- calculate_map.py
  • This code was originally part of data_visualisation.py, but it was too computationally heavy to run there.
  • It was therefore split off, and its output is stored in a file to be loaded instead (the precompute/load pattern is sketched below).
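data_visualisation.py itself isn't reproduced; a minimal Streamlit sketch of how the three files might be loaded and shown (the chart choices and column names are assumptions):

```python
import pandas as pd
import streamlit as st

@st.cache_data  # avoid re-reading the CSVs on every rerun
def load(path: str) -> pd.DataFrame:
    return pd.read_csv(path)

main_data = load("main_data.csv")
cluster_data = load("cluster_data.csv")
map_data = load("calculated_map_data.csv")  # precomputed by calculate_map.py

st.title("Scopus / IEEE Xplore dataset")
st.bar_chart(main_data["year"].value_counts().sort_index())  # papers per year (assumed column)
st.dataframe(cluster_data)
st.map(map_data)  # expects latitude/longitude columns
```

Launch it with `streamlit run data_visualisation.py`.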
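The precompute-then-load pattern behind calculate_map.py, with a placeholder standing in for the actual heavy computation (the location columns and aggregation are assumptions):

```python
import pandas as pd

# calculate_map.py side: do the heavy computation once, offline...
df = pd.read_csv("main_data.csv")
# Placeholder for the expensive step, e.g. aggregating papers per location.
map_data = df.groupby(["latitude", "longitude"], as_index=False).size()
map_data.to_csv("calculated_map_data.csv", index=False)

# data_visualisation.py side: just load the stored result at app startup.
map_data = pd.read_csv("calculated_map_data.csv")
```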