Repository metadata as dataset
The development was done using Jupyter extension in Visual Studio Code, so it may work diirectly with "pure" Jupyter, but probably there are packages missing to run with Jupyter Labs.
To set the environment inside WSL (probably the same for any Linux):
mgarcia@arda:~/Work$ git clone https://github.com/kaust-library/RepoDataset.git
mgarcia@arda:~/Work/RepoDataset$ python -m venv venv
mgarcia@arda:~/Work/RepoDataset$ . venv/bin/activate
mgarcia@arda:~/Work/RepoDataset$ pip install -r requirements.txt
#
# Unzip the sample data file
mgarcia@arda:~/Work/RepoDataset$ cd data
mgarcia@arda:~/Work/RepoDataset/data$ unzip kaust_repo.zip
Archive: kaust_repo.zip
inflating: kaust_repo.csv
mgarcia@arda:~/Work/RepoDataset/data$
Using the jupyter notebook to explore the Insitutionl Repository metadata dataset.
We provide three examples to explore the dataset:
repodataset.ipynb: Publication types, and top publishers.gettext.ipynb: Download the text version of the PDF, or save the URLs to a text that can be downloaded with commands likewgetorcurl.vosviwer.ipynb: Export certain fields to Vosviewer, like abstract or authors. There is an option to plot the map using all fields, but it takes several minutes to create the plot.
We provide a sample of the data in the data directory, that is for a simple tests. Once the tests are done, one can use the full dataset available on the repository website:
https://repository.kaust.edu.sa/bitstream/handle/10754/691066/KAUST_Affiliated_Research_Basic_Metadata.csv
The VOSviewer notebook expects to find the full CSV file in the data directory. One can download directly using the URL above.
A sample CSV with thesis and dissertations data set:
https://repository.kaust.edu.sa/bitstream/handle/10754/694214/KAUST_Open_Access_ETDs_Basic_Metadata.csv