I reran the analysis to account for a bug spotted by Christos Petrou on Twitter. I was unaware of the existence of Collections and Sections at MDPI, and had labelled every paper in a Collection or Section as a Special Issue article. The analysis now takes all four types of papers into account (normal issues, Special Issues, Sections, and Collections). The main message does not change.
This repository contains:
- in the `Special Issues` folder:
    - scripts to scrape the MDPI website and get the number of special issues for all of their 74 journals with an Impact Factor;
    - data from the scraping, run on April 15th, 2021;
    - analysis of the scraping data, in the form of a plot and a summary table.
- in the `Editorial History` folder:
    - scripts to scrape the last 6 years' worth of articles published in the 74 MDPI journals with an IF;
    - data resulting from the scraping, run in the third week of April 2021;
    - analysis of the scraping data, in the form of a long .R file producing several plots and tables.
The data was the basis for a post on MDPI practices that appeared on my blog. The data you want is provided in several .csv files:
- for the Special Issues:
    - `journals.csv` contains basic data on the 74 MDPI journals;
    - `SIs.csv` contains a line for each MDPI special issue ever made, or now open and in progress, for which a deadline exists;
    - `summary.csv` is a summary dataset containing one row per journal and one column per year, summarising the number of SIs per year.
- for the Editorial History and turnaround times:
    - one csv for each journal in the `Data` folder
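As a sketch of how these files might be loaded with the tidyverse (the exact column names, and in particular the year-column naming, are assumptions, not verified against the data):

```r
library(tidyverse)

# Read the summary dataset: one row per journal, one column per year.
sis_summary <- read_csv("Special Issues/summary.csv")

# A long format is often handier for plotting. This assumes the year
# columns are named like "2016", "2017", ... -- check the file first.
sis_long <- sis_summary %>%
  pivot_longer(
    cols      = matches("^20\\d{2}$"),
    names_to  = "year",
    values_to = "n_SIs"
  )
```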
Have a look at the blog post. Alternatively, you can find the plots and tables in the `Special Issues` and `Editorial History` subfolders.
- get your dependencies right. The code depends on:
    - `tidyverse` for everything from wrangling to plotting;
    - `rvest` for scraping;
    - `stringr` for string manipulation;
    - `purrr` to use `map` and avoid loops;
    - `lubridate` to work with dates;
    - `magrittr` for the `%$%` operator;
    - `ggridges`, `gghalves`, `ggbeeswarm`, `waffle`, `patchwork`, `ggrepel` and `ggtext`, which are all tools to extend `ggplot`;
    - `hrbrthemes` for eye-candy and beautiful typography in the plots.
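If you need a one-off install, something like this should cover the list above (note that `stringr`, `purrr` and `lubridate` ship with the tidyverse, so listing them explicitly is redundant but harmless):

```r
# Install all packages the scraping and plotting scripts depend on.
install.packages(c(
  "tidyverse", "rvest", "stringr", "purrr", "lubridate", "magrittr",
  "ggridges", "gghalves", "ggbeeswarm", "waffle", "patchwork",
  "ggrepel", "ggtext", "hrbrthemes"
))
```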
- for the Special Issues:
    - run `scraping.R`. It takes about 20 minutes to run. It generates `journals.csv`, with basic journal data, and `SIs.csv`, with the list of all 55k+ special issues of the 74 MDPI journals;
    - (if you want to plot) run `plotting.R`.
- for the Editorial History:
    - run `scrape_editorial_history.R`. It takes forever to run, up to a week or more: use parallel calls, or simply do not run it. If you really want to run it, consider using the RStudio jobs API, which lets you parallelise the scrape. This is what the `rstudio_jobs_scrape.R` script is for;
    - run `analyse_editorial_history.R`. It produces several different outputs and is at this point rather messy. You know, open science is messy and this is just a side project, sorry!
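For reference, launching the scraper as a background job might look like the sketch below. It uses the `rstudioapi` package, assumes you are running inside an RStudio session, and assumes the script lives in the `Editorial History` folder (the exact path is a guess, adjust as needed):

```r
library(rstudioapi)  # only works from within an RStudio session

# Launch the scraping script as a background job so the console stays free.
# Starting several jobs (e.g. one per subset of journals) is one way
# to parallelise the scrape.
jobRunScript(
  path       = "Editorial History/rstudio_jobs_scrape.R",
  name       = "MDPI editorial history scrape",
  workingDir = getwd()
)
```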
Feel free to clone this repo. I would appreciate it if you kept me posted on what you do.