🎓 Automate your weekly scientific literature review
LiRA is a command-line Python program, based on PyMed and SerpAPI, that searches PubMed (and, experimentally, Google Scholar and OpenAlex) and collects the results in a readable HTML page.
I created LiRA mainly for myself, but feel free to use it if you find it useful.
All you need is in this README!
- Install
- Configuration
- Usage
- Need --help?
- Release history
- Future developments
- Known problems
- Meta
- Contributing
Notice: I provided instructions to install and use LiRA with pipenv, but you can use any virtual environment
manager such as Conda or Venv.
⚠️ LiRA currently requires Python 3.8 to run. If you don't have it on your system, consider using Pyenv to install it.
If you don't have Pipenv yet, install it:
pip install pipenv --user
Clone this repository using git:
git clone https://github.com/fpradelli94/LiRA.git
Install the project requirements listed in requirements.txt:
cd LiRA
pipenv install --python 3.8 -r requirements.txt
⚠️ LiRA currently requires Python 3.8 to run. If you don't have it on your system, consider using Pyenv to install it.
Make sure you have pipenv installed (otherwise, install it with brew install pipenv):
> pipenv --version
pipenv, version 2024.4.1 # you should get something like this
Clone this repository using git:
git clone https://github.com/fpradelli94/LiRA.git
Go into the cloned folder:
cd LiRA
Use pyenv to make sure Python 3.8 is available locally:
pyenv install 3.8
pyenv local 3.8
Use pipenv to set up a virtual environment:
pipenv install --python 3.8 -r requirements.txt
LiRA requires a configuration file in JSON format to work. You can find a template in in/template_config.json.
To generate the configuration file, simply type:
mkdir config
cp in/template_config.json config/config.json
This generates an empty configuration file for LiRA.
Start by filling in the config.json file with the information needed to use the APIs:
- PubMed: enter your email (no API key necessary for now)
- Google Scholar: enter your SerpAPI key
- OpenAlex: enter your OpenAlex key (get one here)
SerpAPI is a third-party API to scrape Google Scholar. You can get your key for free here
{
"engines": ["pubmed", "google_scholar", "openalex"],
"email": "PLACE YOUR EMAIL HERE",
"serpapi_key": "PLACE YOUR SERPAPI KEY HERE",
"openalex_key": "PLACE YOUR OPENALEX KEY HERE",
"keywords": [],
"authors": [],
"journals": [],
"highlight_authors": []
}
Then, you can add your keywords, authors, and journals of interest to the configuration file. For instance, a meaningful configuration file might look like this:
{
"engines": ["pubmed", "google_scholar", "openalex"],
"email": "example@gmail.com",
"serpapi_key": "aobosandi392309qwjadosnasioiq",
"openalex_key": "neiocnsdijnca",
"keywords": [
"Breast Cancer",
"Omics",
"cancer AND immunology"
],
"authors": [
"Kaelin, William",
"Doudna, Jennifer"
],
"journals": [
"ArXiv",
"bioRxiv",
"Nature"
],
"highlight_authors": [
"Doe, John"
]
}
Where:
- engines lists the search engines to query (pubmed, google_scholar, openalex);
- keywords contains all the keywords you'd like to search for;
- authors contains your authors of interest;
- journals contains all the journals you want to look at;
- highlight_authors contains authors that LiRA will not actively search for; if a resulting publication includes one of these authors, their name is highlighted in the output. (Note: I created this feature because I needed it, but I expect few people will find it helpful. If you don't need it, leave this field empty.)
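Since a single misplaced comma or a missing key is enough to break a run, it can help to check config.json before launching LiRA. The following is a minimal, hypothetical sketch (not part of LiRA) that assumes the keys and value types shown in the template above; LiRA itself may accept extra keys or treat some of them as optional. The helper names check_config_text and check_config_file are made up for illustration.

```python
import json
from pathlib import Path
from typing import List

# Keys and value types taken from the template in this README (an assumption:
# LiRA itself may accept extra keys or treat some of these as optional).
EXPECTED_KEYS = {
    "engines": list,
    "email": str,
    "serpapi_key": str,
    "openalex_key": str,
    "keywords": list,
    "authors": list,
    "journals": list,
    "highlight_authors": list,
}

# Engine names used in this README's examples.
KNOWN_ENGINES = {"pubmed", "google_scholar", "openalex"}


def check_config_text(text: str) -> List[str]:
    """Return a list of problems found in a config's JSON text (empty list if none)."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as err:
        # Typical cause: a missing comma between two entries.
        return [f"invalid JSON: {err}"]
    if not isinstance(config, dict):
        return ["the top-level value must be a JSON object"]

    problems = []
    for key, expected_type in EXPECTED_KEYS.items():
        if key not in config:
            problems.append(f"missing key: {key}")
        elif not isinstance(config[key], expected_type):
            problems.append(f"{key} should be a {expected_type.__name__}")

    engines = config.get("engines")
    if isinstance(engines, list):
        for engine in engines:
            # The isinstance check also avoids hashing unhashable values.
            if not isinstance(engine, str) or engine not in KNOWN_ENGINES:
                problems.append(f"unknown engine: {engine}")
    return problems


def check_config_file(path: str) -> List[str]:
    """Read a config file (e.g. config/config.json) and check it."""
    return check_config_text(Path(path).read_text(encoding="utf-8"))
```

For example, print(check_config_file("config/config.json")) should print an empty list for a file that follows the template, and a message starting with "invalid JSON" if a comma is missing.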
Indicate the date from which you'd like to start searching (format: YYYY/MM/DD):
pipenv run python3 src/lira.py --from-date 2023/09/19
# OR
pipenv run python3 src/lira.py -d 2023/09/19
If everything works smoothly, you should see logging messages appearing. This means that LiRA is searching for papers from the given date up to the day you run LiRA.
At the end of the process, an HTML report should automatically open in your browser.
By default, the HTML report contains the following sections:
- General results: the publications matching the keywords specified in the config.json file from the given date;
- Results for [Journal]: ALL the publications in the journal from the given date. A new section is generated for each journal in the config.json file;
- Results for Authors: ALL the publications by the authors specified in the config.json file from the given date.
Moreover, each engine has its own sections. For instance, if you choose to search both Google Scholar and PubMed, you may end up with two "General results" sections, one for Google Scholar and one for PubMed.
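To see how the number of sections grows with the configuration, here is a hypothetical sketch that lists the section titles implied by the description above. It assumes that every engine repeats all three kinds of section and that the authors section is omitted when the authors list is empty; the actual grouping, order, and titles in LiRA's HTML report may differ. The suppress_general parameter mirrors the --suppress-general flag described below.

```python
from typing import Dict, List


def report_sections(config: Dict[str, List[str]], suppress_general: bool = False) -> List[str]:
    """List section titles for a config, one group per engine (hypothetical layout)."""
    sections = []
    for engine in config["engines"]:
        if not suppress_general:
            # Publications matching the keywords.
            sections.append(f"{engine}: General results")
        for journal in config["journals"]:
            # One section per journal, with ALL its recent publications.
            sections.append(f"{engine}: Results for {journal}")
        if config["authors"]:
            # A single section collecting ALL recent publications by the listed authors.
            sections.append(f"{engine}: Results for Authors")
    return sections
```

Under these assumptions, the example configuration above (three engines, three journals, two authors) yields 3 × (1 + 3 + 1) = 15 sections.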
You can also indicate the number of weeks to scrape. For instance, to get the papers from the last 2 weeks, instead of indicating the initial date you can type:
pipenv run python3 src/lira.py --for-weeks 2
# OR
pipenv run python3 src/lira.py -w 2
By default, LiRA returns ALL the most recent publications from your journals and authors of interest. If you only want the results matching the keywords in the config file, type:
pipenv run python3 src/lira.py -d 2023/09/19 --filter-journals # for journals
# OR
pipenv run python3 src/lira.py -d 2023/09/19 --filter-authors # for authors
# OR
pipenv run python3 src/lira.py -d 2023/09/19 --filter-authors --filter-journals # for both
If you are only interested in the most recent results from the journals and the authors, you can tell LiRA not to generate the 'General results' section:
pipenv run python3 src/lira.py -d 2023/09/19 --suppress-general
If you are not interested in any specific author or journal, just leave the authors and journals fields empty in the config.json file:
{
"engines": ["pubmed", "google_scholar", "openalex"],
"email": "example@gmail.com",
"serpapi_key": "aobosandi392309qwjadosnasioiq",
"openalex_key": "neiocnsdijnca",
"keywords": [
"Breast Cancer",
"Omics",
"cancer AND immunology"
],
"authors": [],
"journals": [],
"highlight_authors": []
}
By default, LiRA searches for publications up to the day on which the program is executed. To set an end date for the search, use the --to_date flag (or -td):
pipenv run python3 src/lira.py -d 2025/11/01 -td 2025/12/01
Warning: not implemented for Google Scholar (SerpAPI).
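The date options above all define a search window. The sketch below shows one way to turn them into a (start, end) pair; it is not LiRA's code. It assumes the YYYY/MM/DD format used in the examples, that the end date defaults to today, and (for the week-based option) that weeks are counted back from the end date. The function names parse_cli_date and search_window are hypothetical.

```python
from datetime import date, datetime, timedelta
from typing import Optional, Tuple

# Date format used in this README's examples, e.g. 2023/09/19 (year/month/day).
DATE_FORMAT = "%Y/%m/%d"


def parse_cli_date(text: str) -> date:
    """Parse a date string like '2023/09/19' into a datetime.date."""
    return datetime.strptime(text, DATE_FORMAT).date()


def search_window(
    from_date: Optional[str] = None,
    to_date: Optional[str] = None,
    for_weeks: Optional[int] = None,
    today: Optional[date] = None,
) -> Tuple[date, date]:
    """Return (start, end) for a search, given either a start date or a number of weeks."""
    if (from_date is None) == (for_weeks is None):
        raise ValueError("give exactly one of from_date or for_weeks")
    # Without an explicit end date, search up to the day the program runs.
    end = parse_cli_date(to_date) if to_date else (today or date.today())
    if for_weeks is not None:
        # Assumption: the weeks are counted back from the end date.
        start = end - timedelta(weeks=for_weeks)
    else:
        start = parse_cli_date(from_date)
    if start > end:
        raise ValueError(f"start date {start} is after end date {end}")
    return start, end
```

For instance, search_window(from_date="2025/11/01", to_date="2025/12/01") corresponds to the -d/-td example above, while search_window(for_weeks=2) corresponds to -w 2.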
For each section, the header shows the total number of results and the time range of the search.
The config.json file contains the default configuration for LiRA. However, you can generate different configuration
files and use them to run other searches. Just use the -c argument:
pipenv run python3 src/lira.py -d 2023/09/19 -c another_config_file.json
The output of LiRA is stored in out/lira_output.html. You can open the most recent report at any time by opening the file directly or by running:
pipenv run python3 src/lira.py --last
# OR
pipenv run python3 src/lira.py -L
For additional details on the use of LiRA, type:
pipenv run python3 src/lira.py --help
to see all the available arguments and ways to automate your literature research!
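If you automate LiRA from your own script (for example, a weekly job) and want to open the latest report afterwards, Python's standard webbrowser module can do it. This is a sketch of an alternative to the --last flag, not how LiRA implements it; the only detail taken from this README is the report path out/lira_output.html, resolved here against the current working directory (assumed to be the LiRA folder).

```python
import webbrowser
from pathlib import Path

# Report location as stated in this README (relative to the LiRA folder).
REPORT_PATH = Path("out/lira_output.html")


def report_uri(path: Path = REPORT_PATH) -> str:
    """Return a file:// URI for the report, resolved against the current directory."""
    return path.resolve().as_uri()


def open_last_report(path: Path = REPORT_PATH) -> bool:
    """Open the report in the default browser; returns False if no browser could be launched."""
    if not path.exists():
        raise FileNotFoundError(f"No report found at {path}; run LiRA first.")
    return webbrowser.open(report_uri(path))
```

Calling open_last_report() from the LiRA folder after a run should open the same page as pipenv run python3 src/lira.py --last.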
- 09/10/2023: First release.
- 22/11/2023: Added experimental support for Google Scholar with SerpAPI.
- 01/01/2026: Added the --to_date flag and moved the script into src.
- 10/03/2026: Added experimental support for OpenAlex.
- Automatic generation of BibTeX files
- PyMed is not always able to scrape the full abstract of a paper. You might find incomplete abstracts in the HTML report.
- I could not find a way to search for a specific author code on PubMed. Thus, the author names are searched "as they are", and you might find papers from homonyms. To avoid homonyms, it is recommended to filter the authors using the keywords.
- Usually, the authors' names in the config file are highlighted in red in the HTML report. However, LiRA might sometimes fail due to imprecise character matching or middle names.
- SerpAPI cannot retrieve the full abstract of a paper, only a "snippet". Thus, only a small part of the abstract is displayed in LiRA.
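As suggested above, combining an author with your keywords narrows the results and reduces homonyms. The sketch below builds such a PubMed search string using PubMed's field tags [Author], [Journal], and the [dp] publication-date range. It is only an illustration: combining the keywords with OR, quoting names, and the chosen tags are assumptions, not a description of how LiRA builds its queries. A string like this could be passed to PyMed, e.g. PubMed(tool="MyTool", email="you@example.com").query(query, max_results=100).

```python
from datetime import date
from typing import Optional, Sequence


def pubmed_query(
    keywords: Sequence[str] = (),
    author: Optional[str] = None,
    journal: Optional[str] = None,
    start: Optional[date] = None,
    end: Optional[date] = None,
) -> str:
    """Build a PubMed search string from keywords, an author or journal, and a date range."""
    parts = []
    if keywords:
        # Any keyword may match; each keeps its own boolean logic, e.g. "cancer AND immunology".
        parts.append("(" + " OR ".join(f"({kw})" for kw in keywords) + ")")
    if author:
        parts.append(f'"{author}"[Author]')
    if journal:
        parts.append(f'"{journal}"[Journal]')
    if start and end:
        # PubMed publication-date range, e.g. 2023/09/19:2023/10/03[dp]
        parts.append(f"{start:%Y/%m/%d}:{end:%Y/%m/%d}[dp]")
    return " AND ".join(parts)
```

For example, pubmed_query(["Breast Cancer", "cancer AND immunology"], author="Doudna, Jennifer") returns ((Breast Cancer) OR (cancer AND immunology)) AND "Doudna, Jennifer"[Author].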
Franco Pradelli (franco.pradelli94@gmail.com)
This work is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.
Thanks for considering contributing to LiRA! I follow the standard "Fork & Pull" contributing model of GitHub (more information here).
Briefly, if you'd like to make a change to LiRA you can:
- create an account on GitHub
- fork this project
- make a local clone with git clone
- make changes on the local copy
- commit changes with git commit -a -m "my message"
- push to your GitHub account with git push origin
- create a Pull Request (PR) from your GitHub fork (go to your fork's webpage and click on "Pull Request"; you can then add a message to describe your proposal).


