A modular, extensible pipeline for automatically retrieving scientific papers and running named entity recognition (NER), summarization, and question answering over them, built on pygetpapers, spaCy/transformers, and LLM/RAG-based models.
- Retrieve scientific papers from open-access sources using pygetpapers
- Extract named entities using pre-trained spaCy or transformer-based models
- Summarize full texts or abstracts using transformer-based summarization models (e.g., BART, T5)
- Ask questions and get answers with RAG-based pipelines or custom LLMs
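
The features above can be pictured as interchangeable stages chained over a shared record. The following is a minimal sketch of that idea, not this project's actual API: `Paper`, `run_pipeline`, `toy_ner`, and `toy_summarize` are hypothetical names, and the two stages are deliberately naive stand-ins (capitalized words as "entities", first sentence as "summary") so the example runs with the standard library alone. In a real setup, each stage would presumably wrap pygetpapers, a spaCy or transformer model, or a RAG component instead.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Paper:
    """Hypothetical record passed between pipeline stages."""
    title: str
    text: str
    entities: List[str] = field(default_factory=list)
    summary: str = ""


# A stage takes a list of papers and returns a list of papers.
Stage = Callable[[List[Paper]], List[Paper]]


def run_pipeline(papers: List[Paper], stages: List[Stage]) -> List[Paper]:
    """Apply each stage in order, feeding its output to the next stage."""
    for stage in stages:
        papers = stage(papers)
    return papers


def toy_ner(papers: List[Paper]) -> List[Paper]:
    """Stand-in for an NER stage: treats capitalized words as entities.

    Assumption: a real stage would call a spaCy or transformer model here.
    """
    for paper in papers:
        words = (w.strip(".,;:") for w in paper.text.split())
        paper.entities = sorted({w for w in words if w[:1].isupper()})
    return papers


def toy_summarize(papers: List[Paper]) -> List[Paper]:
    """Stand-in for a summarization stage: keeps only the first sentence.

    Assumption: a real stage would call a model such as BART or T5 here.
    """
    for paper in papers:
        first_sentence = paper.text.split(". ")[0].rstrip(".")
        paper.summary = first_sentence + "."
    return papers


if __name__ == "__main__":
    papers = [Paper(title="Example", text="Aspirin inhibits COX enzymes. It is widely used.")]
    for paper in run_pipeline(papers, [toy_ner, toy_summarize]):
        print(paper.title, paper.entities, paper.summary)
```

Because every stage shares the same signature, a retrieval or question-answering stage could be added, swapped, or removed by editing the list passed to `run_pipeline`, which is one way to read "modular and extensible" here.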