feat: paper downloading pipeline #7

Merged
gtcha2 merged 15 commits into main from shlok/feat/paper-fetching
May 2, 2025


@shloknatarajan
Member

Downloads PubMed Central articles as XML from their PMCIDs.

Updates

  1. Downloads the variant annotation data from PharmGKB's variantAnnotations API ("https://api.pharmgkb.org/v1/download/file/data/variantAnnotations.zip")
  2. Gets all unique PMIDs from the annotated data
  3. Converts them to PMCIDs, keeping only the articles that NCBI's ID Converter API lists as available
  4. Downloads the available articles into src/fetch_articles/saved_data/articles
  5. Moved the code to a src/ layout and updated the READMEs
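Steps 1 to 4 above can be sketched with only the Python standard library. This is a hypothetical illustration, not the code merged in this PR: the "PMID" column name in the PharmGKB TSV files, the NCBI ID Converter and E-utilities efetch endpoints, the 200-ID batch size, the "tool" value, and all function names are assumptions made for the example.

```python
"""Hypothetical sketch of the paper-fetching steps; not the code merged in this PR."""
import csv
import io
import json
import shutil
import time
import urllib.parse
import urllib.request
import zipfile
from pathlib import Path

# From the PR description.
VARIANT_ANNOTATIONS_URL = (
    "https://api.pharmgkb.org/v1/download/file/data/variantAnnotations.zip"
)
# Assumed NCBI endpoints; verify against NCBI's current documentation.
IDCONV_URL = "https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/"
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
BATCH_SIZE = 200  # ID Converter accepts up to 200 IDs per request


def download(url: str, dest: Path) -> Path:
    """Steps 1 and 4: copy the response body of url into dest."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest


def unique_pmids(zip_bytes: bytes) -> set[str]:
    """Step 2: collect PMIDs from every TSV in the archive with a 'PMID' column (assumed name)."""
    pmids: set[str] = set()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            if not name.endswith(".tsv"):
                continue
            with zf.open(name) as fh:
                # utf-8-sig tolerates a leading byte-order mark if one is present
                text = io.TextIOWrapper(fh, encoding="utf-8-sig", newline="")
                reader = csv.DictReader(text, delimiter="\t")
                if not reader.fieldnames or "PMID" not in reader.fieldnames:
                    continue  # this file has no PMID column
                for row in reader:
                    value = (row["PMID"] or "").strip()
                    if value:
                        pmids.add(value)
    return pmids


def pmid_to_pmcid(payload: dict) -> dict[str, str]:
    """Step 3: map PMID -> PMCID from an ID Converter JSON response.

    Records without a 'pmcid' (e.g. articles not in PMC) are skipped.
    """
    mapping: dict[str, str] = {}
    for record in payload.get("records", []):
        pmid, pmcid = record.get("pmid"), record.get("pmcid")
        if pmid and pmcid:
            mapping[str(pmid)] = pmcid
    return mapping


def convert_batch(pmids: list[str], email: str) -> dict[str, str]:
    """Query the ID Converter for one batch of PMIDs."""
    query = urllib.parse.urlencode(
        {"ids": ",".join(pmids), "format": "json",
         "tool": "paper-fetching", "email": email}  # hypothetical tool name
    )
    with urllib.request.urlopen(f"{IDCONV_URL}?{query}") as resp:
        return pmid_to_pmcid(json.load(resp))


def article_url(pmcid: str) -> str:
    """Step 4: efetch URL for one article's XML (numeric ID, 'PMC' prefix stripped)."""
    numeric = pmcid[3:] if pmcid.upper().startswith("PMC") else pmcid
    query = urllib.parse.urlencode({"db": "pmc", "id": numeric, "retmode": "xml"})
    return f"{EFETCH_URL}?{query}"


def run(out_dir: Path, email: str) -> None:
    """Run steps 1-4, writing <PMCID>.xml files into out_dir / 'articles'."""
    zip_path = download(VARIANT_ANNOTATIONS_URL, out_dir / "variantAnnotations.zip")
    pmids = sorted(unique_pmids(zip_path.read_bytes()))
    pmcids: dict[str, str] = {}
    for i in range(0, len(pmids), BATCH_SIZE):
        pmcids.update(convert_batch(pmids[i : i + BATCH_SIZE], email))
        time.sleep(0.4)  # stay under ~3 requests/second without an API key
    for pmcid in sorted(set(pmcids.values())):
        download(article_url(pmcid), out_dir / "articles" / f"{pmcid}.xml")
        time.sleep(0.4)
```

Calling run(Path("src/fetch_articles/saved_data"), "you@example.com") would place the XML files in src/fetch_articles/saved_data/articles, the location named in step 4. The network calls are kept in small wrappers so the parsing logic (unique_pmids, pmid_to_pmcid, article_url) can be tested offline, and the pauses between requests follow NCBI's guidance of at most three E-utilities requests per second without an API key.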

Next Steps:

  1. Upload saved articles somewhere
  2. Create a benchmark for extracting annotations from the downloaded articles. This involves defining metrics and building an evaluation pipeline.
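The metrics for that benchmark are not decided yet. One hypothetical option, assuming each annotation can be reduced to an exact-match (variant, gene, drug) triple and grouped by PMCID, is micro-averaged precision, recall, and F1; the triple format and the names below are illustrative assumptions, not part of this PR.

```python
from dataclasses import dataclass

# Hypothetical annotation key: an exact-match (variant, gene, drug) triple.
Annotation = tuple[str, str, str]


@dataclass(frozen=True)
class Scores:
    precision: float
    recall: float
    f1: float


def score(predicted: dict[str, set[Annotation]], gold: dict[str, set[Annotation]]) -> Scores:
    """Micro-averaged precision/recall/F1, pooling counts over all articles (keyed by PMCID)."""
    tp = fp = fn = 0
    for pmcid in predicted.keys() | gold.keys():
        pred = predicted.get(pmcid, set())
        true = gold.get(pmcid, set())
        tp += len(pred & true)   # extracted and correct
        fp += len(pred - true)   # extracted but not in the gold set
        fn += len(true - pred)   # in the gold set but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Scores(precision, recall, f1)
```

Exact string matching is strict; variant and drug names would likely need normalization before comparison for the scores to be meaningful.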

@shloknatarajan shloknatarajan self-assigned this May 2, 2025
@gtcha2
Collaborator

gtcha2 commented May 2, 2025

Merging this request, but it will need to pass the REUSE and pylint checks before going public.

@gtcha2 gtcha2 merged commit 60de935 into main May 2, 2025
4 of 8 checks passed
@shloknatarajan shloknatarajan deleted the shlok/feat/paper-fetching branch May 23, 2025 19:57
