demirbase/GeneGist

GeneGist 🧬

Automated AI Pipeline for Scientific Discovery & Science Communication

GeneGist is an open-source tool designed for bioinformaticians and science communicators. It automates the tracking of scientific literature by fetching real-time data from NCBI PubMed, summarizing complex papers into engaging blog posts using Google Gemini AI, and archiving them in a cloud database.
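The fetch, summarize, and archive flow described above can be pictured roughly as follows. This is a minimal sketch, not the project's actual code: the `Article` class, the `run_pipeline` function, and the four injected callables are hypothetical names chosen for illustration, and the real modules may be organized differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple


@dataclass
class Article:
    """Hypothetical container for one fetched paper."""
    article_id: str  # assumed to be the key used for deduplication
    title: str
    abstract: str


def run_pipeline(
    fetch: Callable[[str], Iterable[Article]],
    already_stored: Callable[[str], bool],
    summarize: Callable[[Article, str], str],
    store: Callable[[Article, Dict[str, str]], None],
    topic: str,
    languages: Tuple[str, ...] = ("en", "tr"),
) -> int:
    """Fetch articles for a topic, skip known ones, summarize, and store.

    Returns the number of newly stored articles.
    """
    stored = 0
    for article in fetch(topic):
        # Deduplication: skip anything the database already holds.
        if already_stored(article.article_id):
            continue
        # One summary per target language (English and Turkish by default).
        posts = {lang: summarize(article, lang) for lang in languages}
        store(article, posts)
        stored += 1
    return stored
```

Passing the four steps in as callables keeps the control flow testable without network access: in a real run they would wrap the PubMed, Gemini, and Supabase clients.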

🚀 Key Features

  • Smart Mining: Fetches the latest research for dynamic keywords (e.g., CRISPR, mRNA, Aging) using the NCBI Entrez API.
  • AI-Powered Summarization: Uses a custom "Expert Science Journalist" persona to convert abstract scientific texts into high-quality, readable blog posts.
  • Dual Language Support: Generates content in both English (EN) and Turkish (TR) simultaneously.
  • Cloud Architecture: Stores all processed metadata and blog content in Supabase (PostgreSQL), ready for web integration.
  • Smart Deduplication: Automatically checks the database history to prevent processing the same article twice.
  • Hybrid Operation Modes:
    • Auto Mode: Scans pre-defined topics daily (ideal for Cron jobs).
    • Manual Mode: CLI support for specific, ad-hoc research queries.
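To make the Smart Mining step more concrete, here is a hedged sketch of the kind of search request involved. It builds a URL for the NCBI E-utilities ESearch endpoint using only the standard library. The function name `build_esearch_url` and its defaults are assumptions for illustration; the project itself goes through Biopython rather than building URLs by hand.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(keyword, days=7, count=5, api_key=None, email=None):
    """Build an ESearch URL for recent PubMed records matching a keyword.

    The default values for `days` and `count` are illustrative assumptions.
    """
    params = {
        "db": "pubmed",
        "term": keyword,
        "reldate": days,      # restrict to the last `days` days
        "datetype": "pdat",   # apply the restriction to publication date
        "retmax": count,      # maximum number of IDs to return
        "sort": "pub_date",   # newest first
        "retmode": "json",
    }
    # Credentials are optional; NCBI allows higher request rates with a key.
    if api_key:
        params["api_key"] = api_key
    if email:
        params["email"] = email
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_esearch_url("CRISPR", days=30, count=5))
```

With Biopython, the same parameters would typically be passed as keyword arguments to `Entrez.esearch`, after setting `Entrez.email` and `Entrez.api_key`.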

🛠️ Tech Stack

  • Core Logic: Python (Modular Architecture)
  • Data Source: Biopython (NCBI Entrez)
  • LLM Engine: Google Gemini 1.5 Flash via google-generativeai
  • Database: Supabase (PostgreSQL) via supabase-py
  • Environment: Dotenv for secure key management

📦 Installation

  1. Clone the repository:

    git clone https://github.com/yourusername/genegist.git
    cd genegist
  2. Set up Virtual Environment & Install Dependencies:

    python3 -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
    pip install -r requirements.txt
  3. Configuration: Create a .env file in the root directory with your API keys:

    NCBI_API_KEY=your_ncbi_key
    NCBI_EMAIL=your_email@example.com
    GOOGLE_API_KEY=your_gemini_key
    SUPABASE_URL=your_supabase_url
    SUPABASE_KEY=your_supabase_anon_key
  4. Database Setup: Run the following SQL in your Supabase SQL Editor to create the necessary table:

    create table articles (
      article_id text primary key,
      title text,
      topic text,
      url text,
      published_at date,
      content_tr text,
      content_en text,
      created_at timestamp with time zone default timezone('utc'::text, now())
    );
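As an illustration of how a processed paper might map onto the `articles` table above, the following sketch builds one row as a plain dictionary. The function name `build_article_row` is hypothetical, and using the PubMed ID as `article_id` is an assumption; the column names come from the SQL in step 4.

```python
from datetime import date

# Columns the application fills in; created_at is left to the database default.
ARTICLE_COLUMNS = (
    "article_id", "title", "topic", "url",
    "published_at", "content_tr", "content_en",
)


def build_article_row(pmid, title, topic, published_at, content_tr, content_en):
    """Return a dict whose keys match the columns of the articles table.

    Assumes `pmid` is the PubMed ID and `published_at` is a datetime.date.
    """
    return {
        "article_id": str(pmid),
        "title": title.strip(),
        "topic": topic,
        "url": f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/",
        "published_at": published_at.isoformat(),  # ISO date for the date column
        "content_tr": content_tr,
        "content_en": content_en,
    }


if __name__ == "__main__":
    row = build_article_row(
        12345, "Example title", "CRISPR", date(2024, 1, 2),
        "Türkçe özet", "English summary",
    )
    print(row)
```

With supabase-py, a row like this would usually be written with `client.table("articles").insert(row).execute()`; whether GeneGist does exactly that is not stated here.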

📖 Usage

GeneGist is designed to be flexible. You can run it automatically based on your config file or manually via CLI.

1. Auto Mode (Default): Scans keywords defined in config.json.

    python main.py

2. Manual Mode (CLI): Search for a specific topic instantly.

    # Search for "Neuralink" papers from the last 30 days
    python main.py --manual --keyword "Neuralink" --days 30 --count 5
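For readers curious how the two modes could be told apart on the command line, here is a minimal `argparse` sketch. The flag names match the example above, but the default values, the help texts, and the rule that `--manual` requires `--keyword` are assumptions rather than documented behavior.

```python
import argparse


def build_parser():
    """Create a parser for the flags shown in the usage example."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Fetch and summarize recent PubMed papers.",
    )
    parser.add_argument("--manual", action="store_true",
                        help="run a one-off search instead of the configured topics")
    parser.add_argument("--keyword", help="search term for manual mode")
    # Default values below are illustrative assumptions.
    parser.add_argument("--days", type=int, default=7,
                        help="how many days back to search")
    parser.add_argument("--count", type=int, default=5,
                        help="maximum number of papers to process")
    return parser


def parse_args(argv=None):
    """Parse arguments and enforce that manual mode has a keyword."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.manual and not args.keyword:
        parser.error("--manual requires --keyword")
    return args


if __name__ == "__main__":
    args = parse_args()
    mode = "manual" if args.manual else "auto"
    print(f"mode={mode} keyword={args.keyword} days={args.days} count={args.count}")
```

Running without any flags leaves `args.manual` false, which corresponds to Auto Mode.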

For detailed documentation, please refer to MANUAL.md.

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📝 License

This project is licensed under the MIT License.
