GeneGist 🧬

Automated AI Pipeline for Scientific Discovery & Science Communication

GeneGist is an open-source tool designed for bioinformaticians and science communicators. It automates literature tracking in three steps: it fetches recent records from NCBI PubMed, summarizes complex papers into engaging blog posts using Google Gemini AI, and archives the results in a cloud database.

🚀 Key Features

Smart Mining: Fetches the latest research for configurable keywords (e.g., CRISPR, mRNA, Aging) using the NCBI Entrez API.
AI-Powered Summarization: Uses a custom "Expert Science Journalist" persona to convert abstract scientific texts into high-quality, readable blog posts.
Dual Language Support: Generates content in both English (EN) and Turkish (TR) simultaneously.
Cloud Architecture: Stores all processed metadata and blog content in Supabase (PostgreSQL), ready for web integration.
Smart Deduplication: Automatically checks the database history to prevent processing the same article twice.
Hybrid Operation Modes:
- Auto Mode: Scans pre-defined topics daily (ideal for Cron jobs).
- Manual Mode: CLI support for specific, ad-hoc research queries.
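
As a rough illustration of the Smart Mining feature above, the following sketch shows one way a keyword and a look-back window could be turned into an NCBI ESearch request through Biopython. The function names (`build_esearch_params`, `search_pubmed`) and the default values are hypothetical and not taken from the GeneGist source; only `Bio.Entrez.esearch`, `Bio.Entrez.read`, and the standard ESearch parameters are real.

```python
from typing import Any, Optional


def build_esearch_params(keyword: str, days: int = 7, count: int = 5) -> dict[str, Any]:
    """Build ESearch parameters for recent PubMed records (hypothetical helper)."""
    if days < 1 or count < 1:
        raise ValueError("days and count must be positive integers")
    return {
        "db": "pubmed",
        "term": keyword,
        "reldate": days,     # restrict to records from the last `days` days
        "datetype": "pdat",  # apply the date filter to the publication date
        "retmax": count,     # cap the number of PMIDs returned
        "sort": "pub_date",  # newest publications first
    }


def search_pubmed(
    keyword: str,
    email: str,
    api_key: Optional[str] = None,
    days: int = 7,
    count: int = 5,
) -> list[str]:
    """Return PMIDs matching `keyword`; performs a network call to NCBI."""
    # Imported lazily so build_esearch_params works without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks callers to identify themselves
    if api_key:
        Entrez.api_key = api_key
    handle = Entrez.esearch(**build_esearch_params(keyword, days, count))
    try:
        record = Entrez.read(handle)
    finally:
        handle.close()
    return list(record["IdList"])
```

Keeping the parameter construction separate from the network call makes the query logic easy to check offline, for example `build_esearch_params("CRISPR", days=30)`.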

🛠️ Tech Stack

Core Logic: Python (Modular Architecture)
Data Source: Biopython (NCBI Entrez)
LLM Engine: Google Gemini 1.5 Flash via google-generativeai
Database: Supabase (PostgreSQL) via supabase-py
Environment: Dotenv for secure key management
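
To make the LLM step more concrete, here is a minimal sketch of how a journalist persona prompt might be assembled and sent to Gemini through google-generativeai. The persona wording, `build_prompt`, and `summarize` are assumptions for illustration and do not reflect GeneGist's actual prompt; the sketch also issues one request per language, which may differ from how the project produces both languages.

```python
# Hypothetical persona text; the real prompt used by GeneGist is not shown in this README.
PERSONA = (
    "You are an expert science journalist. Rewrite the abstract below as an "
    "engaging, accurate blog post for a general audience."
)
LANGUAGES = {"EN": "English", "TR": "Turkish"}


def build_prompt(title: str, abstract: str, lang: str) -> str:
    """Compose the full prompt for one target language code (EN or TR)."""
    if lang not in LANGUAGES:
        raise ValueError(f"Unsupported language code: {lang}")
    return (
        f"{PERSONA}\n"
        f"Write the post in {LANGUAGES[lang]}.\n\n"
        f"Title: {title}\n"
        f"Abstract: {abstract}"
    )


def summarize(title: str, abstract: str, api_key: str,
              model_name: str = "gemini-1.5-flash") -> dict[str, str]:
    """Return one generated post per language; performs network calls to Gemini."""
    # Imported lazily so build_prompt can be used without the SDK installed.
    import google.generativeai as genai

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel(model_name)
    return {
        lang: model.generate_content(build_prompt(title, abstract, lang)).text
        for lang in LANGUAGES
    }
```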

📦 Installation

Clone the repository:

git clone https://github.com/yourusername/genegist.git
cd genegist

Set up a virtual environment and install dependencies:

python3 -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
pip install -r requirements.txt

Configuration: Create a .env file in the root directory with your API keys:

NCBI_API_KEY=your_ncbi_key
NCBI_EMAIL=your_email@example.com
GOOGLE_API_KEY=your_gemini_key
SUPABASE_URL=your_supabase_url
SUPABASE_KEY=your_supabase_anon_key
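
A small startup check can catch a missing key before any API call is made. The sketch below is one possible approach, not the project's actual loader: `load_settings` is a hypothetical name, and it assumes all five variables listed above are required.

```python
import os
from typing import Mapping, Optional

# Assumption: every key from the .env example above is treated as mandatory.
REQUIRED_KEYS = (
    "NCBI_API_KEY",
    "NCBI_EMAIL",
    "GOOGLE_API_KEY",
    "SUPABASE_URL",
    "SUPABASE_KEY",
)


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Return the required settings, raising if any is missing or empty."""
    if env is None:
        try:
            # python-dotenv copies values from .env into os.environ.
            from dotenv import load_dotenv
            load_dotenv()
        except ImportError:
            pass  # fall back to variables already set in the environment
        env = os.environ
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}
```

Passing a plain dictionary as `env` makes the check easy to exercise in tests without touching the real environment.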

Database Setup: Run the following SQL in your Supabase SQL Editor to create the necessary table:

create table articles (
  article_id text primary key,
  title text,
  topic text,
  url text,
  published_at date,
  content_tr text,
  content_en text,
  created_at timestamp with time zone default timezone('utc'::text, now())
);
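
With `article_id` as the primary key, the deduplication feature can be approximated by asking the table which candidate IDs it already holds and skipping those. The following is a sketch under that assumption; `filter_new_ids` and `fetch_existing_ids` are hypothetical helpers, and `client` is expected to be a supabase-py client created with `create_client(SUPABASE_URL, SUPABASE_KEY)`.

```python
from typing import Iterable


def filter_new_ids(candidate_ids: Iterable[str], existing_ids: Iterable[str]) -> list[str]:
    """Keep candidates not yet stored, preserving order and dropping repeats."""
    seen = set(existing_ids)
    fresh = []
    for article_id in candidate_ids:
        if article_id not in seen:
            seen.add(article_id)
            fresh.append(article_id)
    return fresh


def fetch_existing_ids(client, candidate_ids: Iterable[str]) -> set[str]:
    """Query the articles table for candidate IDs that are already present."""
    ids = list(candidate_ids)
    if not ids:
        return set()
    response = (
        client.table("articles")
        .select("article_id")
        .in_("article_id", ids)
        .execute()
    )
    return {row["article_id"] for row in response.data}
```

A caller would typically run `filter_new_ids(pmids, fetch_existing_ids(client, pmids))` and only summarize the IDs that remain.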

📖 Usage

GeneGist is designed to be flexible. You can run it automatically based on your config file or manually via CLI.

1. Auto Mode (Default): Scans the keywords defined in config.json.

python main.py

2. Manual Mode (CLI): Searches for a specific topic on demand.

# Search for "Neuralink" papers from the last 30 days
python main.py --manual --keyword "Neuralink" --days 30 --count 5
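
For readers curious how the flags in the command above could be wired up, here is a minimal argparse sketch. It is not the actual contents of main.py: the default values and the rule that `--manual` requires `--keyword` are assumptions.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Declare the flags shown in the usage example (defaults are assumed)."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Fetch and summarize recent PubMed papers.",
    )
    parser.add_argument("--manual", action="store_true",
                        help="run a one-off search instead of the config.json topics")
    parser.add_argument("--keyword", help="search term used in manual mode")
    parser.add_argument("--days", type=int, default=7,
                        help="look-back window in days (assumed default)")
    parser.add_argument("--count", type=int, default=5,
                        help="maximum number of papers to process (assumed default)")
    return parser


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse arguments and enforce that manual mode has a keyword."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.manual and not args.keyword:
        parser.error("--manual requires --keyword")
    return args
```

Running with no flags leaves `args.manual` as `False`, which corresponds to Auto Mode.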

For detailed documentation, please refer to MANUAL.md.

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📝 License

This project is licensed under the MIT License.
