
Add Ollama-backed async article analysis (summary + keywords) #157

Open
xenacode-art wants to merge 5 commits into m2b3:test from xenacode-art:feat/ollama-ai-article-analysis

Conversation

@xenacode-art

Overview

This is a proof-of-concept for the AI-assisted literature discovery
feature discussed in the GSoC 2026 possibilities thread. It uses a
locally hosted Ollama model — no external API keys or costs required.

What it does

Given an article's abstract, the system:

  1. Generates a 2–3 sentence plain-language summary
  2. Extracts 5–8 key topic keywords as structured JSON

Both are returned as a Celery task result so the UI can poll
asynchronously without blocking the request.

New files

myapp/services/ai_tasks.py

  • analyse_article_task, a @shared_task that calls the Ollama
    /api/generate endpoint with two sequential prompts (sketched
    after this list)
  • Ollama base URL and model are configurable via OLLAMA_BASE_URL
    and OLLAMA_MODEL Django settings (defaults: localhost:11434,
    llama3.2)
  • ConnectionError and Timeout trigger Celery retries with
    backoff; other failures propagate so they're visible in the
    task state
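
For reviewers who want the shape of this without opening the diff, here is a
minimal sketch of the task. The prompt wording and the _generate helper are
illustrative, not the exact code in the PR:

# Sketch of myapp/services/ai_tasks.py; prompts and helper names illustrative
import json
import requests
from celery import shared_task
from django.conf import settings

def _generate(prompt: str) -> str:
    base = getattr(settings, "OLLAMA_BASE_URL", "http://localhost:11434")
    model = getattr(settings, "OLLAMA_MODEL", "llama3.2")
    resp = requests.post(
        f"{base}/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]

@shared_task(
    bind=True,
    autoretry_for=(requests.ConnectionError, requests.Timeout),
    retry_backoff=True,
    max_retries=3,
)
def analyse_article_task(self, article_id, abstract):
    # Prompt 1: plain-language summary; prompt 2: keywords as a JSON array
    summary = _generate(f"Summarize this abstract in 2-3 plain sentences:\n{abstract}")
    keywords = _generate(
        f"Extract 5-8 topic keywords from this abstract as a JSON array of strings:\n{abstract}"
    )
    # Anything other than connection/timeout errors propagates, so it stays
    # visible in the task state instead of being silently retried
    return {
        "article_id": article_id,
        "summary": summary.strip(),
        "keywords": json.loads(keywords),
    }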

articles/ai_api.py

  • POST /articles/{slug}/ai-summarize — validates that the article
    exists and has an abstract, queues the task, and returns a task_id
  • GET /articles/ai-task/{task_id} — polls task state and returns the
    result when complete (both endpoints are sketched below)
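
The "articles router" wiring suggests a django-ninja-style Router; under that
assumption (and with an assumed Article import path), the endpoints look
roughly like this:

# Sketch of articles/ai_api.py; router style and import paths are assumptions
from celery.result import AsyncResult
from django.shortcuts import get_object_or_404
from ninja import Router
from ninja.errors import HttpError

from articles.models import Article  # assumed location
from myapp.services.ai_tasks import analyse_article_task

router = Router()

@router.post("/{slug}/ai-summarize")
def ai_summarize(request, slug: str):
    article = get_object_or_404(Article, slug=slug)
    if not article.abstract:
        raise HttpError(400, "Article has no abstract to analyse")
    task = analyse_article_task.delay(article.id, article.abstract)
    return {"task_id": task.id}

@router.get("/ai-task/{task_id}")
def ai_task_status(request, task_id: str):
    result = AsyncResult(task_id)
    payload = {"state": result.state}
    if result.successful():
        payload["result"] = result.result
    return payload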

Running locally

# Install and start Ollama
ollama pull llama3.2
ollama serve

# Start Celery worker (Redis must be running)
celery -A myapp worker -l info

Then hit POST /api/articles/{slug}/ai-summarize with a valid JWT.
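
To exercise the whole flow end to end, something like this works (the base
URL and token are placeholders for a local dev setup):

import time
import requests

BASE = "http://localhost:8000/api/articles"  # assumed mount point
HEADERS = {"Authorization": "Bearer <your JWT>"}

# Queue the analysis for one article
task_id = requests.post(
    f"{BASE}/some-article-slug/ai-summarize", headers=HEADERS
).json()["task_id"]

# Poll until the Celery task finishes
while True:
    status = requests.get(f"{BASE}/ai-task/{task_id}", headers=HEADERS).json()
    if status["state"] in ("SUCCESS", "FAILURE"):
        print(status)
        break
    time.sleep(2)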

@xenacode-art xenacode-art changed the base branch from main to test March 22, 2026 20:38
@armanalam03
Collaborator

This is a great initiative @xenacode-art! If possible, please attach some working examples (images/videos) of integrating it with the frontend. I have a few questions on this:

  • Who sends this task to Celery, and how?
  • How do we generate summaries on existing articles? Does this run every time on the get-article API call, or is it a scheduled job?
  • What hardware resources are needed to run this model on the cloud?
  • How accurate is it at generating a summary of an article and fetching the relevant keywords that best describe the article?
  • In the future, we would like to build a recommendation system for articles based on the user. How can we extend this AI workflow to fetch the most relevant keywords for an article?

Community names containing special characters (e.g. '+', spaces) were
being inserted raw into notification link paths, causing 404s when
users clicked through. Apply urllib.parse.quote(..., safe='') to all
in-app notification links that include the community name in the path.

Fixes m2b3#119
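
A minimal illustration of the quoting fix described above (the community name
and link path are made up):

from urllib.parse import quote

name = "Maths + Physics"  # community name with characters that broke links
link = f"/community/{quote(name, safe='')}"
# link == "/community/Maths%20%2B%20Physics", so '+' and spaces no longer 404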
Introduces two new components:

- myapp/services/ai_tasks.py: a Celery shared_task that sends an
  article abstract to a locally running Ollama instance, retrieves
  a 2-3 sentence summary and a JSON array of keywords, and returns
  them as a structured result. The Ollama base URL and model are
  configurable via OLLAMA_BASE_URL and OLLAMA_MODEL settings so no
  external API keys are required.

- articles/ai_api.py: two endpoints wired into the articles router:
    POST /{slug}/ai-summarize  — queues the task, returns task_id
    GET  /ai-task/{task_id}    — polls task state and result

Connection and timeout errors trigger automatic Celery retries so
transient Ollama unavailability does not surface as hard failures.
@xenacode-art xenacode-art force-pushed the feat/ollama-ai-article-analysis branch from 88b39cf to 95c6817 on March 22, 2026 21:07
@xenacode-art
Author

xenacode-art commented Mar 22, 2026

Thanks for the review, @armanalam03! Happy to answer each of these.

  • Who sends this task to Celery, and how?

The POST /articles/{slug}/ai-summarize endpoint does — it calls analyse_article_task.delay(article.id, article.abstract) which queues the job. The frontend (or any authenticated client) hits that endpoint to trigger it. The result is then polled via GET /articles/ai-task/{task_id} using the task ID returned.

  • How do we generate summaries on existing articles? On every get-article call or scheduled?

Right now it's fully on-demand — nothing runs automatically. The next step I'd suggest is caching the result: add ai_summary and ai_keywords fields to the Article model so after the first run the result is stored and returned immediately on subsequent calls.
We could also wire a post-save signal to auto-queue new articles. A scheduled job for backfilling existing articles is also straightforward with Celery Beat.
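
A sketch of that caching and auto-queueing step; the field names match the
suggestion above, but none of this is in the current diff:

# Hypothetical follow-up, not part of this PR
from django.contrib.postgres.fields import ArrayField
from django.db import models
from django.db.models.signals import post_save
from django.dispatch import receiver

from myapp.services.ai_tasks import analyse_article_task

class Article(models.Model):
    # ...existing fields...
    ai_summary = models.TextField(blank=True, default="")
    ai_keywords = ArrayField(models.CharField(max_length=64), blank=True, default=list)

@receiver(post_save, sender=Article)
def queue_ai_analysis(sender, instance, created, **kwargs):
    # Auto-queue newly created articles that have an abstract
    if created and instance.abstract:
        analyse_article_task.delay(instance.id, instance.abstract)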

  • What hardware resources are needed on the cloud?
    llama3.2 (3B parameters) needs roughly 4–6 GB of RAM and runs fine on CPU, just slower. On cloud, a small GPU instance (e.g., AWS g4dn.xlarge) gives good throughput. For a budget option, the quantized llama3.2:1b variant cuts that in half. For production I'd recommend deploying Ollama on a dedicated instance and pointing OLLAMA_BASE_URL at it — the Django/Celery side doesn't need any GPU.

  • How accurate is the summarization and keyword extraction?
    For well-written scientific abstracts, llama3.2 is quite solid — summaries are coherent and keywords are topically relevant. Accuracy drops on very domain-specific jargon (e.g. niche chemistry or genomics). We could improve this by switching to a larger model (llama3.1:8b) or by prompt-tuning for scientific text. I can add some example outputs on a few real SciCommons articles if that helps.

  • How to extend this for a recommendation system?
    The keywords extracted per article are the natural starting point. Store them on the Article model, then GET /articles/{slug}/related can query Article.objects.filter(keywords__overlap=[...]) ranked by overlap count — no vector DB needed for MVP. When we want semantic similarity (not just keyword overlap), we'd generate embeddings using a lightweight model like nomic-embed-text via Ollama and store them in pgvector. That's a clean extension of this same Ollama infrastructure. I actually have this scoped out in my GSoC proposal.
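
Roughly, assuming the ai_keywords ArrayField from the caching suggestion above:

def related_articles(article, limit=5):
    # Keyword-overlap MVP: Postgres && via the ArrayField __overlap lookup
    candidates = (
        Article.objects
        .filter(ai_keywords__overlap=article.ai_keywords)  # shares >= 1 keyword
        .exclude(pk=article.pk)
    )
    # Rank by overlap count in Python; fine at MVP scale, push into SQL later
    mine = set(article.ai_keywords)
    return sorted(
        candidates,
        key=lambda a: len(mine & set(a.ai_keywords)),
        reverse=True,
    )[:limit]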
