🌑 Umbra — Space Biology Knowledge Engine

A full-stack research intelligence platform that scrapes and indexes scientific literature on space biology, then makes it searchable and open to conversational querying with AI.

🚀 NASA Space Apps Challenge 2025

Umbra was built for the NASA Space Apps Challenge 2025, held on October 4–5, 2025.

The Challenge: A Body in Motion

Enable a new era of human space exploration! NASA has been performing biology experiments in space for decades, generating a tremendous amount of information critical for preparing humans to revisit the Moon and explore Mars. Although this knowledge is publicly available, it is difficult for potential users to find information pertaining to their specific interests.

The objective: Build a functional web application leveraging AI, knowledge graphs, and other tools to summarize the 608 NASA bioscience publications and enable users to explore the impacts and results of the experiments they describe.

Our Solution

Umbra addresses this challenge by providing:

AI-powered conversational interface — Ask natural language questions about space biology research
Knowledge graph visualization — Visually explore relationships between papers, organisms, and biological processes
Automated data pipeline — A Python scraper that extracts structured knowledge from all 608 publications using Gemini AI
Smart search & filtering — Find papers by organism, experimental condition, space environment, and more
Targeted audience support — Useful for scientists, mission architects, and research managers alike

📖 Project Overview

Umbra is a multi-component, AI-powered research platform focused on space biology. It enables researchers, students, and enthusiasts to discover, explore, and converse with a curated knowledge base of space biology research papers.

The platform consists of three major components:

Component	Technology	Purpose
`umbra/`	Next.js 15 + Convex + WorkOS	Frontend web application & real-time database
`Urban.api/`	ASP.NET Core 8	REST API backend with authentication
`scraper/`	Python 3 + Gemini AI	Research paper crawler & data pipeline

🏗️ Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        Umbra Platform                           │
│                                                                 │
│  ┌──────────────┐     ┌──────────────┐     ┌────────────────┐  │
│  │   umbra/     │────▶│  Convex DB   │◀────│   scraper/     │  │
│  │  Next.js 15  │     │  (Realtime)  │     │  Python Bot    │  │
│  │  Frontend    │     └──────────────┘     │  + Gemini AI   │  │
│  └──────┬───────┘                          └────────────────┘  │
│         │                                                       │
│         ▼                                                       │
│  ┌──────────────┐                                               │
│  │  Urban.api/  │                                               │
│  │ ASP.NET Core │                                               │
│  │   REST API   │                                               │
│  └──────────────┘                                               │
└─────────────────────────────────────────────────────────────────┘

🖥️ Frontend — `umbra/`

The frontend is a responsive web application and the primary interface to the knowledge base.

Features

AI Chat Interface — Conversational assistant powered by Google Gemini, specialized in answering questions about space biology research
Research Paper Browser — Browse and read indexed research papers
Knowledge Graph — Interactive D3.js visualization of relationships between papers, organisms, and biological processes
Text Editor — Rich text editing capabilities for note-taking and annotations
Authentication — Secure sign-in / sign-up via WorkOS AuthKit
Dark / Light Theme — System-aware theming with user preference persistence
Real-time Data — Live updates via Convex's reactive database
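
To make the knowledge graph feature more concrete, the sketch below shows one way per-paper entity lists could be turned into a nodes-and-links structure, the general shape that D3 force layouts commonly consume. The record fields (`id`, `title`, `organisms`, `processes`) and the `build_graph` helper are hypothetical and are not taken from the Umbra schema; the sample records are made up for illustration.

```python
import json


def build_graph(papers):
    """Build a {"nodes": [...], "links": [...]} dict from paper records.

    The record shape is an assumption made for this sketch.
    """
    nodes = {}  # node id -> node dict, deduplicated across papers
    links = []

    def add_node(node_id, kind, label):
        # Keep the first label seen for a given id.
        nodes.setdefault(node_id, {"id": node_id, "type": kind, "label": label})

    for paper in papers:
        paper_id = f"paper:{paper['id']}"
        add_node(paper_id, "paper", paper["title"])
        for kind, field in (("organism", "organisms"), ("process", "processes")):
            for name in paper.get(field, []):
                # Lowercasing the name lets two papers share one entity node.
                entity_id = f"{kind}:{name.lower()}"
                add_node(entity_id, kind, name)
                links.append({"source": paper_id, "target": entity_id})

    return {"nodes": list(nodes.values()), "links": links}


if __name__ == "__main__":
    sample = [  # made-up records, for illustration only
        {"id": 1, "title": "Paper A", "organisms": ["Mus musculus"], "processes": ["Bone loss"]},
        {"id": 2, "title": "Paper B", "organisms": ["mus musculus"]},
    ]
    print(json.dumps(build_graph(sample), indent=2))
```

Because entity ids are normalized, both sample papers link to the same organism node, which is what lets a graph view surface papers that share an organism or process.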

Tech Stack

Category	Technology
Framework	Next.js 15 with App Router
Runtime	React 19
Language	TypeScript 5
Backend-as-a-Service	Convex (real-time database + serverless functions)
Authentication	WorkOS AuthKit (`@workos-inc/authkit-nextjs`)
AI / LLM	Google Gemini API (`@google/generative-ai`)
Styling	Tailwind CSS v4 + `tw-animate-css`
UI Components	Radix UI primitives (Accordion, Dialog, Dropdown, Select, Tabs, Tooltip, etc.)
Icons	Lucide React + React Icons
Data Visualization	D3.js v7
Animations	Motion (Framer Motion successor) + React Scroll Parallax
Markdown	`react-markdown`, `marked`, `remark-gfm`, `shiki` (syntax highlighting)
Carousel	Embla Carousel
CSV Parsing	PapaParse
Resizable Panels	`react-resizable-panels`
Notifications	Sonner (toast library)
Code Quality	ESLint + Prettier

Key Pages & Routes

Route	Description
`/`	Landing page / home
`/chat`	AI-powered research assistant chat
`/graph`	Knowledge graph visualization
`/researches`	Research paper browser
`/text-editor`	Annotation & text editor
`/server`	Server-side data page
`/sign-in`	Authentication — sign in
`/sign-up`	Authentication — sign up
`/callback`	WorkOS OAuth callback handler

Running the Frontend

cd umbra

# Install dependencies
npm install

# Set up environment variables
cp .env.local.example .env.local
# Fill in: CONVEX_DEPLOYMENT, NEXT_PUBLIC_CONVEX_URL, WORKOS_CLIENT_ID, WORKOS_API_KEY, NEXT_PUBLIC_WORKOS_REDIRECT_URI, GEMINI_API_KEY

# Start both frontend and Convex backend
npm run dev

🔧 Backend API — `Urban.api/`

A clean, layered REST API following the BLL/DAL (Business Logic Layer / Data Access Layer) architectural pattern.

Features

JWT Authentication — Secure token-based authentication with configurable issuer/audience
Password Hashing — BCrypt-based password security
RESTful Controllers — Clean HTTP endpoints exposed via ASP.NET controllers
Swagger / OpenAPI — Auto-generated API documentation at /swagger
Layered Architecture — Strict separation into DAL (database queries) and BLL (business rules)
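
The API itself is written in C# and relies on `Microsoft.AspNetCore.Authentication.JwtBearer`, so the following Python sketch is only an illustration of what "configurable issuer/audience" validation means for a bearer token. It assumes HS256 (shared-secret) signing, which the README does not state, and the helper names `issue_token` and `validate_token` are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    # JWTs use unpadded URL-safe base64.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(subject, secret, issuer, audience, ttl_seconds=3600, now=None):
    """Create a signed token carrying sub, iss, aud, iat and exp claims."""
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "iss": issuer, "aud": audience,
               "iat": now, "exp": now + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret))}"


def validate_token(token, secret, issuer, audience, now=None):
    """Return the payload if the token is valid, otherwise raise ValueError."""
    now = int(time.time()) if now is None else now
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    payload = json.loads(_b64url_decode(payload_b64))
    if payload.get("iss") != issuer or payload.get("aud") != audience:
        raise ValueError("issuer or audience mismatch")
    if payload.get("exp", 0) <= now:
        raise ValueError("token expired")
    return payload


if __name__ == "__main__":
    token = issue_token("user-1", "change-me", "UrbanAPI", "UrbanAPI")
    print(validate_token(token, "change-me", "UrbanAPI", "UrbanAPI")["sub"])
```

A token signed with a different secret, issued for another audience, or past its expiry is rejected, which is the behavior the issuer and audience settings control.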

Tech Stack

Category	Technology
Framework	ASP.NET Core 8 Web API
Language	C# (.NET 8)
ORM	Entity Framework Core 8
Database	Microsoft SQL Server
Authentication	JWT Bearer Tokens (`Microsoft.AspNetCore.Authentication.JwtBearer`)
Password Security	`BCrypt.Net-Next`
API Documentation	Swashbuckle / Swagger (`Swashbuckle.AspNetCore` 6.6.2)
Architecture	3-tier: API → BLL → DAL

Project Structure

Urban.api/
├── Urban.api/        # ASP.NET Core Web API (controllers, startup, config)
│   └── Controllers/  # HTTP endpoints
├── BLL/              # Business Logic Layer (services, domain rules)
│   └── Services/
└── DAL/              # Data Access Layer (EF Core contexts, models, migrations)

Running the Backend

cd Urban.api

# Restore NuGet packages
dotnet restore

# Update appsettings.json with your SQL Server connection string
# Run database migrations
dotnet ef database update --project DAL --startup-project Urban.api

# Start the API
dotnet run --project Urban.api
# Swagger UI available at https://localhost:{port}/swagger

🤖 Data Scraper — `scraper/`

An asynchronous Python automation bot that crawls scientific publications, extracts structured data using AI, generates vector embeddings, and populates the Convex database.
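
The vector embeddings mentioned above exist to support similarity search. The README does not say which similarity metric is used; cosine similarity is a common choice, and the sketch below assumes it. The `top_k` helper and the in-memory `embeddings` dict are hypothetical stand-ins for whatever store the project queries.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query, embeddings, k=3):
    """Return the k (paper_id, score) pairs most similar to the query vector."""
    scored = [(paper_id, cosine_similarity(query, vec))
              for paper_id, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Tiny made-up vectors; real embeddings have hundreds of dimensions.
    store = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    print(top_k([1.0, 0.0], store, k=2))
```

A linear scan like this is fine for a few hundred papers; larger collections usually call for an approximate nearest-neighbor index.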

Features

Automated Paper Crawling — Reads a CSV of paper URLs and processes them in bulk
HTML Parsing — BeautifulSoup-based extraction of title, authors, abstract, methods, results, discussion, conclusions, DOI, keywords, and citation counts
AI Entity Extraction — Uses Gemini 2.0 Flash to identify:
- 🧬 Organisms (species, microorganisms, cell types)
- 🔬 Experimental Conditions (temperature, pressure, microgravity, radiation)
- 🌱 Biological Processes (cellular processes, molecular pathways)
- 🚀 Space Environments (ISS, cosmic radiation, mission contexts)
Vector Embeddings — Generates semantic embeddings for similarity search
Progress Tracking — Resumable processing with progress.json (survives interruptions)
Rate Limiting — Respects Google Gemini API rate limits with exponential backoff
Convex Integration — Directly populates the Convex real-time database
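
The rate-limiting bullet above mentions exponential backoff. As a minimal sketch of that technique (not the contents of `rate_limiter.py`, which are not shown here), the hypothetical `call_with_backoff` helper below retries an async call and doubles the wait after each failure. The parameter names and defaults are assumptions.

```python
import asyncio


async def call_with_backoff(func, *, max_attempts=5, base_delay=1.0,
                            max_delay=60.0, retry_on=(Exception,),
                            sleep=asyncio.sleep):
    """Await func(), retrying on failure with exponentially growing delays.

    Delays are base_delay, 2*base_delay, 4*base_delay, ... capped at max_delay.
    The `sleep` parameter is injectable so the behavior can be tested quickly.
    """
    for attempt in range(max_attempts):
        try:
            return await func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            await sleep(delay)


if __name__ == "__main__":
    state = {"calls": 0}

    async def flaky():
        # Stand-in for an API call that is rate limited twice, then succeeds.
        state["calls"] += 1
        if state["calls"] < 3:
            raise RuntimeError("rate limited")
        return "ok"

    print(asyncio.run(call_with_backoff(flaky, base_delay=0.01)))
```

In practice `retry_on` would be narrowed to the specific rate-limit error raised by the client library, and a little random jitter is often added to the delay so that concurrent workers do not retry in lockstep.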

Tech Stack

Category	Technology
Language	Python 3 (async/await with `asyncio`)
AI / LLM	Google Gemini 2.0 Flash (`google-generativeai`)
HTTP Client	`httpx` (async) + `requests` (sync)
HTML Parsing	`BeautifulSoup4` + `lxml`
Data Processing	`pandas`
Database	Convex (`convex` Python SDK)
Configuration	`python-dotenv`
Progress UI	`tqdm`

Running the Scraper

cd scraper

# Install dependencies
pip install -r requirements.txt

# Set up environment variables
cp .env.local .env
# Fill in: CONVEX_URL, CONVEX_DEPLOY_KEY, GEMINI_API_KEY, INPUT_CSV_PATH

# Prepare your sources CSV (columns: title, link)
# Then run the bot
python main.py

The scraper is resumable — if interrupted, it picks up from the last successfully processed row using progress.json.
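
One way this kind of resumability can work is sketched below: after each row is handled, its index is written to a progress file, and a restart skips every row up to that index. The JSON layout (`last_processed_row`) and the function names are hypothetical; the actual format of progress.json is not documented here. The CSV columns `title` and `link` follow the note in the run instructions.

```python
import csv
import json
import os


def load_progress(path):
    """Return the index of the last processed row, or -1 if nothing is saved."""
    if not os.path.exists(path):
        return -1
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh).get("last_processed_row", -1)


def save_progress(path, row_index):
    """Write progress to a temp file, then rename it into place.

    os.replace is atomic, so an interruption never leaves a half-written file.
    """
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump({"last_processed_row": row_index}, fh)
    os.replace(tmp_path, path)


def process_csv(csv_path, progress_path, handle_row):
    """Call handle_row(title, link) for each row not yet processed."""
    last_done = load_progress(progress_path)
    with open(csv_path, newline="", encoding="utf-8") as fh:
        for index, row in enumerate(csv.DictReader(fh)):
            if index <= last_done:
                continue  # already handled in an earlier run
            handle_row(row["title"], row["link"])
            save_progress(progress_path, index)  # only after success


if __name__ == "__main__":
    process_csv("sources.csv", "progress.json",
                lambda title, link: print(f"processing {title}: {link}"))
```

Saving progress only after `handle_row` returns means a row that fails midway is retried on the next run rather than skipped.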

🔑 Environment Variables

`umbra/.env.local`

CONVEX_DEPLOYMENT=...
NEXT_PUBLIC_CONVEX_URL=...
WORKOS_CLIENT_ID=...
WORKOS_API_KEY=...
NEXT_PUBLIC_WORKOS_REDIRECT_URI=...
GEMINI_API_KEY=...

`scraper/.env`

CONVEX_URL=...
CONVEX_DEPLOY_KEY=...
GEMINI_API_KEY=...
INPUT_CSV_PATH=sources.csv
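
For illustration, the variables above could be loaded into a configuration dataclass along the following lines. This is a sketch, not the contents of `config.py`: the `ScraperConfig` class and its field names are hypothetical, and it reads from `os.environ` directly, so a `.env` file would first need to be loaded (the project lists `python-dotenv` for that).

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ScraperConfig:
    convex_url: str
    convex_deploy_key: str
    gemini_api_key: str
    input_csv_path: str = "sources.csv"

    @classmethod
    def from_env(cls, env=None):
        """Build a config from a mapping of environment variables.

        Defaults to os.environ; passing a plain dict makes this easy to test.
        """
        env = os.environ if env is None else env
        required = ("CONVEX_URL", "CONVEX_DEPLOY_KEY", "GEMINI_API_KEY")
        missing = [name for name in required if not env.get(name)]
        if missing:
            # Fail early with every missing name, not just the first one.
            raise RuntimeError("Missing environment variables: " + ", ".join(missing))
        return cls(
            convex_url=env["CONVEX_URL"],
            convex_deploy_key=env["CONVEX_DEPLOY_KEY"],
            gemini_api_key=env["GEMINI_API_KEY"],
            input_csv_path=env.get("INPUT_CSV_PATH") or "sources.csv",
        )


if __name__ == "__main__":
    try:
        print(ScraperConfig.from_env())
    except RuntimeError as exc:
        print(exc)
```

Validating all required variables at startup surfaces a misconfigured environment before any paper is fetched.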

`Urban.api/appsettings.json`

{
  "ConnectionStrings": {
    "DefaultConnection": "Server=...;Database=...;..."
  },
  "JwtSettings": {
    "SecretKey": "...",
    "Issuer": "UrbanAPI",
    "Audience": "UrbanAPI"
  }
}

📂 Repository Structure

Umbra/
├── umbra/              # Next.js 15 frontend
│   ├── app/            # App Router pages & API routes
│   ├── components/     # Reusable UI components
│   ├── convex/         # Convex serverless functions & schema
│   ├── hooks/          # Custom React hooks
│   └── lib/            # Shared utilities
│
├── Urban.api/          # ASP.NET Core 8 REST API
│   ├── Urban.api/      # Web API project (controllers, program)
│   ├── BLL/            # Business Logic Layer
│   └── DAL/            # Data Access Layer (EF Core)
│
└── scraper/            # Python data pipeline
    ├── main.py         # Entry point
    ├── extraction.py   # Web scraping + Gemini entity extraction
    ├── database.py     # Convex database operations
    ├── embedding_generator.py  # Vector embedding generation
    ├── rate_limiter.py # API rate limiting logic
    ├── progress_tracker.py     # Resumable processing tracker
    └── config.py       # Configuration dataclasses

🛠️ Technology Summary

Category	Technologies
Frontend	Next.js 15, React 19, TypeScript 5, Tailwind CSS v4
Real-time DB	Convex
Authentication	WorkOS AuthKit, JWT
AI / LLM	Google Gemini API (Gemini 2.0 Flash)
UI Library	Radix UI, Lucide React, React Icons
Data Viz	D3.js v7
Animation	Motion (Framer Motion successor), React Scroll Parallax
Backend	ASP.NET Core 8, C#, .NET 8
ORM	Entity Framework Core 8
Database	Microsoft SQL Server
Security	JWT Bearer, BCrypt
API Docs	Swagger / OpenAPI (Swashbuckle)
Scraping	Python, BeautifulSoup4, httpx, lxml
Data Pipeline	pandas, asyncio, tqdm
Code Quality	ESLint, Prettier, TypeScript strict mode

📄 License

This project is licensed under the MIT License. See LICENSE for details.
