
PhilAIsion

An AI-powered civic agent that lets Philadelphia residents access city services by just talking — through a phone call, a chat, or a $50 Raspberry Pi kiosk. Built for Drexel CodeFest 2026.


The Problem

Right now in Philadelphia, getting help from the city means navigating dozens of websites, PDFs, and forms, most of them written for fluent English readers. And access isn't equal:

  • 84% of households have high-speed internet
  • 67% of older adults and Spanish-speaking residents do
  • 71% of low-income households do

That's tens of thousands of people effectively locked out of services that already exist.

What PhilAIsion Does

Walk up, call, or open a browser. Say something like:

"My trash hasn't been picked up in two weeks."

PhilAIsion's AI agent will:

  1. Understand the situation in any of 10 languages
  2. Search Philadelphia's 712+ city services and 122 government forms
  3. Take action — file the 311 report against the live PublicStuff API, draft the legal complaint, email the legal aid org, or check benefits eligibility
  4. Speak back with natural-sounding voice (ElevenLabs) so literacy isn't a barrier

It works without broadband, without a smartphone, and in 10 languages. Instead of telling residents what to do, it does it for them.


Architecture

┌──────────────────────────────────────────────────────────────────┐
│                          USER ENTRY POINTS                        │
├──────────────────────────────────────────────────────────────────┤
│   Phone (Vapi)    Web App (React)    Raspberry Pi Kiosk          │
└──────────────────────────┬───────────────────────────────────────┘
                           │
                           ▼
┌──────────────────────────────────────────────────────────────────┐
│              FRONTEND  ·  React 18 + Vite + i18next               │
│  ─────────────────────────────────────────────────────────────   │
│  • Web Speech API (STT — 10 languages)                           │
│  • ElevenLabs TTS via /voice proxy (with browser fallback)       │
│  • Supabase Auth (Google OAuth + anonymous)                      │
│  • SSE streaming chat with tool-call cards                       │
└──────────────────────────┬───────────────────────────────────────┘
                           │
                           ▼
┌──────────────────────────────────────────────────────────────────┐
│           BACKEND  ·  Node + Express + TypeScript + Prisma        │
│  ─────────────────────────────────────────────────────────────   │
│                                                                   │
│   ┌──────────────────────────────────────────────────────────┐   │
│   │           AI AGENT  (OpenAI GPT-4o + tool use)           │   │
│   │  ────────────────────────────────────────────────────    │   │
│   │   14 callable tools — file_city_report, draft_document,  │   │
│   │   find_legal_help, immigration_screening, check_benefits │   │
│   │   web_lookup, escalate_emergency, …                      │   │
│   └──────────────────────────────────────────────────────────┘   │
│                                                                   │
│   ┌──────────────────────────────────────────────────────────┐   │
│   │  RAG  ·  text-embedding-3-small + cosine similarity      │   │
│   │  ────────────────────────────────────────────────────    │   │
│   │   122 phila.gov forms + 712 services indexed in-memory   │   │
│   │   Top-4 semantic match for "I need to…" queries          │   │
│   └──────────────────────────────────────────────────────────┘   │
│                                                                   │
│   ┌──────────────────────────────────────────────────────────┐   │
│   │              EXTERNAL INTEGRATIONS                        │   │
│   │  ────────────────────────────────────────────────────    │   │
│   │   • PublicStuff API (live 311 filing)                    │   │
│   │   • CARTO public_cases_fc (duplicate-report detection)   │   │
│   │   • Gmail API (legal-aid intake email)                   │   │
│   │   • OpenAI Whisper (optional server-side STT)            │   │
│   │   • ElevenLabs (TTS proxy)                               │   │
│   └──────────────────────────────────────────────────────────┘   │
└──────────────────────────┬───────────────────────────────────────┘
                           │
                           ▼
┌──────────────────────────────────────────────────────────────────┐
│         DATA  ·  Supabase Postgres + Prisma  +  Static JSON       │
│  ─────────────────────────────────────────────────────────────   │
│   User · Case · CaseAction      orgs.json · laws.json            │
│   philaDocumentsCatalog (700+)   philaServicesCatalog (712+)     │
└──────────────────────────────────────────────────────────────────┘

Tech Stack

Frontend (/frontend)

| Category | Library | Version |
|---|---|---|
| Framework | React | 18.3.1 |
| Build tool | Vite | 5.3.1 |
| Routing | react-router-dom | 6.30.3 |
| Auth | @supabase/supabase-js | 2.103.0 |
| HTTP | axios | 1.7.2 |
| Animation | framer-motion | 12.38.0 |
| Icons | lucide-react | 1.8.0 |
| Markdown | react-markdown | 10.1.0 |
| i18n | i18next + react-i18next | 26.0.4 / 17.0.2 |
| Voice (STT) | Web Speech API | native |

Backend (/backend)

| Category | Library | Version |
|---|---|---|
| Runtime | Node.js + Express | 4.19.0 |
| Language | TypeScript | 6.0.2 (ES2020, strict) |
| ORM | Prisma | 7.7.0 (PostgreSQL adapter) |
| LLM | OpenAI SDK | 6.34.0 |
| Anthropic | @anthropic-ai/sdk | 0.39.0 |
| Embeddings | OpenAI | text-embedding-3-small |
| Voice (TTS) | ElevenLabs | eleven_turbo_v2_5 |
| Voice (STT) | OpenAI | whisper-1 (optional) |
| Web scraping | cheerio | 1.2.0 |
| HTTP | axios | 1.7.0 |

Infrastructure

| Layer | Service |
|---|---|
| Database | Supabase Postgres (Prisma-managed schema) |
| Auth | Supabase Auth — Google OAuth + anonymous sessions |
| Email | Gmail API (OAuth2 refresh token) |
| Hosting | Vercel (frontend) + Render/Fly.io (backend) |
| Kiosk | Raspberry Pi 3 + Raspberry Pi OS Bookworm + Chromium kiosk mode |

The AI Agent — 14 Tools

The agent uses OpenAI GPT-4o with structured tool calls. Every call is logged to CaseAction so users (and the city) can audit what was done on their behalf.

| Tool | What it does |
|---|---|
| file_city_report | Submits 311 reports to PublicStuff (POTHOLE, ILLEGAL-DUMPING, GRAFFITI, NOISE, SANITATION, HOUSING-VIOLATION, etc.) |
| send_org_email | Sends an intake email to a legal aid / social services org via Gmail API |
| draft_document | Generates a legal complaint, demand letter, or formal notice |
| find_resources | Matches user need (food, shelter, health, legal) to a curated set of ~50 Philadelphia nonprofits |
| explain_rights | Cites relevant PA / Philadelphia tenant, housing, immigration, civic law from laws.json |
| check_benefits | Eligibility screen against 20+ benefits (SNAP, Medicaid, LIHEAP, WIC, …) by household factors |
| find_legal_help | Routes to legal-aid org by domain + auto-sends intake email |
| immigration_screening | Maps user situation to pathway (asylum / DACA / TPS / U-visa) + provider lookup |
| find_jobs_training | Workforce-development program matching |
| civic_participation | Voter registration, expungement, elected official lookup |
| search_city_services | Semantic search over 712+ phila.gov services |
| web_lookup | Fetches and extracts info from a specific URL (Cheerio) |
| web_search | Generic web search fallback |
| escalate_emergency | Hot-routes no-heat / homelessness / DV / medical / utility-shutoff to the right hotline |
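Stripped to its essentials, the agent loop behind these tools looks roughly like this. This is a hedged sketch, not the project's agent.ts: the `toolHandlers` bodies, the `complete` abstraction, and `runToolLoop` are illustrative names, and the real agent wires `complete` to OpenAI's tool-call API.

```typescript
// Minimal sketch of an agent tool loop. Handler bodies are placeholders.
type ToolCall = { id: string; name: string; args: Record<string, unknown> };
type ModelTurn = { content: string | null; toolCalls: ToolCall[] };
type AuditEntry = { tool: string; input: unknown; result: unknown };

// Registry of callable tools; each result is serialized back to the model
// and handed to `audit` so every action stays reviewable.
const toolHandlers: Record<string, (args: Record<string, unknown>) => Promise<unknown>> = {
  file_city_report: async (args) => ({ status: "submitted", service: args.service }),
  check_benefits: async (args) => ({ eligible: ["SNAP"], household: args.household }),
};

// `complete` stands in for the LLM call, which keeps the loop testable
// without a network round-trip.
async function runToolLoop(
  complete: (history: unknown[]) => Promise<ModelTurn>,
  userMessage: string,
  audit: (entry: AuditEntry) => void,
  maxSteps = 8, // hard cap on agent iterations
): Promise<string> {
  const history: unknown[] = [{ role: "user", content: userMessage }];
  for (let step = 0; step < maxSteps; step++) {
    const turn = await complete(history);
    if (turn.toolCalls.length === 0) return turn.content ?? "";
    for (const call of turn.toolCalls) {
      const handler = toolHandlers[call.name];
      const result = handler ? await handler(call.args) : { error: "unknown tool" };
      audit({ tool: call.name, input: call.args, result });
      history.push({ role: "tool", tool_call_id: call.id, content: JSON.stringify(result) });
    }
  }
  return "I wasn't able to finish that request.";
}
```

The cap on iterations keeps a confused model from looping tool calls forever.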

RAG — How Form & Service Matching Works

When a user asks "How do I appeal my property taxes?", we don't pattern-match keywords — we use semantic vector search.

  1. Indexing (one-time, ~$0.01) — On boot, the backend embeds all 122 forms in philaDocumentsCatalog and 712 services in philaServicesCatalog using OpenAI text-embedding-3-small. Vectors are cached in-memory.
  2. Query — User text is embedded on the fly.
  3. Cosine similarity — Top-4 matches above the 0.35 threshold are returned. Below threshold → "no good match, let me ask differently."
  4. Inline cards — Frontend renders matches as clickable cards under the assistant's reply.

Endpoint: GET /chat/find-forms?q=<query>&limit=4

This is the same mechanism used by search_city_services from inside the agent loop, so the LLM can pull relevant forms mid-conversation without hard-coding categories.
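The matching step can be sketched in a few lines. This is an illustrative reconstruction, not the project's embeddings.ts; the `Indexed` shape and `findForms` name are assumptions, and the vectors would come from text-embedding-3-small at boot.

```typescript
// Cosine similarity over two equal-length embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

type Indexed = { id: string; title: string; vector: number[] };

// Top-k semantic match with the 0.35 floor described above: score every
// indexed form, drop weak matches, return the best `limit` results.
function findForms(queryVec: number[], index: Indexed[], limit = 4, threshold = 0.35) {
  return index
    .map((doc) => ({ ...doc, score: cosineSimilarity(queryVec, doc.vector) }))
    .filter((d) => d.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```

With only ~834 indexed items, a brute-force scan like this is fast enough that no vector database is needed.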


Voice Pipeline

Speech-to-Text

  • Web Speech API in the browser — free, ~10 languages supported by Chrome
  • Optional fallback: POST /voice/transcribe with Whisper for languages Chrome doesn't cover

Text-to-Speech

  • Backend proxies POST /voice/tts → ElevenLabs eleven_turbo_v2_5
    • Default voice: Sarah (EXAVITQu4vr4xnSDxMaL)
    • Settings: stability 0.45, similarity_boost 0.75, style 0.15
    • Max 1200 chars per request — long replies are split into sentences and queued
  • Audio cached as Blob URLs per session (no re-billing for repeats)
  • Browser speechSynthesis fallback if /voice/health fails — chunked at sentence boundaries to avoid Chrome's known bug where utterances longer than ~200 chars silently restart
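The fallback chunking can be sketched as below. This is a hedged illustration, assuming a `splitSentences` helper; the real useSpeech hook may chunk differently.

```typescript
// Split text at sentence boundaries so no single utterance exceeds maxLen,
// packing short sentences together up to the limit.
function splitSentences(text: string, maxLen = 180): string[] {
  const sentences = text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) ?? [text];
  const chunks: string[] = [];
  let current = "";
  for (const s of sentences) {
    if ((current + s).length > maxLen && current) {
      chunks.push(current.trim());
      current = "";
    }
    current += s;
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}

// Usage (browser only): queue each chunk as its own utterance.
// for (const chunk of splitSentences(reply)) {
//   speechSynthesis.speak(new SpeechSynthesisUtterance(chunk));
// }
```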

Streaming

  • Chat replies stream as Server-Sent Events: token, tool_start, tool_done, done, error
  • TTS hook buffers tokens into sentences and queues each sentence for ElevenLabs as soon as a ., !, or ? arrives — so the user starts hearing the first sentence while the LLM is still writing the third
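The token-to-sentence buffering can be sketched as follows. The class name and shape are illustrative assumptions, not the actual useSentenceTTS hook, and the boundary detection is deliberately naive (it would also split at decimals like "3.5").

```typescript
// Buffer streamed tokens and emit complete sentences to a speak() callback.
class SentenceBuffer {
  private pending = "";
  constructor(private speak: (sentence: string) => void) {}

  // Feed each streamed `token` event; emit a sentence as soon as a
  // terminator (., !, ?) followed by whitespace or end-of-buffer appears.
  push(token: string): void {
    this.pending += token;
    let match: RegExpMatchArray | null;
    while ((match = this.pending.match(/^[\s\S]*?[.!?](?=\s|$)/)) !== null) {
      this.speak(match[0].trim());
      this.pending = this.pending.slice(match[0].length);
    }
  }

  // On the `done` event, voice whatever fragment is left over.
  flush(): void {
    if (this.pending.trim()) this.speak(this.pending.trim());
    this.pending = "";
  }
}
```

Each emitted sentence would be POSTed to /voice/tts immediately, which is what lets playback start before the model finishes writing.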

Database Schema

model User {
  id                 String   @id @default(cuid())
  createdAt          DateTime @default(now())
  updatedAt          DateTime @updatedAt
  name               String?
  email              String?  @unique
  phone              String?
  preferredLanguage  String?  @default("en")
  neighborhood       String?
  provider           String?  // 'google' | 'anonymous'
  profile            Json?    // {address, demographics, household, ...}
  cases              Case[]
}

model Case {
  id          String        @id @default(cuid())
  userId      String?       // nullable — supports anonymous kiosk users
  user        User?         @relation(fields: [userId], references: [id])
  transcript  String        // full conversation as text
  summary     Json          // structured summary {issue, location, urgency}
  domains     String[]      // ['housing', 'legal', '311']
  severity    String        // 'low' | 'medium' | 'high'
  status      String        // 'open' | 'in_progress' | 'resolved' | 'rejected'
  language    String?
  actions     CaseAction[]
  createdAt   DateTime      @default(now())
}

model CaseAction {
  id        String   @id @default(cuid())
  caseId    String
  case      Case     @relation(fields: [caseId], references: [id])
  payload   Json     // {tool, input, result} — every tool call is auditable
  createdAt DateTime @default(now())
}
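Reading the audit trail back out could look like the sketch below. The `renderAuditTrail` helper is an illustrative assumption, relying only on the {tool, input, result} payload shape noted in the schema comment.

```typescript
// One CaseAction row as stored by Prisma.
type CaseActionRow = {
  createdAt: Date;
  payload: { tool: string; input: unknown; result: unknown };
};

// Render one line per tool call, oldest first, for a user-facing audit view.
function renderAuditTrail(actions: CaseActionRow[]): string[] {
  return actions
    .slice() // don't mutate the caller's array
    .sort((a, b) => a.createdAt.getTime() - b.createdAt.getTime())
    .map((a) => `${a.createdAt.toISOString()}  ${a.payload.tool}`);
}

// With Prisma (not run here: needs a live DATABASE_URL):
// const rows = await prisma.caseAction.findMany({ where: { caseId } });
// renderAuditTrail(rows as CaseActionRow[]);
```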

API Surface (selected)

| Method | Path | Purpose |
|---|---|---|
| POST | /chat | One-shot agent call, returns full reply + tool actions |
| POST | /chat/stream | SSE-streamed agent reply (tokens + tool events) |
| POST | /chat/reset | Clear server-side session state |
| GET | /chat/find-forms?q=&limit= | Semantic form match (RAG) |
| GET | /chat/duplicate-check?service=&address=&radius=&days= | CARTO check before filing 311 |
| POST | /voice/tts | ElevenLabs proxy → audio/mpeg |
| GET | /voice/health | Probe ElevenLabs availability |
| POST | /voice/transcribe | Whisper fallback STT |
| GET | /documents | List 122 phila.gov forms |
| GET | /documents/:id | One form's metadata + PDF URL |
| POST | /documents/:id/submit | Route by submitPath (open311 / email / mail / online / in-person) |
| GET | /services | Browse 712+ city services |
| GET | /forms/:id | Form field schema for AutoFilloutForm |
Full contract: see BACKEND_API_CONTRACT.md.


Languages (10)

English · Spanish · Simplified Chinese · Traditional Chinese · Russian · Vietnamese · Arabic (RTL) · Haitian Creole · Brazilian Portuguese · French

All UI strings live in /frontend/src/i18n/locales/. The agent reasons in English internally but replies in the user's selected language.


The Kiosk

A $50 Raspberry Pi 3 running:

  • Raspberry Pi OS Bookworm
  • Chromium in --kiosk mode (no chrome / tabs / nav bar)
  • Auto-loads https://<host>/?kiosk=1 — flips the React app into kiosk mode (hides tab bar, shows large-touch UI)
  • 90-second idle timer auto-resets to home for the next visitor
  • unclutter hides the cursor after 2s
  • Wi-Fi power-save disabled, screen blanking disabled

Install script: kiosk/install-kiosk.sh (idempotent — safe to re-run).


Local Development

Prerequisites

  • Node 20+
  • A Supabase project (Postgres + Auth)
  • API keys: OpenAI, ElevenLabs, Gmail OAuth refresh token (optional)

Backend

cd backend
npm install
cp .env.example .env   # fill in keys
npx prisma migrate deploy
npm run dev            # → http://localhost:3000

Frontend

cd frontend
npm install
cp .env.example .env   # set VITE_API_URL + VITE_SUPABASE_*
npm run dev            # → http://localhost:5173

Environment variables

backend/.env

PORT=3000
OPENAI_API_KEY=sk-…
ELEVENLABS_API_KEY=sk_…
ELEVENLABS_VOICE_ID=EXAVITQu4vr4xnSDxMaL
SUPABASE_URL=https://<project>.supabase.co
SUPABASE_ANON_KEY=…
SUPABASE_SERVICE_ROLE_KEY=…
DATABASE_URL=postgresql://…
DIRECT_URL=postgresql://…
GMAIL_CLIENT_ID=…
GMAIL_CLIENT_SECRET=…

frontend/.env

VITE_API_URL=http://localhost:3000
VITE_SUPABASE_URL=https://<project>.supabase.co
VITE_SUPABASE_ANON_KEY=…

Project Structure

Ben-Franklin/
├── backend/
│   ├── src/
│   │   ├── modules/
│   │   │   ├── chat/         # agent orchestration + SSE streaming
│   │   │   ├── voice/        # /voice/tts + /voice/transcribe
│   │   │   ├── documents/    # phila.gov forms catalog + PDF resolution
│   │   │   ├── services/     # phila.gov services catalog
│   │   │   ├── forms/        # form field schemas + prefill
│   │   │   ├── auth/         # Supabase OAuth bridge
│   │   │   ├── email/        # Gmail intake mailer
│   │   │   └── brt-appeal/   # property tax appeal flow
│   │   ├── services/
│   │   │   ├── agent.ts      # the 14-tool LLM loop
│   │   │   └── embeddings.ts # in-memory RAG
│   │   └── data/
│   │       ├── philaDocumentsCatalog.ts   # 700+ docs
│   │       ├── philaServicesCatalog.ts    # 712+ services
│   │       ├── orgs.json                  # ~50 nonprofits
│   │       └── laws.json                  # PA / Philly law citations
│   └── prisma/schema.prisma
│
├── frontend/
│   ├── src/
│   │   ├── pages/            # Landing, Home, BenChat, Kiosk, Documents, …
│   │   ├── components/       # VoiceInput, SpeakButton, ResponseCard, …
│   │   ├── hooks/            # useVoice, useSpeech, useSentenceTTS, useProfile
│   │   ├── services/         # api.js, auth.js
│   │   └── i18n/locales/     # 10 language packs
│   └── vite.config.js
│
├── kiosk/
│   ├── install-kiosk.sh      # idempotent Pi setup
│   └── README.md
│
└── BACKEND_API_CONTRACT.md   # full endpoint spec

Why this matters

We didn't build another chatbot. We built AI as an interface to government — accessible, actionable, and built for the people who need it most. Voice-first means it works for someone whose hands are full, whose English is shaky, or whose only "device" is a kiosk in a library.


Credits

Built at Drexel CodeFest 2026 by team Ben-Franklin.

Stack credits: Anthropic Claude · OpenAI GPT-4o · ElevenLabs · Supabase · phila.gov open data · PublicStuff 311 · CARTO · React · Prisma
