A real-time conversational AI application with advanced memory management and multimodal capabilities.
- Real-time Voice Conversation: Seamless voice interaction using OpenAI's Realtime API
- 5-Context Memory System: Intelligent conversation context management with persistent storage
- MCP Integration: Model Context Protocol support for extended functionality
- Responsive UI: Modern React interface with TailwindCSS styling
- Multimodal Screen Sharing: Real-time screen capture and visual analysis capabilities
- Cross-session Memory: Conversations persist across browser sessions
- Context Window Management: Automatic handling of conversation context limits
- Environment Configuration: Secure credential management
- Real-time Updates: Live conversation state synchronization
- Visual Intelligence: The AI receives your screen content in real time (full visual analysis pending Realtime API vision support)
- Screen Capture Integration: Automatic periodic screen captures with conversation context
- Framework: React 18 + TypeScript + Vite
- Styling: TailwindCSS with Shadcn UI components
- State Management: React hooks with localStorage persistence
- Real-time Communication: WebSocket integration
- Runtime: Node.js + Express
- API Integration: OpenAI Realtime API
- MCP Support: Model Context Protocol server integration
- Environment: Secure configuration management
- Node.js 18+ and npm
- OpenAI API key with Realtime API access
- Git for version control
```bash
git clone https://github.com/widjis/gpt-realtimeprj.git
cd gpt-realtimeprj
```

Install and configure the backend:

```bash
cd server
npm install

# Configure environment variables
cp .env.example .env
# Edit .env with your OpenAI API key and MCP server URL
```

Install the frontend dependencies:

```bash
cd ../web
npm install
```

Create a `server/.env` file with:
```
OPENAI_API_KEY=your_openai_api_key_here
PORT=3001
MCP_SERVER_URL=https://your-mcp-server.com/mcp/server
# Disables TLS certificate validation; use only in development
NODE_TLS_REJECT_UNAUTHORIZED=0
```

- Start the Backend Server:
```bash
cd server
node index.js
```

- Start the Frontend Development Server:

```bash
cd web
npm run dev
```

- Access the Application:
- Frontend: http://localhost:5173
- Backend API: http://localhost:3001
- Start Conversation: Click the microphone button to begin voice interaction
- Memory Management: The system automatically manages conversation context
- Session Persistence: Your conversation history is saved across browser sessions
- Real-time Responses: Experience seamless voice-to-voice communication
```
gpt-realtimeprj/
├── docs/
│   └── journal.md           # Development journal and changelog
├── server/
│   ├── .env.example         # Environment configuration template
│   ├── index.js             # Main server application
│   ├── package.json         # Backend dependencies
│   └── package-lock.json
├── web/
│   ├── src/
│   │   ├── App.tsx          # Main React application
│   │   ├── App.css          # Application styles
│   │   └── main.tsx         # Application entry point
│   ├── package.json         # Frontend dependencies
│   ├── tailwind.config.js   # TailwindCSS configuration
│   └── vite.config.ts       # Vite build configuration
├── .gitignore               # Git ignore rules
└── README.md                # This file
```
- Memory System: 5-context conversation management with localStorage
- WebSocket Handler: Real-time communication with OpenAI API
- MCP Integration: Extended functionality through Model Context Protocol
- UI Components: Modern React components with TailwindCSS
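The 5-context memory system can be sketched as a small bounded history persisted to localStorage. Everything below is illustrative only: the storage key, entry shape, and eviction policy are assumptions, not the project's actual implementation in `App.tsx`.

```javascript
// Hypothetical sketch of a 5-context memory store. The key name and entry
// shape are assumptions; only the "keep the 5 most recent contexts" behavior
// comes from the README.
const MAX_CONTEXTS = 5;
const STORAGE_KEY = "conversation-contexts"; // assumed key name

// Append a context entry, evicting the oldest once the limit is reached.
function addContext(contexts, entry) {
  return [...contexts, entry].slice(-MAX_CONTEXTS);
}

// Persistence goes through an injected storage object so the same logic runs
// with browser localStorage or any object exposing getItem/setItem.
function saveContexts(storage, contexts) {
  storage.setItem(STORAGE_KEY, JSON.stringify(contexts));
}

function loadContexts(storage) {
  const raw = storage.getItem(STORAGE_KEY);
  return raw ? JSON.parse(raw) : [];
}
```

Injecting the storage object keeps the trimming logic testable outside the browser while the app itself can pass `window.localStorage`.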
| Variable | Description | Required |
|---|---|---|
| `OPENAI_API_KEY` | OpenAI API key for Realtime API | Yes |
| `PORT` | Backend server port | No (default: 3001) |
| `MCP_SERVER_URL` | MCP server endpoint | Yes |
| `NODE_TLS_REJECT_UNAUTHORIZED` | Set to `0` to disable TLS certificate validation (development only) | No |
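Validating these variables at startup might look like the sketch below. This is illustrative only, not the project's actual `server/index.js`; it assumes the variables have already been loaded into `process.env` (for example via `dotenv`), and the defaulting logic is an assumption.

```javascript
// Hypothetical startup configuration check. Variable names match server/.env;
// the shape of the returned config object is an assumption.
function loadConfig(env) {
  const missing = ["OPENAI_API_KEY", "MCP_SERVER_URL"].filter((k) => !env[k]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return {
    apiKey: env.OPENAI_API_KEY,
    mcpServerUrl: env.MCP_SERVER_URL,
    port: Number(env.PORT) || 3001, // PORT is optional; 3001 is the documented default
  };
}
```

Failing fast on missing required variables surfaces misconfiguration before the WebSocket connection to OpenAI is attempted.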
- Real-time Screen Capture: Share your screen with MARISA using browser APIs
- Screen Data Transmission: Screen captures are sent as base64 data
- Automatic Capture: Screenshots taken every 5 seconds during active sharing
- Live Preview: See the latest captured screen in the interface
- Conversation Integration: Screen captures are tracked in conversation context
- Visual Analysis: Not yet available; OpenAI's Realtime API currently doesn't support a vision modality
- Text-based Processing: Screen captures are sent as base64 text data
- Acknowledgment Only: MARISA can acknowledge receipt but cannot analyze visual content
- Connect: Establish connection with MARISA
- Start Sharing: Click the "Share Screen" button
- Grant Permission: Allow browser access to your screen
- Data Transmission: MARISA receives screen capture data (but cannot analyze visually)
- Stop Sharing: Click "Stop Sharing" when done
- Uses the `getDisplayMedia()` API for screen capture
- Images compressed to JPEG format (70% quality) for efficiency
- Base64 encoding for data transmission
- Automatic cleanup when sharing ends
- Screen sharing status tracked in conversation context
- Text-based message format due to API constraints
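Because the captures travel as text, each one is essentially an envelope around a base64 JPEG. The sketch below shows what building such an envelope could look like; the field names and `type` value are assumptions, not the project's actual wire format, while the 5-second interval and 70% JPEG quality come from the notes above.

```javascript
// Hypothetical screen-capture envelope; field names are assumptions.
const CAPTURE_INTERVAL_MS = 5000; // one capture every 5 seconds while sharing
const JPEG_QUALITY = 0.7;         // 70% quality, per the technical details above

function buildScreenCaptureMessage(jpegBase64, capturedAt) {
  // The Realtime API has no vision modality here, so the capture is wrapped as
  // text the model can acknowledge but not visually analyze.
  return JSON.stringify({
    type: "screen_capture",
    capturedAt,
    quality: JPEG_QUALITY,
    data: jpegBase64,
  });
}
```

In the browser, such a function would be called from a `setInterval(..., CAPTURE_INTERVAL_MS)` loop that draws the `getDisplayMedia()` stream to a canvas and reads it back as a base64 JPEG.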
- Multimodal Screen Sharing: ✅ Implemented screen capture and transmission (visual analysis pending Realtime API vision support)
- Advanced Context Management: Enhanced memory algorithms
- User Authentication: Secure user sessions
- Mobile App: React Native implementation
- Cloud Deployment: Production deployment guides
- File Upload Support: Direct image and document sharing
- Screen Annotation: Drawing and markup tools for shared screens
Detailed development logs and feature implementations are documented in `docs/journal.md`.
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI for the Realtime API
- React and Vite communities
- TailwindCSS for the styling framework
- Model Context Protocol for extensibility
Built with ❤️ using React, Node.js, and the OpenAI Realtime API