GPT Realtime Project (MARISA)

A real-time conversational AI application with advanced memory management and multimodal capabilities.

🚀 Features

Core Functionality

  • Real-time Voice Conversation: Seamless voice interaction using OpenAI's Realtime API
  • 5-Context Memory System: Intelligent conversation context management with persistent storage
  • MCP Integration: Model Context Protocol support for extended functionality
  • Responsive UI: Modern React interface with TailwindCSS styling
  • 🆕 Multimodal Screen Sharing: Real-time screen capture and transmission (visual analysis is not yet available; see Current Limitations)

Advanced Capabilities

  • Cross-session Memory: Conversations persist across browser sessions
  • Context Window Management: Automatic handling of conversation context limits
  • Environment Configuration: Secure credential management
  • Real-time Updates: Live conversation state synchronization
  • 🆕 Visual Context: Screen captures are shared with the AI in real time; analyzing their content depends on vision support in the Realtime API
  • 🆕 Screen Capture Integration: Automatic periodic screen captures with conversation context
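
One way automatic handling of context limits can work is to drop the oldest messages once a size budget is exceeded. The following is a minimal sketch, not the project's actual implementation: `Message`, `trimToBudget`, and the budget value are hypothetical names, and character count is assumed as a rough stand-in for tokens.

```typescript
interface Message {
  role: "user" | "assistant";
  text: string;
}

// Keep the most recent messages whose combined text length fits the budget.
// Assumption: character count approximates token usage closely enough for a sketch.
function trimToBudget(history: Message[], maxChars: number): Message[] {
  const kept: Message[] = [];
  let used = 0;
  // Walk from newest to oldest so that recent turns survive.
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = history[i].text.length;
    if (used + cost > maxChars) break;
    kept.unshift(history[i]);
    used += cost;
  }
  return kept;
}

const trimmed = trimToBudget(
  [
    { role: "user", text: "hello" },
    { role: "assistant", text: "hi there" },
    { role: "user", text: "what's new?" },
  ],
  20,
);
console.log(trimmed.map((m) => m.text)); // the oldest message no longer fits
```

Trimming from the oldest end keeps the latest turns intact, which usually matters most in a live voice conversation.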

🏗️ Architecture

Frontend (/web)

  • Framework: React 18 + TypeScript + Vite
  • Styling: TailwindCSS with Shadcn UI components
  • State Management: React hooks with localStorage persistence
  • Real-time Communication: WebSocket integration

Backend (/server)

  • Runtime: Node.js + Express
  • API Integration: OpenAI Realtime API
  • MCP Support: Model Context Protocol server integration
  • Environment: Secure configuration management

📋 Prerequisites

  • Node.js 18+ and npm
  • OpenAI API key with Realtime API access
  • Git for version control

🛠️ Installation

1. Clone the Repository

git clone https://github.com/widjis/gpt-realtimeprj.git
cd gpt-realtimeprj

2. Backend Setup

cd server
npm install

# Configure environment variables
cp .env.example .env
# Edit .env with your OpenAI API key and MCP server URL

3. Frontend Setup

cd ../web
npm install

⚙️ Configuration

Create a server/.env file with:

OPENAI_API_KEY=your_openai_api_key_here
PORT=3001
MCP_SERVER_URL=https://your-mcp-server.com/mcp/server
NODE_TLS_REJECT_UNAUTHORIZED=0

🚀 Running the Application

Development Mode

  1. Start the Backend Server:
cd server
node index.js
  2. Start the Frontend Development Server:
cd web
npm run dev
  3. Access the Application: open the local URL printed by the Vite dev server in your browser.

🎯 Usage

  1. Start Conversation: Click the microphone button to begin voice interaction
  2. Memory Management: The system automatically manages conversation context
  3. Session Persistence: Your conversation history is saved across browser sessions
  4. Real-time Responses: Experience seamless voice-to-voice communication

📁 Project Structure

gpt-realtimeprj/
├── docs/
│   └── journal.md          # Development journal and changelog
├── server/
│   ├── .env.example        # Environment configuration template
│   ├── index.js            # Main server application
│   ├── package.json        # Backend dependencies
│   └── package-lock.json
├── web/
│   ├── src/
│   │   ├── App.tsx         # Main React application
│   │   ├── App.css         # Application styles
│   │   └── main.tsx        # Application entry point
│   ├── package.json        # Frontend dependencies
│   ├── tailwind.config.js  # TailwindCSS configuration
│   └── vite.config.ts      # Vite build configuration
├── .gitignore              # Git ignore rules
└── README.md               # This file

🔧 Development

Key Components

  • Memory System: 5-context conversation management with localStorage
  • WebSocket Handler: Real-time communication with OpenAI API
  • MCP Integration: Extended functionality through Model Context Protocol
  • UI Components: Modern React components with TailwindCSS
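
The memory system above can be pictured as a small rolling store. The sketch below assumes that "5-context" means keeping the five most recent context entries; that reading, along with `ContextEntry`, `STORAGE_KEY`, and the helper names, is hypothetical and not taken from the project's code. A tiny storage interface stands in for `window.localStorage` so the example also runs outside a browser.

```typescript
// Minimal subset of the Web Storage API used by this sketch.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

interface ContextEntry {
  summary: string;
  savedAt: number;
}

const STORAGE_KEY = "marisa.contexts"; // hypothetical key name
const MAX_CONTEXTS = 5;

// Read the stored entries, tolerating missing or corrupted data.
function loadContexts(store: KeyValueStore): ContextEntry[] {
  const raw = store.getItem(STORAGE_KEY);
  if (!raw) return [];
  try {
    const parsed = JSON.parse(raw);
    return Array.isArray(parsed) ? parsed.slice(-MAX_CONTEXTS) : [];
  } catch {
    return []; // corrupted data: start fresh
  }
}

// Append an entry and keep only the newest MAX_CONTEXTS entries.
function saveContext(store: KeyValueStore, entry: ContextEntry): ContextEntry[] {
  const next = [...loadContexts(store), entry].slice(-MAX_CONTEXTS);
  store.setItem(STORAGE_KEY, JSON.stringify(next));
  return next;
}

// In-memory stand-in; in the browser, pass window.localStorage instead.
function createMemoryStore(): KeyValueStore {
  const data = new Map<string, string>();
  return {
    getItem: (key) => data.get(key) ?? null,
    setItem: (key, value) => void data.set(key, value),
  };
}

const store = createMemoryStore();
for (let i = 1; i <= 7; i++) {
  saveContext(store, { summary: `turn ${i}`, savedAt: i });
}
console.log(loadContexts(store).map((c) => c.summary)); // only the five newest remain
```

Because `window.localStorage` satisfies the same two methods, the entries would survive page reloads and browser restarts when it is passed in.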

Environment Variables

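| Variable | Description | Required |
| --- | --- | --- |
| OPENAI_API_KEY | OpenAI API key for Realtime API | Yes |
| PORT | Backend server port | No (default: 3001) |
| MCP_SERVER_URL | MCP server endpoint | Yes |
| NODE_TLS_REJECT_UNAUTHORIZED | TLS certificate validation; 0 disables verification, so use it only in development | No |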
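
Validating these variables at startup makes a missing key fail fast instead of surfacing later as a connection error. The following is a sketch under assumptions: `ServerConfig` and `loadConfig` are hypothetical names, and the project's server/index.js may read its configuration differently.

```typescript
interface ServerConfig {
  openaiApiKey: string;
  port: number;
  mcpServerUrl: string;
}

// Build a typed config from an environment-like object, applying the
// documented default for PORT and rejecting missing required values.
function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const missing = ["OPENAI_API_KEY", "MCP_SERVER_URL"].filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  const port = Number(env.PORT ?? "3001");
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  return {
    openaiApiKey: env.OPENAI_API_KEY as string,
    port,
    mcpServerUrl: env.MCP_SERVER_URL as string,
  };
}

// Example with placeholder values; on the server, pass process.env instead.
const config = loadConfig({
  OPENAI_API_KEY: "your_openai_api_key_here",
  MCP_SERVER_URL: "https://your-mcp-server.com/mcp/server",
});
console.log(config.port); // falls back to the default port
```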

🎯 Multimodal Features

Screen Sharing Capabilities

  • Real-time Screen Capture: Share your screen with MARISA using browser APIs
  • Screen Data Transmission: Screen captures are sent as base64 data
  • Automatic Capture: Screenshots taken every 5 seconds during active sharing
  • Live Preview: See the latest captured screen in the interface
  • Conversation Integration: Screen captures are tracked in conversation context

Current Limitations

  • Visual Analysis: OpenAI's Realtime API currently doesn't support vision modality
  • Text-based Processing: Screen captures are sent as base64 text data
  • Acknowledgment Only: MARISA can acknowledge receipt but cannot analyze visual content
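
Sending a capture as text, as described in the limitations above, could look like the sketch below. It wraps the base64 payload in a `conversation.item.create` client event with an `input_text` content part. The layout of the text (timestamp label plus data URL) and the names `ScreenCaptureEvent` and `buildScreenCaptureEvent` are assumptions for illustration, not the project's actual message format.

```typescript
// Shape of the client event used in this sketch.
interface ScreenCaptureEvent {
  type: "conversation.item.create";
  item: {
    type: "message";
    role: "user";
    content: { type: "input_text"; text: string }[];
  };
}

// Wrap a base64-encoded JPEG in a text-only user message.
// Assumption: the text layout below is hypothetical.
function buildScreenCaptureEvent(base64Jpeg: string, capturedAt: Date): ScreenCaptureEvent {
  const text = `[screen capture ${capturedAt.toISOString()}] data:image/jpeg;base64,${base64Jpeg}`;
  return {
    type: "conversation.item.create",
    item: {
      type: "message",
      role: "user",
      content: [{ type: "input_text", text }],
    },
  };
}

const event = buildScreenCaptureEvent("QUJD", new Date(0));
console.log(JSON.stringify(event, null, 2)); // the JSON string is what would be sent over the socket
```

Since the payload travels as plain text, the model receives the characters of the encoding rather than an image, which matches the acknowledgment-only behavior noted above.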

How to Use Screen Sharing

  1. Connect: Establish connection with MARISA
  2. Start Sharing: Click the "📺 Share Screen" button
  3. Grant Permission: Allow browser access to your screen
  4. Data Transmission: MARISA receives screen capture data (but cannot analyze visually)
  5. Stop Sharing: Click "🛑 Stop Sharing" when done

Technical Details

  • Uses getDisplayMedia() API for screen capture
  • Images compressed to JPEG format (70% quality) for efficiency
  • Base64 encoding for data transmission
  • Automatic cleanup when sharing ends
  • Screen sharing status tracked in conversation context
  • Text-based message format due to API constraints
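
The capture pipeline listed above can be sketched with standard browser APIs: `getDisplayMedia()` for the stream, a canvas for JPEG compression at 70% quality, and a 5-second timer. This is an illustrative sketch rather than the project's code; `startScreenShare`, `captureFrame`, `dataUrlToBase64`, and the `onFrame` callback are hypothetical names, and the capture functions only run in a browser.

```typescript
// Extract the raw base64 payload from a data URL such as "data:image/jpeg;base64,AAAA".
function dataUrlToBase64(dataUrl: string): string {
  const comma = dataUrl.indexOf(",");
  if (comma === -1) throw new Error("not a data URL");
  return dataUrl.slice(comma + 1);
}

// Draw the current video frame onto a canvas and return it as base64 JPEG.
function captureFrame(video: HTMLVideoElement, quality = 0.7): string {
  const canvas = document.createElement("canvas");
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  const ctx = canvas.getContext("2d");
  if (!ctx) throw new Error("2D canvas context unavailable");
  ctx.drawImage(video, 0, 0);
  return dataUrlToBase64(canvas.toDataURL("image/jpeg", quality));
}

// Ask for screen access, then call onFrame with a new capture every intervalMs.
// Returns a function that stops capturing and releases the stream.
async function startScreenShare(
  onFrame: (base64Jpeg: string) => void,
  intervalMs = 5000,
): Promise<() => void> {
  const stream = await navigator.mediaDevices.getDisplayMedia({ video: true });
  const video = document.createElement("video");
  video.muted = true;
  video.srcObject = stream;
  await video.play();

  const timer = window.setInterval(() => onFrame(captureFrame(video)), intervalMs);

  const stop = (): void => {
    window.clearInterval(timer);
    stream.getTracks().forEach((track) => track.stop());
    video.srcObject = null;
  };
  // Clean up as well when the user ends sharing from the browser's own UI.
  stream.getVideoTracks()[0]?.addEventListener("ended", stop);
  return stop;
}
```

A caller would keep the returned function and invoke it from the stop button, for example `const stop = await startScreenShare(sendFrame)` followed later by `stop()`, where `sendFrame` is whatever forwards the data to the server.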

🚧 Roadmap

  • Multimodal Screen Sharing: ✅ Implemented screen capture and transmission (visual analysis awaits vision support in the Realtime API)
  • Advanced Context Management: Enhanced memory algorithms
  • User Authentication: Secure user sessions
  • Mobile App: React Native implementation
  • Cloud Deployment: Production deployment guides
  • File Upload Support: Direct image and document sharing
  • Screen Annotation: Drawing and markup tools for shared screens

📝 Documentation

Detailed development logs and feature implementations are documented in docs/journal.md.

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • OpenAI for the Realtime API
  • React and Vite communities
  • TailwindCSS for the styling framework
  • Model Context Protocol for extensibility

Built with ❤️ using React, Node.js, and OpenAI Realtime API
