Voice-Guided E-Commerce UI System

Project Overview

A production-grade POC demonstrating a voice-guided UI workflow for an e-commerce application.

Key Principle: Voice is an alternative input method, NOT a chatbot. It controls the same UI, calls the same APIs, and behaves exactly like typing or clicking.


Tech Stack

| Layer | Technology |
| --- | --- |
| Frontend | React (JavaScript, no TypeScript) |
| Backend | Python (FastAPI) |
| Voice Transport | WebSocket (OpenAI Realtime API) |
| API Transport | HTTP REST |
| Styling | Tailwind CSS |

Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                           FRONTEND (React)                          │
├─────────────────────────────────────────────────────────────────────┤
│  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌──────────────────────┐    │
│  │  Home   │  │Products │  │ Profile │  │   VoiceController    │    │
│  │  Page   │  │  Page   │  │  Page   │  │   (Global, Sticky)   │    │
│  └─────────┘  └────┬────┘  └─────────┘  └──────────┬───────────┘    │
│                    │                               │                │
│              ┌─────┴─────┐                   ┌─────┴─────┐          │
│              │  Filters  │                   │ WebSocket │          │
│              │  + Grid   │                   │  Client   │          │
│              └─────┬─────┘                   └─────┬─────┘          │
│                    │                               │                │
└────────────────────┼───────────────────────────────┼────────────────┘
                     │ HTTP                          │ WebSocket
                     │ /api/products                 │ /ws
                     ▼                               ▼
┌─────────────────────────────────────────────────────────────────────┐
│                          BACKEND (FastAPI)                          │
├─────────────────────────────────────────────────────────────────────┤
│  ┌────────────────────┐           ┌────────────────────────────┐    │
│  │   REST Endpoints   │           │    WebSocket Handler       │    │
│  │  GET /api/products │           │         /ws                │    │
│  └─────────┬──────────┘           └──────────┬─────────────────┘    │
│            │                                 │                      │
│            │         ┌───────────────────────┤                      │
│            │         │                       │                      │
│            ▼         ▼                       ▼                      │
│  ┌─────────────────────────┐    ┌─────────────────────────────┐     │
│  │    search_products()    │    │     RealtimeClient          │     │
│  │   (Single Source API)   │◄───│  (OpenAI WebSocket)         │     │
│  └─────────────────────────┘    └──────────────┬──────────────┘     │
│            │                                   │                    │
│            ▼                                   │                    │
│  ┌─────────────────────────┐                   │                    │
│  │     Product Data        │                   │                    │
│  │   (In-memory/JSON)      │                   │                    │
│  └─────────────────────────┘                   │                    │
└────────────────────────────────────────────────┼────────────────────┘
                                                 │
                                                 ▼
                                    ┌────────────────────────┐
                                    │  OpenAI Realtime API   │
                                    │  (STT + LLM + TTS)     │
                                    └────────────────────────┘

Data Flow

Manual Interaction (Type/Click)

User clicks filter → Frontend state updates → HTTP call to /api/products → UI renders

Voice Interaction

User speaks → Audio → WebSocket → OpenAI transcribes →
OpenAI calls search_products function → Backend executes →
Result sent to:
  1. OpenAI (for voice response generation)
  2. Frontend via WebSocket (ui_update event for UI rendering)
→ OpenAI speaks response + Frontend updates UI simultaneously

Critical Design Decision

Both flows end up calling the same search_products() function. Voice doesn't have special APIs.


Directory Structure

BitComm/
├── PRD.md                    # This document
├── backend/
│   ├── main.py              # FastAPI app, HTTP + WebSocket endpoints
│   ├── realtime_client.py   # OpenAI Realtime API WebSocket client
│   ├── tools.py             # Function definitions for OpenAI
│   ├── products.py          # Product data + search_products function
│   ├── requirements.txt     # Python dependencies
│   └── .env                 # Environment variables (OPENAI_API_KEY)
└── frontend/
    ├── package.json
    ├── public/
    │   └── index.html
    └── src/
        ├── index.js
        ├── App.js
        ├── pages/
        │   ├── Home.js
        │   ├── Products.js
        │   └── Profile.js
        ├── components/
        │   ├── Navbar.js
        │   ├── ProductGrid.js
        │   ├── ProductCard.js
        │   ├── FilterSidebar.js
        │   ├── SearchBar.js
        │   └── VoiceController.js
        ├── hooks/
        │   ├── useProducts.js
        │   └── useVoice.js
        ├── context/
        │   └── VoiceContext.js
        └── styles/
            └── index.css

Backend API

HTTP Endpoints

GET /api/products

Search and filter products.

Query Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | string | null | Text search in name/description |
| category | string | null | Filter by category |
| min_price | int | null | Minimum price filter |
| max_price | int | null | Maximum price filter |
| brand | string | null | Filter by brand |
| sort_by | string | "relevance" | Sort: price_asc, price_desc, rating, relevance |
| limit | int | 20 | Max products to return |

Response:

{
  "success": true,
  "data": {
    "products": [...],
    "total": 15,
    "filters_applied": {
      "category": "mobiles",
      "max_price": 10000
    }
  },
  "metadata": {
    "available_categories": ["mobiles", "laptops", "accessories"],
    "available_brands": ["Samsung", "Apple", "OnePlus", ...],
    "price_range": {"min": 199, "max": 149999}
  }
}
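The filter and sort semantics above can be sketched as a plain Python function. This is a minimal illustration, not the project's actual products.py: the two "Example" catalog entries are hypothetical, "relevance" is assumed to mean catalog order, `total` is assumed to count matches before `limit` is applied, and the `metadata` block is omitted for brevity.

```python
# Hypothetical sample catalog; the real data lives in products.py.
PRODUCTS = [
    {"id": "MOB001", "name": "Samsung Galaxy M14 5G", "category": "mobiles",
     "brand": "Samsung", "price": 11999, "rating": 4.2,
     "description": "Budget 5G smartphone with massive battery"},
    {"id": "MOB002", "name": "Example Phone Lite", "category": "mobiles",
     "brand": "ExampleBrand", "price": 8999, "rating": 4.0,
     "description": "Entry-level phone"},
    {"id": "LAP001", "name": "Example Laptop 14", "category": "laptops",
     "brand": "ExampleBrand", "price": 45999, "rating": 4.5,
     "description": "Thin and light laptop"},
]


def search_products(query=None, category=None, min_price=None,
                    max_price=None, brand=None, sort_by="relevance", limit=20):
    """Filter, sort, and truncate the catalog (sketch of the shared search function)."""
    results = list(PRODUCTS)
    if query:
        q = query.lower()
        results = [p for p in results
                   if q in p["name"].lower() or q in p["description"].lower()]
    if category:
        results = [p for p in results if p["category"] == category]
    if min_price is not None:
        results = [p for p in results if p["price"] >= min_price]
    if max_price is not None:
        results = [p for p in results if p["price"] <= max_price]
    if brand:
        results = [p for p in results if p["brand"].lower() == brand.lower()]

    if sort_by == "price_asc":
        results.sort(key=lambda p: p["price"])
    elif sort_by == "price_desc":
        results.sort(key=lambda p: p["price"], reverse=True)
    elif sort_by == "rating":
        results.sort(key=lambda p: p["rating"], reverse=True)
    # "relevance": keep catalog order (assumption).

    candidate_filters = {"query": query, "category": category,
                         "min_price": min_price, "max_price": max_price,
                         "brand": brand}
    filters_applied = {k: v for k, v in candidate_filters.items() if v is not None}
    return {
        "success": True,
        "data": {
            "products": results[:limit],
            "total": len(results),  # assumption: count before applying limit
            "filters_applied": filters_applied,
        },
    }
```

Calling `search_products(category="mobiles", max_price=10000)` against this sample catalog returns only the hypothetical MOB002 entry, with both filters echoed back in `filters_applied`.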

WebSocket Endpoint

WS /ws

Real-time voice communication.

Client → Server:

  • Binary audio data (PCM16, 24kHz)
  • JSON control messages (optional)

Server → Client:

  • Binary audio data (AI response)
  • JSON events:
    • ui_update: Update UI based on voice command
    • transcript_update: Show transcription
    • clear_audio_queue: Handle interruption
    • error: Error messages
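Because the same socket carries both binary audio and JSON events, a receiver has to branch on the frame type first. The sketch below illustrates that split; `classify_frame` is a hypothetical helper, and it assumes every JSON event carries a `type` field the way `ui_update` does.

```python
import json
from typing import Tuple, Union

# Event names listed in this document for server-to-client JSON messages.
KNOWN_EVENTS = {"ui_update", "transcript_update", "clear_audio_queue", "error"}


def classify_frame(frame: Union[bytes, bytearray, str]) -> Tuple[str, object]:
    """Return ("audio", raw_bytes) for binary frames or (event_type, event) for JSON."""
    if isinstance(frame, (bytes, bytearray)):
        # Binary frames are raw PCM16 audio and are passed through untouched.
        return ("audio", bytes(frame))

    event = json.loads(frame)
    if not isinstance(event, dict):
        raise ValueError("JSON frame must be an object")

    event_type = event.get("type")
    if event_type not in KNOWN_EVENTS:
        raise ValueError(f"unknown event type: {event_type!r}")
    return (event_type, event)
```

Rejecting unknown event types early keeps malformed messages away from UI state.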

ui_update Event Format:

{
  "type": "ui_update",
  "action": "SHOW_PRODUCTS",
  "navigate_to": "/products",
  "filters": {
    "category": "mobiles",
    "max_price": 10000
  },
  "data": {
    "products": [...],
    "total": 7
  },
  "assistant_message": "Here are 7 mobile phones under ₹10,000"
}
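On the backend, such an event could be assembled from a search result before being sent over the socket. This is a minimal sketch: `build_ui_update` is a hypothetical helper, and it assumes the search result has the same `data.products` / `data.total` shape as the GET /api/products response.

```python
import json


def build_ui_update(filters, search_result, assistant_message):
    """Wrap a search result in the ui_update event shape shown above."""
    data = search_result["data"]
    return {
        "type": "ui_update",
        "action": "SHOW_PRODUCTS",
        "navigate_to": "/products",
        "filters": dict(filters),
        "data": {"products": data["products"], "total": data["total"]},
        "assistant_message": assistant_message,
    }


# Example: serialize the event as the JSON text frame sent to the frontend.
result = {"success": True, "data": {"products": [{"id": "MOB001"}], "total": 1}}
event = build_ui_update({"category": "mobiles"}, result, "Here is 1 mobile phone")
payload = json.dumps(event, ensure_ascii=False)
```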

Product Data Schema

{
  "id": "MOB001",
  "name": "Samsung Galaxy M14 5G",
  "category": "mobiles",
  "brand": "Samsung",
  "price": 11999,
  "rating": 4.2,
  "thumbnail": "/images/mob001.jpg",
  "specs": {
    "display": "6.6 inch FHD+",
    "processor": "Exynos 1330",
    "ram": "4GB",
    "storage": "64GB",
    "battery": "6000mAh",
    "camera": "50MP Triple"
  },
  "in_stock": true,
  "description": "Budget 5G smartphone with massive battery"
}

Product Categories & Counts

| Category | Products | Price Range |
| --- | --- | --- |
| Mobiles | 12 | ₹6,999 - ₹1,49,999 |
| Laptops | 10 | ₹29,999 - ₹2,49,999 |
| Accessories | 12 | ₹199 - ₹24,999 |

Voice Commands Supported

Product Search

| Voice Command | Action | Parameters |
| --- | --- | --- |
| "Show me mobile phones" | Navigate + Filter | category: mobiles |
| "Show laptops under 50000" | Navigate + Filter | category: laptops, max_price: 50000 |
| "Filter by Samsung" | Apply Filter | brand: Samsung |
| "Sort by price low to high" | Apply Sort | sort_by: price_asc |
| "Show me everything" | Clear Filters | (none) |

Product Details

| Voice Command | Action |
| --- | --- |
| "Tell me about the first product" | Show details of product at index 0 |
| "What are the specs of the second one" | Show specs of product at index 1 |
| "Compare first and third" | Show comparison view |
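Follow-up commands like these depend on mapping a spoken ordinal onto the product list from the previous query. The sketch below shows one way that lookup might work; `ORDINALS` and `resolve_ordinal` are hypothetical names, and in practice the model may supply the index directly.

```python
# Hypothetical mapping from spoken ordinals to zero-based list indexes.
ORDINALS = {"first": 0, "second": 1, "third": 2, "fourth": 3, "fifth": 4}


def resolve_ordinal(word, last_products):
    """Return the product a spoken ordinal refers to, or None if it cannot be resolved."""
    index = ORDINALS.get(word.lower())
    if index is None or index >= len(last_products):
        return None
    return last_products[index]


last_products = [{"id": "MOB001"}, {"id": "MOB002"}]
first = resolve_ordinal("first", last_products)   # product at index 0
missing = resolve_ordinal("third", last_products)  # None: only two products listed
```

Returning `None` instead of raising lets the assistant ask for clarification when the reference is out of range.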

Navigation

| Voice Command | Action |
| --- | --- |
| "Go to home page" | Navigate to / |
| "Show all products" | Navigate to /products |
| "Open my profile" | Navigate to /profile |

OpenAI Function Definition

{
  "type": "function",
  "name": "search_products",
  "description": "Search and filter products in the e-commerce store. Use this when user wants to find, filter, or browse products.",
  "parameters": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "Search text to find in product names"
      },
      "category": {
        "type": "string",
        "enum": ["mobiles", "laptops", "accessories"],
        "description": "Product category to filter by"
      },
      "min_price": {
        "type": "integer",
        "description": "Minimum price in INR"
      },
      "max_price": {
        "type": "integer",
        "description": "Maximum price in INR"
      },
      "brand": {
        "type": "string",
        "description": "Brand name to filter by"
      },
      "sort_by": {
        "type": "string",
        "enum": ["price_asc", "price_desc", "rating", "relevance"],
        "description": "Sort order for results"
      }
    }
  }
}
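The Realtime API delivers function-call arguments as a JSON-encoded string, so the backend has to parse and sanitize them before calling search_products(). The following is a minimal sketch of that step; `parse_search_arguments` is a hypothetical helper, and silently dropping invalid values rather than failing is an assumption made to keep the UI stable when the model misbehaves.

```python
import json

# Parameter names and enum values taken from the function definition above.
ALLOWED_PARAMS = {"query", "category", "min_price", "max_price", "brand", "sort_by"}
ENUMS = {
    "category": {"mobiles", "laptops", "accessories"},
    "sort_by": {"price_asc", "price_desc", "rating", "relevance"},
}


def parse_search_arguments(arguments_json):
    """Parse model-supplied arguments, keeping only known and valid parameters."""
    raw = json.loads(arguments_json or "{}")
    if not isinstance(raw, dict):
        raise ValueError("function arguments must be a JSON object")

    args = {}
    for key, value in raw.items():
        if key not in ALLOWED_PARAMS or value is None:
            continue  # ignore parameters the schema does not define
        if key in ENUMS and value not in ENUMS[key]:
            continue  # ignore values outside the declared enum
        args[key] = value
    return args


args = parse_search_arguments('{"category": "laptops", "max_price": 50000, "color": "red"}')
```

The cleaned dictionary can then be passed as keyword arguments, for example `search_products(**args)`.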

Frontend State Management

Global State (Context)

  • voiceSession: WebSocket connection state
  • isListening: Whether voice is active
  • lastProducts: Products from last query (for follow-ups)

Products Page State

  • products: Current product list
  • filters: Active filters
  • sortBy: Current sort order
  • loading: Loading state

State Sync (Voice → UI)

When a ui_update event is received:

  1. If navigate_to is present → Router navigates
  2. Update filters state
  3. Update products from data.products
  4. Display assistant_message in voice transcript area
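The four steps can be expressed as a pure state transition. The real handler would live in the React frontend; this Python sketch only illustrates the logic, with hypothetical state keys (`route`, `transcript`) and the assumption that incoming filters replace the current ones rather than merging with them.

```python
def apply_ui_update(state, event):
    """Return a new UI state after applying a ui_update event (illustrative only)."""
    new_state = dict(state)
    # 1. Navigate only when the event asks for it.
    if event.get("navigate_to"):
        new_state["route"] = event["navigate_to"]
    # 2. Replace the active filters (assumption: replace, not merge).
    new_state["filters"] = dict(event.get("filters", {}))
    # 3. Take the product list straight from the event payload.
    new_state["products"] = list(event.get("data", {}).get("products", []))
    # 4. Show the assistant's message in the transcript area.
    new_state["transcript"] = event.get("assistant_message", "")
    return new_state


state = {"route": "/", "filters": {}, "products": [], "transcript": ""}
event = {
    "type": "ui_update",
    "navigate_to": "/products",
    "filters": {"category": "mobiles"},
    "data": {"products": [{"id": "MOB001"}], "total": 1},
    "assistant_message": "Here is 1 mobile phone",
}
updated = apply_ui_update(state, event)
```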

UI Components

VoiceController (Global)

  • Floating button (bottom-right, z-50)
  • Persistent across all pages
  • Visual states: idle, listening, processing, speaking
  • Shows real-time transcript
  • Waveform animation when active

FilterSidebar

  • Category checkboxes
  • Price range slider (₹0 - ₹2,50,000)
  • Brand multi-select
  • Clear all filters button

ProductGrid

  • Responsive grid (1-4 columns)
  • ProductCard with image, name, price, rating
  • Loading skeleton
  • Empty state

Error Handling

| Scenario | Behavior |
| --- | --- |
| OpenAI connection fails | Show error toast, disable voice button |
| Invalid voice command | AI asks for clarification |
| No products match | Show empty state + AI explains |
| Network error | Retry with exponential backoff |
| Audio permission denied | Show permission request modal |
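Retrying with exponential backoff might look like the following sketch. The helper name, the attempt count, and the delay values are all assumptions chosen for illustration; the `sleep` parameter is injectable so the behavior can be checked without actually waiting.

```python
import time


def retry_with_backoff(operation, max_attempts=4, base_delay=0.5,
                       max_delay=8.0, sleep=time.sleep):
    """Call operation(), doubling the wait after each ConnectionError."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Delay doubles each attempt (0.5s, 1s, 2s, ...) and is capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** attempt))
```

Once the final attempt fails, the error propagates so the UI can fall back to the error toast described in the table.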

Success Criteria

  1. Functional Parity: Voice search returns identical results to typed search
  2. UI Stability: UI never breaks due to LLM errors
  3. Single API: Same search_products() powers all interactions
  4. Responsiveness: The UI updates within 200 ms of the voice command being processed
  5. Professional UX: Demo feels like a real e-commerce site

Implementation Phases

Phase 1: Foundation (Current)

  • PRD documentation
  • Backend product data
  • search_products API
  • Basic React app with routing

Phase 2: UI Components

  • Product grid and cards
  • Filter sidebar
  • Navigation

Phase 3: Voice Integration

  • WebSocket endpoint
  • OpenAI Realtime client
  • VoiceController component
  • ui_update event handling

Phase 4: Polish

  • Animations and transitions
  • Error handling
  • Loading states
  • Mobile responsiveness