VoxLedger is a voice-first personal finance assistant that lets a user register, unlock the app with their voice, manage budgets, track expenses, read notifications, navigate pages, and ask finance-related questions using natural spoken English.
It combines a React + Vite frontend with a FastAPI backend, SQLite database, Whisper speech-to-text, MFCC-based voice authentication, and gTTS text-to-speech.
- Project Overview
- Key Features
- Tech Stack
- How the Project Works
- Project Structure
- Prerequisites
- How to Run the Project
- First-Time Usage
- How to Use the Voice Assistant
- Supported Voice Commands
- How to Reset or Delete Existing User Data
- Troubleshooting
- Backend and Frontend Ports
- Security Notes
- Future Improvements
VoxLedger is designed as a hands-free finance assistant. The user can interact with the application through voice instead of depending only on buttons and text input.
The system supports:
- user registration
- voice sample recording
- voice-based unlock/authentication
- monthly income setup
- monthly and category budget management
- expense logging
- transaction history
- unread notifications
- alert reading
- page navigation by voice
- dark mode voice control
- finance-related queries and insights
The project aims to feel like a secure, practical assistant that understands the app and performs the correct action from spoken input.
- Register with a name, password, and voice sample
- Unlock the app using the registered voice
- Reject silence, random noise, and most invalid audio
- Compare spoken input against stored voice embeddings
- Set monthly income
- Set monthly budget
- Set category budgets
- Add expenses using natural speech
- Support both default and user-created categories
- Open dashboard
- Open budget page
- Open transactions
- Open notifications
- Open alerts
- Open profile
- Open Add Voice Sample page
- Open conversation page
- Read unread notifications
- Read all notifications
- Read first or second notification
- Mark all notifications as read
- Read alerts, critical alerts, warnings, and informational alerts
- Check total spending
- Check remaining balance
- Ask for user details like name and income
- Ask for insights and analytics
- Query transactions by time/date where supported
- Stop speaking immediately with `Stop`
- Ignore many non-app and hallucinated transcripts
- Support short and long command styles
- Work with simple English phrasing
- Dark mode / light mode switching by voice
- Auto-lock after inactivity
- Profile and voice sample management
- Conversation history
- React 18
- TypeScript
- Vite
- Tailwind CSS
- React Router
- Radix UI
- Framer Motion
- TanStack React Query
- FastAPI
- Uvicorn
- Pydantic
- SQLite
- NumPy / SciPy / scikit-learn
- OpenAI Whisper for speech-to-text
- librosa for audio processing and MFCC features
- gTTS for text-to-speech
- ffmpeg for audio conversion and speed adjustment
- The frontend records audio from the browser microphone.
- Audio is sent to the backend.
- The backend cleans and analyzes the audio.
- If needed, voice authentication checks whether the speaker matches the registered user.
- Whisper converts speech to text.
- The backend detects the user’s intent.
- The correct action is performed:
  - navigate page
  - update budget
  - add expense
  - read notifications
  - answer a query
- The backend generates a spoken response.
- The frontend plays the assistant response.
- User records a secure voice sample during registration.
- Backend stores a voice embedding in the database.
- At unlock time, spoken audio is converted into a probe embedding.
- Cosine similarity is computed against stored embeddings.
- Access is granted only if the match is strong enough.
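The comparison step above can be sketched in a few lines. This is a minimal illustration, not the code in `services/voice_auth_service.py`: the function names, the plain-list embeddings, and the `0.85` threshold are all assumptions made for the example, and the standard library is used instead of NumPy so the sketch runs without dependencies.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # An all-zero embedding (for example, pure silence) never matches.
        return 0.0
    return dot / (norm_a * norm_b)


def is_match(probe, stored_embeddings, threshold=0.85):
    """Grant access only if the best similarity reaches the threshold.

    The 0.85 default is a placeholder, not VoxLedger's actual setting.
    """
    if not stored_embeddings:
        return False
    best = max(cosine_similarity(probe, s) for s in stored_embeddings)
    return best >= threshold
```

Taking the maximum over all stored embeddings means any one enrolled voice sample can unlock the app. A stricter policy could average the scores instead.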
Example:
- User says: `I spent two hundred on food`
- Whisper transcribes the sentence
- Intent parser identifies add expense
- Amount = `200`
- Category = `Food`
- Transaction is stored in SQLite
- Assistant replies with confirmation
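The example above can be approximated with a small rule-based parser. This is only a sketch of the idea and makes no claim about how `utils/intent_parser.py` works: the trigger words, the number vocabulary, the `CATEGORIES` tuple, and the returned dictionary shape are hypothetical.

```python
import re

# Small vocabulary of number words; a real parser would need a fuller one.
UNITS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
         "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50}
CATEGORIES = ("food", "transport", "shopping")  # hypothetical category list
TRIGGERS = ("spent", "spend", "add", "log")     # hypothetical trigger words


def words_to_number(tokens):
    """Convert tokens such as ['two', 'hundred'] or ['500'] to an int, or None."""
    total, current, seen = 0, 0, False
    for tok in tokens:
        if tok.isdigit():
            current += int(tok)
        elif tok in UNITS:
            current += UNITS[tok]
        elif tok == "hundred":
            current = max(current, 1) * 100
        elif tok == "thousand":
            total += max(current, 1) * 1000
            current = 0
        else:
            continue  # not a number token, ignore it
        seen = True
    return total + current if seen else None


def parse_expense(text):
    """Return an add-expense intent dict, or None if the text does not fit."""
    tokens = re.findall(r"[a-z]+|\d+", text.lower())
    if not any(tok in TRIGGERS for tok in tokens):
        return None
    amount = words_to_number(tokens)
    category = next((tok for tok in tokens if tok in CATEGORIES), None)
    if amount is None or category is None:
        return None
    return {"intent": "add_expense", "amount": amount,
            "category": category.capitalize()}
```

Returning `None` for anything that does not clearly fit is one simple way to ignore non-app or hallucinated transcripts rather than guessing at an action.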
VoxLedger_v9_Final/
│
├── VoxLedger_backend/
│ ├── main.py
│ ├── config.py
│ ├── database.py
│ ├── reset_db.py
│ ├── requirements.txt
│ ├── start_backend.ps1
│ ├── start_backend.sh
│ ├── routes/
│ ├── services/
│ ├── utils/
│ ├── database/
│ ├── voice_samples/
│ └── tts_output/
│
├── VoxLedger_frontend/
│ ├── package.json
│ ├── vite.config.ts
│ ├── tailwind.config.ts
│ ├── start.ps1
│ ├── start_frontend.sh
│ ├── src/
│ └── public/
│
└── README.md
Key backend files:
- `main.py`: starts the FastAPI application
- `database.py`: creates and manages the SQLite tables
- `config.py`: central configuration values
- `reset_db.py`: clears existing user data and resets the app
- `routes/voice_routes.py`: main voice command pipeline
- `services/voice_auth_service.py`: voice authentication logic
- `services/whisper_service.py`: speech-to-text logic
- `services/tts_service.py`: text-to-speech generation
- `utils/intent_parser.py`: identifies user intent from text
Key frontend files:
- `package.json`: frontend scripts and dependencies
- `src/`: application pages, components, and voice UI logic
- `start.ps1`: Windows helper script to start the frontend quickly
Install these before running the project:
- Python 3.10 or above
- Node.js 18 or above
- npm
- ffmpeg added to system PATH
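Install ffmpeg with the command for your platform:
- Windows: `winget install ffmpeg`, or download ffmpeg manually and add it to your PATH.
- macOS (Homebrew): `brew install ffmpeg`
- Debian / Ubuntu: `sudo apt update`, then `sudo apt install ffmpeg`
You need two terminals: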
- one for backend
- one for frontend
Open PowerShell inside VoxLedger_backend and run `.\start_backend.ps1`. This script will:
- create a virtual environment if missing
- activate it
- install requirements
- start FastAPI on port `8000`
Or set up the backend manually:
- Create a virtual environment: `cd VoxLedger_backend`, then `python -m venv venv`
- Activate the virtual environment:
  - Windows: `venv\Scripts\activate`
  - macOS / Linux: `source venv/bin/activate`
- Install dependencies: `pip install -r requirements.txt`
- Run the backend: `uvicorn main:app --reload --port 8000`
Backend URL: http://127.0.0.1:8000
API docs: http://127.0.0.1:8000/docs
Open a new PowerShell window inside VoxLedger_frontend and run `.\start.ps1`.
Or start the frontend manually:
- `cd VoxLedger_frontend`
- `npm install`
- `npm run dev`
Frontend URL: http://localhost:5173
- Start the backend.
- Start the frontend.
- Open http://localhost:5173 in your browser.
- The splash screen checks whether a user already exists.
- If no user exists, go to the registration page.
- Register with:
  - name
  - password
  - voice sample
- Record the secure voice sample clearly.
- After registration, use voice authentication to unlock the app.
Use a longer phrase like:
Hello VoxLedger, this is my secure voice sample for authentication. I will use this voice to access my finance assistant.
This improves the uniqueness of the stored voice pattern.
- Unlock the app with your voice
- Open the page you want or directly speak a command
- Wait for the assistant to process the request
- Hear the response
- Say `Stop` anytime to interrupt speaking
- speak clearly
- avoid fan or TV noise nearby
- do not whisper during authentication
- keep a small pause before speaking
- use a normal speaking voice
- stay reasonably close to the microphone
Authentication and profile:
- `Unlock VoxLedger`
- `Authenticate me`
- `I want to add another voice sample`
- `I need to add one more voice profile`
- `Go to profile page`
- `What is my name?`
- `What is my monthly income?`
- `How many voice samples do I have?`
Navigation:
- `Open dashboard`
- `Open budget page`
- `Open transactions`
- `Open notifications`
- `Open alerts`
- `Open profile`
- `Open conversation page`
- `Open add voice sample page`
Budget:
- `Set my monthly budget to fifty thousand`
- `Set food category budget to 2000`
- `Set transport budget to 1000`
- `Show my budget`
Income:
- `Set my monthly income to one lakh`
- `Update my monthly income`
- `What is my monthly income?`
Expenses:
- `Add 200 food`
- `Add two hundred rupees for food`
- `Please log two hundred rupees for food expenses`
- `I spent 500 on shopping`
- `Spend 100 on skincare`
Spending queries and insights:
- `How much did I spend?`
- `What is my total spending?`
- `How much money have I spent?`
- `What is my remaining balance?`
- `What balance do I have left?`
- `Give me my spending insight`
- `Show my insights`
Notifications:
- `Read notifications`
- `Read unread notifications`
- `Read first notification`
- `Read second notification`
- `Mark all notifications as read`
Alerts:
- `Read alerts`
- `Read critical alerts`
- `Read warnings`
- `Read info alerts`
Dark mode:
- `Turn on dark mode`
- `Enable dark mode`
- `Switch to dark theme`
- `Turn off dark mode`
- `Switch to light mode`
Deleting transactions:
- `Delete first transaction`
- `Delete second transaction`
- `Confirm delete`
Stopping speech:
- `Stop`
- `Stop speaking`
If you want to remove all old users and register a fresh new user, use the backend reset script.
Go to the backend folder and run the reset script: `cd VoxLedger_backend`, then `python reset_db.py`.
This clears user-related data from the database, including:
- users
- voice embeddings
- notifications
- transactions
- budgets
- conversation history
After running it, open the frontend again and register a new user.
The SQLite database is stored at:
VoxLedger_backend/database/voxledger.db
If needed, you can stop the backend and delete this database file manually, then restart the backend so tables are recreated.
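To confirm that a reset actually emptied the database, one option is to count the rows in each table. The helper below is a hypothetical sketch, not part of the project. It reads table names from SQLite's own `sqlite_master` catalog, so it assumes nothing about VoxLedger's schema.

```python
import sqlite3


def table_row_counts(conn):
    """Return {table_name: row_count} for every user table in the database."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
    # Names come straight from sqlite_master, so quoting them is enough here.
    return {name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names}
```

A possible use is `table_row_counts(sqlite3.connect("VoxLedger_backend/database/voxledger.db"))`, run from the project root with the backend stopped. Note that `sqlite3.connect` creates an empty file if the path does not exist yet, so check the path first.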
If the backend does not start, check:
- Python is installed
- virtual environment is activated
- `pip install -r requirements.txt` completed successfully
- port `8000` is free
If the frontend does not start, check:
- Node.js is installed
- `npm install` completed successfully
- port `5173` is free
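The port checks for `8000` and `5173` can be done from Python without extra tools. This is a small sketch using only the standard library. The function name is made up for the example, and it only probes the IPv4 loopback address, so a server bound only to IPv6 `localhost` would not be detected.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 only when a listener accepted the connection.
        return sock.connect_ex((host, port)) != 0
```

For example, `port_is_free(8000)` should return `False` while the backend is running and `True` after it has been stopped.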
If voice commands are not working, check:
- browser microphone permission is allowed
- backend is running
- frontend is running
- you are speaking clearly
- environment noise is low
If voice authentication fails, check:
- a voice sample was actually saved during registration
- you are using the same voice/user
- the voice sample was recorded clearly
- you are not too far from the mic
If needed, reset the database and register again with a cleaner voice sample.
If backend logs mention ffmpeg issues:
- install ffmpeg
- add it to PATH
- restart terminal after installation
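A quick way to verify the ffmpeg steps above is to ask Python whether the executable is visible on PATH from the same terminal that starts the backend. The helper name below is hypothetical. It relies only on `shutil.which` and ffmpeg's `-version` flag.

```python
import shutil
import subprocess


def tool_version_line(name="ffmpeg", flag="-version"):
    """Return the first line the tool prints for `flag`, or None if not on PATH."""
    path = shutil.which(name)
    if path is None:
        return None
    result = subprocess.run([path, flag], capture_output=True,
                            text=True, check=False)
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""
```

If `tool_version_line()` returns `None` inside the activated virtual environment, the backend process will most likely not find ffmpeg either, which points to a PATH problem rather than a VoxLedger bug.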
Whisper runs on CPU by default on many systems, so the first few requests may feel slower.
| Service | Port | URL |
|---|---|---|
| Backend | 8000 | http://127.0.0.1:8000 |
| Frontend | 5173 | http://localhost:5173 |
- Passwords exist in the project flow for login/registration, but sensitive details should never be spoken back by the assistant.
- Voice authentication should be used for access control, but it should always be tested carefully in noisy real-world environments.
- Destructive actions such as transaction deletion should require confirmation.
- If you plan to deploy this project publicly, improve:
  - password storage security
  - session management
  - HTTPS usage
  - production database setup
  - production-grade authentication rules
Possible future upgrades:
- better speaker verification model
- stronger noise rejection for cough and room sounds
- multilingual support
- real-time live VAD on frontend
- cloud database instead of SQLite
- downloadable reports
- charts and richer analytics insights
- mobile app version
- role-based multi-user voice profiles
VoxLedger is a complete voice-first finance assistant project built for:
- secure voice login
- personal finance tracking
- budget management
- voice navigation
- smart finance queries
To use it:
- start backend
- start frontend
- register a user
- record a voice sample
- unlock with voice
- manage finance operations using speech