NotesMaker is a real-time lecture note-making project built for turning system audio into usable study material.
It listens to audio playing on the computer, converts speech into text using Whisper, and stores that live transcript in raw.txt.
When recording is stopped, it automatically runs a second script that reads the transcript and generates summarized notes in notes.txt.
The project is designed for long lecture-style audio, not music, and uses VB-Cable to route system output into Python.
The workflow is simple: capture audio -> transcribe chunk by chunk -> stop recording -> summarize transcript into notes.
Before running the project, install these tools and packages:
- Python 3.10 or later
Install these in your terminal:

pip install openai-whisper sounddevice scipy requests numpy

- FFmpeg is required by Whisper for audio handling.
- Download and install FFmpeg, then make sure it is available in your system PATH.
- Install Ollama from its official website.
- Start Ollama on your machine.
After Ollama is installed, pull the model:
ollama pull tinyllama

- Install VB-Cable / VB-Audio Virtual Cable.
- This is what routes your system audio into Python so the project can hear whatever is playing on your computer.
This is the most important step. If the audio routing is wrong, Whisper will not receive the lecture audio properly.
Make sure Python, FFmpeg, Ollama, TinyLlama, and VB-Cable are installed first.
The project expects the computer audio to go into VB-Cable, and then Python listens to the VB-Cable recording side.
On Windows:
- Open Sound Settings
- Under Output, change the default output device to:
CABLE Input (VB-Audio Virtual Cable)
- This means your browser / video / lecture audio is now being sent into VB-Cable.
- Open the old-style Sound Control Panel
- Go to the Recording tab
- Find:
CABLE Output (VB-Audio Virtual Cable)
- This is the device Python should listen to.
The script records at 44100 Hz, so keep the VB-Cable device settings aligned with that.
In Sound Control Panel:
- Open Playback tab
- Open properties for
CABLE Input
- Go to Advanced
- Set a format close to 44100 Hz
Then:
- Open Recording tab
- Open properties for
CABLE Output
- Go to Advanced
- Set the same or a matching sample rate, preferably 44100 Hz
If you see sound issues, also try:
- turning off extra audio enhancements
- keeping the playback and recording formats the same
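If you would rather check the configured rate from Python than from the GUI, a minimal sketch like the following can help. It assumes the sounddevice package is installed; `describe_rate_mismatch` and `check_default_input_rate` are hypothetical helpers, not part of this repo:

```python
TARGET_RATE = 44100  # SpeechToText.py records at this rate

def describe_rate_mismatch(device_rate, target=TARGET_RATE):
    """Return a warning string if the device default differs from the target."""
    if int(device_rate) == target:
        return None
    return f"Device default is {int(device_rate)} Hz; expected {target} Hz"

def check_default_input_rate():
    """Query the default input device's sample rate (call this manually)."""
    import sounddevice as sd  # imported lazily; only needed for the live check
    info = sd.query_devices(kind="input")
    print(describe_rate_mismatch(info["default_samplerate"]) or "Sample rates match")
```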
Run:
py -3.10 list_devices.py

This prints all audio devices. In this project, the script should use the index for CABLE Output.
Open SpeechToText.py and set:
DEVICE_INDEX = 1

Change that number if your VB-Cable device appears at a different index on your machine.
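If you prefer not to scan the device list by eye, a helper along these lines could locate the index automatically. This is a sketch assuming the sounddevice package; `find_cable_index` is not part of the repo:

```python
def find_cable_index(devices, name="CABLE Output"):
    """Return the index of the first recording device whose name contains `name`."""
    for i, dev in enumerate(devices):
        if name.lower() in dev["name"].lower() and dev["max_input_channels"] > 0:
            return i
    return None

def print_cable_index():
    """Query the real devices and print the value to use for DEVICE_INDEX."""
    import sounddevice as sd
    idx = find_cable_index(sd.query_devices())
    print(f"DEVICE_INDEX = {idx}" if idx is not None else "VB-Cable not found")
```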
Before using the full live pipeline, first test whether Whisper is working.
This repo includes test_whisper.py, which transcribes the sample file test_sound.mp3.
Run:
py -3.10 test_whisper.py

If this works, Whisper is installed correctly and can convert audio to text.
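Conceptually, such a test boils down to a few lines (this is an assumption about what test_whisper.py does; check the actual file):

```python
def transcribe_file(path, model_name="base"):
    """Load a Whisper model and transcribe one audio file to text."""
    import whisper  # heavy import kept local so the helper is cheap to define
    model = whisper.load_model(model_name)  # downloads the model on first run
    result = model.transcribe(path)  # needs FFmpeg on PATH to decode mp3
    return result["text"]
```

For example, `transcribe_file("test_sound.mp3")` should return the spoken text from the sample file.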
Optional helper files:
- list_devices.py lists all available audio devices
- record_test.py records a short WAV file from the selected input device so you can check whether VB-Cable routing is working
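record_test.py presumably works along these lines (a sketch, not the repo's actual code; the output file name and duration are assumptions):

```python
SAMPLE_RATE = 44100  # matches the rate SpeechToText.py records at

def duration_to_frames(seconds, rate=SAMPLE_RATE):
    """Number of audio frames needed for a recording of the given length."""
    return int(seconds * rate)

def record_routing_check(device_index=1, seconds=3):
    """Record a short clip from the VB-Cable device and save it as a WAV file."""
    import sounddevice as sd
    from scipy.io import wavfile

    audio = sd.rec(duration_to_frames(seconds), samplerate=SAMPLE_RATE,
                   channels=1, dtype="int16", device=device_index)
    sd.wait()  # block until the recording finishes
    wavfile.write("routing_check.wav", SAMPLE_RATE, audio)
```

If the saved WAV plays back whatever audio was running on your machine, VB-Cable routing is working.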
If Step 2 fails, fix that first before moving to live recording.
Once setup and testing are done, start the main transcription script:
py -3.10 SpeechToText.py

What happens:
- the script first asks if you want to clear raw.txt
- then it asks if you want to clear notes.txt
- if both answers are yes, recording starts
- if either answer is no, the script exits
During recording:
- play your lecture / course / spoken audio on the computer
- the script captures system audio through VB-Cable
- audio is processed in chunk-sized blocks
- Whisper transcribes each chunk
- the transcript is appended continuously to raw.txt
You can open raw.txt while the script is running and see the transcript grow over time.
To stop recording:
Ctrl + C

The script does not abruptly stop and throw away the last part. It first flushes the final buffered audio, processes the last block, finishes transcription, and only then exits the capture stage.
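The shutdown behaviour can be pictured as a small pattern: keep buffering and processing, and on KeyboardInterrupt process whatever is still buffered before exiting. This is a sketch of the idea, not the repo's actual code:

```python
def flush_buffer(blocks):
    """Join the remaining buffered audio blocks into one final chunk."""
    return b"".join(blocks)

def capture_until_interrupt(read_block, process_chunk, blocks_per_chunk=430):
    """Run `read_block()` until Ctrl+C, then process the leftover audio once."""
    pending = []
    try:
        while True:
            pending.append(read_block())
            if len(pending) >= blocks_per_chunk:
                process_chunk(flush_buffer(pending))
                pending.clear()
    except KeyboardInterrupt:
        if pending:
            process_chunk(flush_buffer(pending))  # don't drop the tail
```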
After SpeechToText.py is stopped, it automatically launches summarizer.py.
What summarizer.py does:
- reads the full raw.txt
- splits large transcript text into manageable text chunks
- sends those chunks to TinyLlama through Ollama
- combines the partial summaries
- writes the final result to notes.txt
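The steps above can be sketched as follows. The Ollama HTTP endpoint (`/api/generate`) is the standard local API; the chunk size and prompt wording are assumptions, not the repo's exact values:

```python
def split_transcript(text, max_chars=4000):
    """Split on line boundaries, keeping each chunk under max_chars."""
    chunks, current = [], ""
    for para in text.split("\n"):
        if len(current) + len(para) + 1 > max_chars and current:
            chunks.append(current.strip())
            current = ""
        current += para + "\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks

def summarize_chunks(chunks):
    """Summarize each chunk with TinyLlama through Ollama (call manually)."""
    import requests
    partials = []
    for chunk in chunks:
        resp = requests.post("http://localhost:11434/api/generate", json={
            "model": "tinyllama",
            "prompt": "Summarize this lecture excerpt as bullet points:\n" + chunk,
            "stream": False,  # return one JSON object instead of a token stream
        })
        partials.append(resp.json()["response"])
    return "\n\n".join(partials)
```

Chunking matters because TinyLlama's context window is small; sending the whole transcript at once would truncate long lectures.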
The final notes.txt is meant to contain:
- bullet-wise notes
- clearer main points
- condensed lecture context
- a cleaner summary than the raw transcript
- This project is meant for lecture-style or educational spoken content.
- If you feed it songs or other non-lecture audio, the summary may come out incoherent or low quality.
- raw.txt is the direct transcript.
- notes.txt is the summarized output.
- chunks/ contains temporary audio files during processing and is not meant to be kept.
- SpeechToText.py - live audio capture and transcription
- summarizer.py - transcript summarization using TinyLlama through Ollama
- list_devices.py - audio device listing helper
- record_test.py - basic input recording test
- test_whisper.py - test transcription on sample audio