NotesMaker

NotesMaker is a real-time lecture note-making project built for turning system audio into usable study material. It listens to audio playing on the computer, converts speech into text using Whisper, and stores that live transcript in raw.txt. When recording is stopped, it automatically runs a second script that reads the transcript and generates summarized notes in notes.txt. The project is designed for long lecture-style audio, not music, and uses VB-Cable to route system output into Python. The workflow is simple: capture audio -> transcribe chunk by chunk -> stop recording -> summarize transcript into notes.

What You Need To Download

Before running the project, install these tools and packages:

1. Python

  • Python 3.10 or later

2. Python packages

Install these in your terminal:

pip install openai-whisper sounddevice scipy requests numpy

3. FFmpeg

  • FFmpeg is required by Whisper for audio handling.
  • Download and install FFmpeg, then make sure it is available in your system PATH.

4. Ollama

  • Install Ollama from its official website.
  • Start Ollama on your machine.

5. TinyLlama model in Ollama

After Ollama is installed, pull the model:

ollama pull tinyllama
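Once the model is pulled, you can check that Ollama answers requests from Python. The repo's summarizer talks to Ollama over HTTP (it installs requests for this); the sketch below uses only the standard library so it runs anywhere. `build_request` and `summarize` are hypothetical helper names; the URL is Ollama's default local endpoint.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(prompt: str, model: str = "tinyllama") -> dict:
    """Payload for Ollama's /api/generate endpoint; stream=False returns one JSON object."""
    return {"model": model, "prompt": prompt, "stream": False}

def summarize(prompt: str) -> str:
    """Send one prompt to Ollama and return the generated text (requires Ollama running)."""
    data = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(OLLAMA_URL, data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# With Ollama started:
# print(summarize("Summarize in one line: photosynthesis converts light into chemical energy."))
```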

6. VB-Cable

  • Install VB-Cable / VB-Audio Virtual Cable.
  • This is what routes your system audio into Python so the project can hear whatever is playing on your computer.

Step 1: Full Setup

This is the most important step. If the audio routing is wrong, Whisper will not receive the lecture audio properly.

A. Install all dependencies

Make sure Python, FFmpeg, Ollama, TinyLlama, and VB-Cable are installed first.

B. Route system audio through VB-Cable

The project expects the computer audio to go into VB-Cable, and then Python listens to the VB-Cable recording side.

On Windows:

  1. Open Sound Settings
  2. Under Output, change the default output device to: CABLE Input (VB-Audio Virtual Cable)
  3. This means your browser / video / lecture audio is now being sent into VB-Cable

C. Check the recording side

  1. Open the old-style Sound Control Panel
  2. Go to the Recording tab
  3. Find: CABLE Output (VB-Audio Virtual Cable)
  4. This is the device Python should listen to

D. Match sample rate settings

The script records at 44100 Hz, so keep the VB-Cable device settings aligned with that.

In Sound Control Panel:

  1. Open Playback tab
  2. Open properties for CABLE Input
  3. Go to Advanced
  4. Set the default format to 44100 Hz (the bit depth matters less; the sample rate must match)

Then:

  1. Open Recording tab
  2. Open properties for CABLE Output
  3. Go to Advanced
  4. Set the same sample rate, 44100 Hz, so playback and recording match

If you see sound issues, also try:

  • turning off extra audio enhancements
  • keeping the playback and recording formats the same
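You can also confirm from Python that the device advertises a rate close to the 44100 Hz the script records at, using sounddevice's `query_devices`. `rate_matches` is a hypothetical helper, not part of the repo.

```python
TARGET_RATE = 44100  # the rate the recording script expects

def rate_matches(device_rate: float, target: int = TARGET_RATE, tol: float = 0.01) -> bool:
    """True if the device's default sample rate is within tol (1%) of the target."""
    return abs(device_rate - target) <= target * tol

# Live check (needs sounddevice installed):
# import sounddevice as sd
# info = sd.query_devices(1)  # replace 1 with your CABLE Output index
# print(info["name"], info["default_samplerate"],
#       "OK" if rate_matches(info["default_samplerate"]) else "MISMATCH")
```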

E. Find the correct Python device index

Run:

py -3.10 list_devices.py

This prints all audio devices. In this project, the script should use the index for CABLE Output.

Open SpeechToText.py and set:

DEVICE_INDEX = 1

Change that number if your VB-Cable device appears at a different index on your machine.
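Instead of guessing the number, you can search the device list for VB-Cable by name. This is a sketch, not the repo's code: `find_cable_index` is a hypothetical helper, but the `"name"` and `"max_input_channels"` keys are what sounddevice's `query_devices()` actually returns.

```python
def find_cable_index(devices, keyword="CABLE Output"):
    """Return the index of the first input device whose name contains keyword, else None."""
    for i, dev in enumerate(devices):
        if keyword in dev["name"] and dev["max_input_channels"] > 0:
            return i
    return None

# Live usage (needs sounddevice installed):
# import sounddevice as sd
# print("DEVICE_INDEX =", find_cable_index(sd.query_devices()))
```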

Step 2: Test Whisper First

Before using the full live pipeline, first test whether Whisper is working.

This repo includes test_whisper.py, which transcribes the sample file test_sound.mp3.

Run:

py -3.10 test_whisper.py

If this works, Whisper is installed correctly and can convert audio to text.
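A minimal smoke test of this kind likely boils down to the few Whisper calls below; the repo's actual test_whisper.py may differ, and `pick_model` is a hypothetical helper (smaller models load faster for a quick check).

```python
def pick_model(clip_seconds: float) -> str:
    """Hypothetical helper: use the tiny Whisper model for short smoke-test clips."""
    return "tiny" if clip_seconds < 60 else "base"

# Live usage (needs openai-whisper installed and FFmpeg on PATH):
# import whisper
# model = whisper.load_model(pick_model(30.0))
# print(model.transcribe("test_sound.mp3")["text"])
```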

Optional helper files:

  • list_devices.py lists all available audio devices
  • record_test.py records a short WAV file from the selected input device so you can check whether VB-Cable routing is working

If Step 2 fails, fix that first before moving to live recording.

Step 3: Run Live Speech-To-Text

Once setup and testing are done, start the main transcription script:

py -3.10 SpeechToText.py

What happens:

  • the script first asks if you want to clear raw.txt
  • then it asks if you want to clear notes.txt
  • if both answers are yes, recording starts
  • if either answer is no, the script exits

During recording:

  • play your lecture / course / spoken audio on the computer
  • the script captures system audio through VB-Cable
  • audio is processed in chunk-sized blocks
  • Whisper transcribes each chunk
  • the transcript is appended continuously to raw.txt

You can open raw.txt while the script is running and see the transcript grow over time.

To stop recording:

Ctrl + C

The script does not abruptly stop and throw away the last part. It first flushes the final buffered audio, processes the last block, finishes transcription, and only then exits the capture stage.
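The flush-on-stop behavior can be sketched with a small buffer that hands out fixed-size blocks during recording and returns the partial remainder when interrupted. This is an illustrative stand-in, not the repo's code; the real script works with numpy audio arrays from sounddevice rather than Python lists.

```python
class ChunkBuffer:
    """Accumulates samples and yields fixed-size blocks; flush() returns any remainder."""
    def __init__(self, block_size: int):
        self.block_size = block_size
        self._samples = []

    def add(self, samples):
        """Append samples and return the list of full blocks now ready to transcribe."""
        self._samples.extend(samples)
        blocks = []
        while len(self._samples) >= self.block_size:
            blocks.append(self._samples[:self.block_size])
            self._samples = self._samples[self.block_size:]
        return blocks

    def flush(self):
        """Return the partial final block (what gets transcribed after Ctrl + C)."""
        remainder, self._samples = self._samples, []
        return remainder

# Sketch of the capture loop:
# buf = ChunkBuffer(block_size=44100 * 10)  # ~10 s of audio at 44.1 kHz
# try:
#     while True:
#         for block in buf.add(read_from_cable()):  # hypothetical stream read
#             transcribe(block)
# except KeyboardInterrupt:
#     transcribe(buf.flush())  # last partial block is not thrown away
```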

Step 4: Automatic Summarization

After SpeechToText.py is stopped, it automatically launches summarizer.py.

What summarizer.py does:

  • reads the full raw.txt
  • splits large transcript text into manageable text chunks
  • sends those chunks to TinyLlama through Ollama
  • combines the partial summaries
  • writes the final result to notes.txt
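The chunk-splitting step above can be sketched as follows. This is an assumption about the approach, not the repo's exact code: `split_transcript` and its 2000-character limit are illustrative, chosen so each chunk stays small enough for TinyLlama while breaking on sentence ends where possible.

```python
def split_transcript(text: str, max_chars: int = 2000):
    """Split a long transcript into chunks of at most max_chars,
    preferring to cut at a sentence boundary so each chunk reads coherently."""
    chunks, start = [], 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            cut = text.rfind(". ", start, end)
            if cut > start:
                end = cut + 1  # keep the period with its chunk
        chunks.append(text[start:end].strip())
        start = end
    return [c for c in chunks if c]
```

Each chunk is then sent to TinyLlama as its own prompt, and the partial summaries are concatenated into notes.txt.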

The final notes.txt is meant to contain:

  • bullet-wise notes
  • clearer main points
  • condensed lecture context
  • a cleaner summary than the raw transcript

Important Notes

  • This project is meant for lecture-style or educational spoken content.
  • If you feed it songs or non-lecture audio, the summary can become weird or low quality.
  • raw.txt is the direct transcript.
  • notes.txt is the summarized output.
  • chunks/ contains temporary audio files during processing and is not meant to be kept.

Repo Files

  • SpeechToText.py - live audio capture and transcription
  • summarizer.py - transcript summarization using TinyLlama through Ollama
  • list_devices.py - audio device listing helper
  • record_test.py - basic input recording test
  • test_whisper.py - test transcription on sample audio

About

Real-time lecture note maker using Whisper, VB-Cable, and TinyLlama.
