LearnIt is an NLP application designed to help users transition to non-linear notetaking methods, which research suggests better align with the process of encoding knowledge. It acts as an intelligent companion during lectures or for analyzing audio recordings afterwards, automatically generating an interactive mindmap and providing a context-aware chatbot for querying the content.
Example mindmap generated by LearnIt based on a discussion about Fort Knox gold reserves, alongside the integrated Q&A chatbot.
Traditional linear notetaking can be less effective for capturing complex relationships and fostering deeper understanding. However, adopting non-linear methods like mindmapping can be challenging initially due to a lack of familiarity and tools that seamlessly integrate into the learning process. LearnIt aims to bridge this gap by demonstrating non-linear thinking by example.
- Audio Input: Accepts real-time microphone input or pre-recorded audio files.
- Real-time Processing: Uses Voice Activity Detection (VAD) to intelligently chunk audio during natural pauses.
- Automatic Mindmap Generation: Creates interactive mindmaps on an infinite canvas, rendered with React Flow.
- Node Selection: Employs Named Entity Recognition (NER) and cosine similarity filtering to identify and deduplicate key concepts (vertices).
- Edge Creation: Establishes connections (edges) between related concepts based on co-occurrence within the text.
- Relation Extraction: Uses LLMs (GPT-4o-mini) to determine the nature of the relationship between connected nodes.
- Title Generation: Automatically suggests a relevant title for the mindmap using a fine-tuned BART model.
- RAG-Powered Chatbot: An integrated LLM chatbot uses Retrieval-Augmented Generation (RAG) with a Pinecone vector store to answer user questions based specifically on the content of the processed audio.
- Interactive Visualization: Users can explore the generated mindmap dynamically.
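As a rough illustration of the cosine similarity filtering used for node selection, here is a minimal sketch in plain Python. It assumes each candidate entity already has an embedding vector (in LearnIt these would come from spaCy). The function names, the greedy keep-first strategy, the `0.9` threshold, and the toy vectors are all illustrative assumptions, not the project's actual code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def deduplicate(entities, threshold=0.9):
    """Greedily keep an entity only if it is not too similar to one already kept.

    `entities` is a list of (text, vector) pairs; `threshold` is a
    hypothetical cut-off, not a value taken from LearnIt.
    """
    kept = []
    for text, vec in entities:
        if all(cosine_similarity(vec, kept_vec) < threshold for _, kept_vec in kept):
            kept.append((text, vec))
    return [text for text, _ in kept]


# Toy vectors made up for illustration; real embeddings have far more dimensions.
candidates = [
    ("Fort Knox", [0.9, 0.1, 0.0]),
    ("Ft. Knox", [0.88, 0.12, 0.01]),
    ("gold reserves", [0.1, 0.9, 0.2]),
]
nodes = deduplicate(candidates)
print(nodes)
```

With these toy vectors, "Ft. Knox" is nearly parallel to "Fort Knox" and is dropped, while "gold reserves" survives as its own node.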
- Audio Ingestion & Chunking: Audio is captured (real-time or file) and segmented using VAD to respect natural speech pauses.
- Transcription: Segments are transcribed using OpenAI's Whisper STT.
- Entity Recognition & Filtering: The transcription is processed by spaCy for NER. Entities are filtered based on type (e.g., removing quantities) and similarity (using cosine similarity to merge near-duplicates). These become the mindmap nodes (vertices).
- Edge & Relation Generation: If two entities appear within a defined proximity of each other (within X sentences), an edge is created between them. OpenAI's GPT-4o-mini then labels the relationship on that edge.
- Title Generation: The transcript is fed into a fine-tuned BART model (trained on the Newsroom dataset) to generate a concise title.
- Vector Storage & RAG: The transcription chunks are stored in a Pinecone vector database. When a user asks a question, relevant chunks are retrieved and provided as context to an LLM (like GPT-4o-mini) to generate an informed answer.
- Frontend Display: The mindmap (nodes, edges, relationships, title) is rendered interactively using React Flow. The chatbot interface allows user queries.
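The co-occurrence rule from the edge generation step can be sketched as follows. This is a simplified stand-in, assuming the transcript has already been split into sentences and each sentence has a list of detected entities. The function name `cooccurrence_edges`, the `window` parameter (playing the role of X), and the choice to treat the window as inclusive are assumptions made for illustration.

```python
from itertools import combinations


def cooccurrence_edges(sentence_entities, window=2):
    """Return undirected edges between entities mentioned close together.

    `sentence_entities[i]` is the list of entity names found in sentence i.
    Two entities are linked if any pair of their mentions is at most
    `window` sentences apart (inclusive, by assumption).
    """
    # Record every sentence index at which each entity appears.
    positions = {}
    for idx, entities in enumerate(sentence_entities):
        for name in entities:
            positions.setdefault(name, []).append(idx)

    edges = set()
    # Sorting the names gives each edge a single canonical (a, b) ordering.
    for a, b in combinations(sorted(positions), 2):
        if any(abs(i - j) <= window for i in positions[a] for j in positions[b]):
            edges.add((a, b))
    return edges


# Made-up example: one list of entity names per transcript sentence.
sentences = [
    ["Fort Knox", "gold"],
    [],
    ["Treasury"],
    [],
    [],
    ["audit"],
]
edges = cooccurrence_edges(sentences, window=2)
print(sorted(edges))
```

The resulting pairs could then be added to a networkx graph and passed to the LLM one edge at a time for labeling. Note that "audit" stays unconnected here because it is more than two sentences from every other entity, which is the kind of distant connection simple co-occurrence can miss.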
React Frontend:
- Mindmap: React Flow
- Audio Handling: RecordRTC, VAD library
- Styling: shadcn/ui, TailwindCSS
- Real-time Communication: WebSockets
FastAPI Backend:
- NER & Similarity: spaCy
- RAG Vector Store: Pinecone
- Mindmap Structure: networkx
- Title Generation: Fine-tuned BART model
- STT: OpenAI Whisper
- Relation Extraction & Chat: OpenAI GPT-4o-mini (with structured output)
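To show why structured output is useful for relation extraction, here is a hedged sketch of validating a model reply before it is attached to an edge. The `Relation` schema, its field names (`source`, `target`, `label`), and the `parse_relation` helper are hypothetical; the schema LearnIt actually uses is not documented here, and no OpenAI API call is made in this example.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """Hypothetical schema for one labeled edge."""
    source: str
    target: str
    label: str


def parse_relation(raw, known_nodes):
    """Parse one JSON relation object and check it against the mindmap's nodes.

    Returns a Relation, or None if the payload is malformed or refers to
    a node that is not in the mindmap.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    fields = [data.get(key) for key in ("source", "target", "label")]
    # Every field must be a non-empty string.
    if not all(isinstance(value, str) and value.strip() for value in fields):
        return None
    source, target, label = (value.strip() for value in fields)
    if source not in known_nodes or target not in known_nodes:
        return None
    return Relation(source, target, label)


nodes = {"Fort Knox", "gold reserves"}
# Hand-written stand-in for a model reply, for illustration only.
reply = '{"source": "Fort Knox", "target": "gold reserves", "label": "stores"}'
relation = parse_relation(reply, nodes)
print(relation)
```

Rejecting replies that name unknown nodes keeps the graph consistent with the entities chosen earlier in the pipeline, whatever the model returns.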
- Generic Relation Extraction: This is a challenging NLP task due to its open-ended nature. Training models requires specific datasets often focused on fixed relation types, and using powerful LLMs incurs latency and cost.
- Ambiguity in Language: Processing natural language inherently involves dealing with ambiguity. STT can misspell entities, simple co-occurrence might miss distant but relevant connections, and capturing implied meaning is difficult. Effective NLP applications strive to minimize this ambiguity.
NLP Enhancements:
- Explore more efficient methods for generic relation extraction (reducing LLM dependency).
- Investigate more sophisticated techniques for selecting mindmap vertices beyond just named entities (e.g., key topic extraction).
Application Features:
- Implement tooltips on vertices showing the specific transcript segment where the entity appeared.
- Add support for persistent sessions, user accounts, and saving mindmaps.
- Introduce a dual-view mode allowing users to create their own mindmap alongside LearnIt's generated one for comparison.
Find the related report here and the video here.
Renny Hoang