Smart Book Assistant 📚❓

A Small Language Model (SLM) that answers questions based on a book or text. Built with Python, Hugging Face Transformers, and FAISS, this project is designed to help you extract information from large texts quickly and accurately.


Features ✨

  • Preprocess books into clean, manageable chunks.
  • Retrieve relevant text using semantic search (FAISS).
  • Answer questions using DistilBERT (Hugging Face).
  • Evaluate accuracy with Exact Match (EM) and F1 scores.

Table of Contents 📑

  1. Quick Start
  2. How to Use the Model
  3. Project Structure
  4. Documentation
  5. Key Learnings
  6. Future Improvements

Quick Start 🚀

1. Clone the Repository

git clone https://github.com/maithilmishra/Smart-Book-Assistant.git
cd Smart-Book-Assistant

2. Install Dependencies

pip install -r requirements.txt

3. Prepare Your Data

  • Add your book text to book.txt.
  • Add test questions to test_questions.json (example below).

4. Run the System

# Interactive mode (ask questions by index)
python -m src.main

# Evaluate accuracy
python -m src.evaluate

How to Use the Model 🖥️

Step 1: Add Your Book

  • Place your book text in book.txt. Example:
    In the small town of Willow Creek, there lived a brilliant scientist named Professor Waldo...

Step 2: Add Test Questions

  • Add questions and answers to test_questions.json. Example:
    [
       {
         "question": "Who discovered the secret formula?",
         "answer": "Professor Waldo"
       }
    ]
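A minimal loader for this format might look like the sketch below. The function name and default path are illustrative assumptions, not the repo's actual API; the field names follow the example entry above.

```python
import json

def load_test_questions(path="test_questions.json"):
    """Load question/answer pairs in the format shown above."""
    with open(path, encoding="utf-8") as f:
        items = json.load(f)
    # Fail fast on malformed entries rather than mid-evaluation.
    for item in items:
        assert "question" in item and "answer" in item, "malformed entry"
    return items
```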

Step 3: Run the System

  1. Interactive Mode:
     python -m src.main
     • The system displays the available questions.
     • Enter the index of the question you want to ask.
  2. Evaluation Mode:
     python -m src.evaluate
     • This tests the system’s accuracy on the questions in test_questions.json.
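Evaluation scores each predicted answer against the reference with Exact Match and token-level F1. The sketch below shows SQuAD-style versions of these metrics; the function names are illustrative and may differ from the repo's evaluate.py.

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, drop punctuation, collapse whitespace (SQuAD-style)."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    return " ".join(text.split())

def exact_match(pred, gold):
    """1 if the normalized strings are identical, else 0."""
    return int(normalize(pred) == normalize(gold))

def f1_score(pred, gold):
    """Harmonic mean of token precision and recall."""
    pred_tokens = normalize(pred).split()
    gold_tokens = normalize(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, the prediction "the Professor Waldo" against the reference "Professor Waldo" scores EM 0 but F1 0.8.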

Project Structure 🗂️

Smart-Book-Assistant/
├── src/
│   ├── preprocess.py          # Text cleaning/chunking
│   ├── retrieval.py           # FAISS-based retrieval
│   ├── qa_model.py            # Answer generation
│   ├── main.py                # Interactive pipeline
│   ├── evaluate.py            # Accuracy testing
│   ├── book.txt               # Your input book text
│   └── test_questions.json    # Test questions/answers
├── requirements.txt           # Dependencies
└── README.md                  # Project documentation

Observations 🔍

What Worked Well ✅

  • Chunking Strategy:
    - Breaking the book into smaller sections (e.g., paragraphs) improved answer precision.
    - Larger chunks made it harder to find specific information.

  • FAISS for Retrieval:
    - FAISS was incredibly fast at finding relevant chunks, even for large books.
    - Semantic search worked better than keyword-based approaches.

  • DistilBERT for Answers:
    - The pre-trained model provided accurate answers for straightforward questions.
    - Fine-tuning on SQuAD made it well-suited for question answering.

Challenges ❌

  • Ambiguous Questions:
    - The system struggled with questions requiring context spread across multiple chunks.
    - Example: "Why did the protagonist leave the city?" often produced incomplete answers.

  • Long Answers:
    - For questions with detailed answers, the system sometimes missed key details.

  • Preprocessing:
    - Cleaning and chunking the text required careful tuning to ensure good performance.

Insights 💡

  • Trade-offs:
    - Smaller models (like DistilBERT) are faster but may sacrifice some accuracy compared to larger models.
    - Balancing chunk size against retrieval accuracy is critical.

  • Evaluation:
    - Metrics like Exact Match (EM) and F1 Score provided a good baseline, but human review was essential for identifying edge cases.

Key Learnings 🔑

  • Smaller text chunks improve answer precision.
  • FAISS enables fast retrieval even for large books.
  • Trade-offs exist between model size (speed) and accuracy.
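The paragraph-level chunking that these learnings point to can be sketched as below. This is an illustrative version, not the repo's preprocess.py: it splits on blank lines and packs paragraphs into chunks of at most max_chars characters, where the 500-character default is an assumed setting.

```python
def chunk_text(text, max_chars=500):
    """Split text on blank lines, then pack paragraphs into chunks
    of at most max_chars characters (max_chars is illustrative)."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        # Start a new chunk when adding this paragraph would overflow.
        if current and len(current) + len(para) + 1 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = (current + "\n" + para).strip()
    if current:
        chunks.append(current)
    return chunks
```

Smaller max_chars values give sharper retrieval hits at the cost of losing cross-paragraph context.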

Future Improvements 🚀

  • Multi-Hop Reasoning: Answer questions requiring information from multiple chunks.
  • User Interface: Build a web or mobile app for easier interaction.
  • Larger Models: Experiment with larger models (e.g., BERT-large) for better accuracy.

About

A Small Language Model (SLM) that answers questions based on a provided book or text. It uses semantic search (FAISS) to retrieve relevant content and DistilBERT (Hugging Face) to generate precise answers. Built with Python, the system achieves 75% Exact Match accuracy and is designed for efficient, scalable question answering.
