
Dhanushmh5/AI-Powered-Elearning_Platform_Nodejs-ExpressJs-ReactJs-TailwindCSS-REST

SDG Reporting: Text Mining, NLP, and LLM-based Analysis

Overview

This repository contains the project implementation and report for an SDG (Sustainable Development Goals) reporting system that leverages Text Mining, Natural Language Processing (NLP), and Large Language Models (LLMs). The core objective is to automate and enhance the process of identifying and extracting information relevant to SDGs from unstructured textual data, providing insights into an organization's contributions towards these global goals.

The project addresses the challenges of manual SDG reporting by offering a scalable and efficient solution for data collection, preprocessing, feature extraction, and classification using advanced AI techniques.

Project Goals

  • Automate SDG Reporting: Move from manual, time-consuming reporting to an automated, data-driven process.
  • Leverage NLP & LLMs: Utilize state-of-the-art NLP techniques and Large Language Models for intelligent text analysis and classification.
  • Identify SDG Contributions: Accurately classify text segments based on their relevance to specific SDGs.
  • Provide Actionable Insights: Offer a comprehensive overview of an organization's impact on sustainable development.

Features

  • Data Collection & Preprocessing: Includes methods for handling unstructured text data, converting PDFs to text, and cleaning raw data.
  • Text Feature Extraction: Techniques such as TF-IDF are used to convert text into numerical features suitable for machine learning models.
  • Machine Learning Models: Implementation of various ML models (e.g., SVM, Random Forest, Naive Bayes) for text classification.
  • LLM Integration: Exploration of LLMs for advanced semantic understanding and categorization of text related to SDGs.
  • Performance Evaluation: Metrics like Accuracy, Precision, Recall, and F1-score are used to evaluate model performance.
  • Visualization: Visualizations of the results and SDG contributions are described in the report; the corresponding visualization code may not be included in this repository.
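Two of the features above, TF-IDF feature extraction and metric-based evaluation, can be illustrated with a minimal pure-Python sketch. This is not the project's implementation: the function names `tfidf` and `precision_recall_f1` are hypothetical, the tokenizer is a plain whitespace split, and the weighting is the unsmoothed textbook form (relative term frequency times log(N / df)). Libraries such as scikit-learn apply smoothing and normalization, so their values will differ.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (hypothetical helper).

    Weight = relative term frequency * log(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights


def precision_recall_f1(y_true, y_pred, positive):
    """Compute precision, recall, and F1 for one class (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up example data, for illustration only.
sample_docs = [
    "clean water access for rural communities",
    "renewable energy and clean energy investment",
    "water quality monitoring",
]
sample_weights = tfidf(sample_docs)
```

A term that appears in every document receives a weight of zero under this formula, which is why terms unique to one document (such as "energy" in the second sample) dominate its feature vector.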

Technologies Used

  • Python: Primary programming language for data processing, NLP, and ML.
  • Natural Language Processing (NLP) Libraries: e.g., NLTK, spaCy, and Hugging Face Transformers (for LLMs).
  • Machine Learning Libraries: e.g., scikit-learn, and PyTorch or TensorFlow (for LLMs).
  • Data Handling: e.g., Pandas for data manipulation.
  • PDF Processing: e.g., PyPDF2 or pdfminer.six for PDF-to-text conversion.
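Text extracted from PDFs often contains words hyphenated across line breaks, hard-wrapped lines, and irregular spacing. The following is a minimal cleanup sketch using only the Python standard library; the function name `clean_extracted_text` and the specific rules are assumptions for illustration, not the project's actual preprocessing.

```python
import re
import unicodedata


def clean_extracted_text(raw):
    """Normalize text extracted from a PDF (illustrative rules only)."""
    # NFKC folds compatibility characters, e.g. the "fi" ligature, into plain letters.
    text = unicodedata.normalize("NFKC", raw)
    # Rejoin words split by a hyphen at a line break: "sustain-\nable" -> "sustainable".
    # Caveat: this also merges genuine compounds that happen to break at the hyphen.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Turn single newlines into spaces; keep blank lines as paragraph breaks.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs into one space.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()
```

The cleaned string can then be passed to whichever tokenizer or feature extractor the pipeline uses.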

Getting Started

Prerequisites

  • Python 3.x
  • pip (Python package installer)

Installation

  1. Clone the repository:
    git clone https://github.com/Dhanushmh5/YourSDGRepoName.git  # Replace YourSDGRepoName with the actual repository name
    cd YourSDGRepoName
  2. Create and activate a virtual environment (recommended):
    python3 -m venv venv
    source venv/bin/activate  # On macOS/Linux
    # For Windows: venv\Scripts\activate
  3. Install dependencies. If the repository does not include a requirements.txt file, create one listing the Python libraries used. Based on the report, these include pandas, numpy, scikit-learn, nltk, spacy, transformers, torch or tensorflow, and pypdf2.
    pip install -r requirements.txt
    Example requirements.txt content:
    pandas
    numpy
    scikit-learn
    nltk
    spacy
    transformers
    torch # or tensorflow, depending on LLM framework
    pypdf2 # or pdfminer.six
    
  4. Download necessary NLP models/data (if applicable):
    python -m nltk.downloader punkt stopwords # Example for NLTK
    python -m spacy download en_core_web_sm # Example for spaCy
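After installation, it can be useful to confirm that the listed libraries are importable in the active environment. The sketch below uses only the standard library; the helper name `missing_packages` is hypothetical, and the mapping reflects the example requirements.txt above (note that the pip name and import name differ for scikit-learn and PyPDF2).

```python
import importlib.util

# pip name -> import name; adjust to match the actual requirements.txt.
IMPORT_NAMES = {
    "pandas": "pandas",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "nltk": "nltk",
    "spacy": "spacy",
    "transformers": "transformers",
    "torch": "torch",
    "pypdf2": "PyPDF2",
}


def missing_packages(import_names=IMPORT_NAMES):
    """Return the pip names whose modules cannot be found (hypothetical helper)."""
    return [
        pip_name
        for pip_name, module in import_names.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing:", ", ".join(missing) if missing else "none")
```

This only checks that each module can be located; it does not verify versions or that downloaded NLTK and spaCy data are present.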

Usage

(The exact commands depend on the scripts present in the repository. A typical sequence is:)

  1. Run the data preprocessing script:
    python scripts/preprocess_data.py
  2. Train the ML models:
    python scripts/train_model.py
  3. Run LLM-based analysis:
    python scripts/llm_analysis.py
  4. Generate reports/visualizations:
    python scripts/generate_report.py
    (Adjust the script names to match the actual files.)
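The four steps above can also be chained from a single entry point. The following is a minimal sketch, assuming the placeholder script paths listed above exist; the `run_pipeline` function and the `PIPELINE` list are hypothetical and not part of the repository.

```python
import subprocess
import sys
from pathlib import Path

# Placeholder script paths taken from the steps above; adjust to the real files.
PIPELINE = [
    "scripts/preprocess_data.py",
    "scripts/train_model.py",
    "scripts/llm_analysis.py",
    "scripts/generate_report.py",
]


def run_pipeline(scripts=PIPELINE, dry_run=False):
    """Run each script in order, stopping at the first failure.

    Returns the commands that were (or, in a dry run, would be) executed.
    """
    commands = []
    for script in scripts:
        # Use the current interpreter so the active virtual environment applies.
        cmd = [sys.executable, script]
        commands.append(cmd)
        if dry_run:
            continue
        if not Path(script).is_file():
            raise FileNotFoundError(f"Pipeline script not found: {script}")
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(cmd, check=True)
    return commands
```

Calling `run_pipeline(dry_run=True)` lists the commands without executing anything, which is a quick way to confirm the order before a long training run.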

Project Structure (Example)
