Here’s a detailed explanation of how the project works along with instructions on how to run it on your computer. 🚀
The project summarizes Hindi news articles by extracting content from supported websites and generating concise summaries with transformer models such as:
- BART: Fine-tuned for Hindi text summarization.
- mT5: A multilingual model that supports Hindi summarization.
- The user provides a URL of a news article (e.g., from Amarujala).
- The URL is passed to the API.
- The API endpoint accepts the URL and a model name (`BART` or `T5`).
- API Endpoint:
https://hf.space/embed/d0r1h/Hindi_News_Summarizer/+/api/predict/
- The system scrapes the news article content.
- It then cleans and preprocesses the text to remove unwanted elements such as ads, HTML tags, and irrelevant sections.
- The preprocessed text is passed to the selected model (`BART` or `T5`).
- The model generates a concise summary of the article in Hindi.
- Sentiment analysis is performed to classify the summary as Positive, Negative, or Neutral.
- A word cloud is generated from the summarized text to visualize key topics.
- The final output includes:
  - Title
  - Summary
  - Sentiment
  - Key Topics
  - Optional: WordCloud visualization
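The extraction and output steps above can be sketched with the Python standard library alone. This is a minimal illustration, not the project's actual code: the names `clean_html`, `key_topics`, and `build_output` are hypothetical, the cleaning only strips tags and script/style blocks (removing ads on a real site would need site-specific rules), and key topics are approximated by raw word frequency.

```python
import re
from collections import Counter
from html.parser import HTMLParser

PUNCTUATION = "।,.!?;:()'\""  # includes the Devanagari danda


class _TextExtractor(HTMLParser):
    """Collect visible text while skipping <script> and <style> contents."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def clean_html(raw_html):
    """Strip tags and collapse whitespace (a stand-in for the real cleaning step)."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()  # flush any buffered text
    return re.sub(r"\s+", " ", " ".join(parser.chunks)).strip()


def key_topics(text, top_n=5):
    """Return the most frequent tokens; no stop-word filtering is applied."""
    tokens = [token.strip(PUNCTUATION) for token in text.split()]
    counts = Counter(token for token in tokens if len(token) > 1)
    return [word for word, _ in counts.most_common(top_n)]


def build_output(title, summary, sentiment):
    """Assemble the output fields listed above into one dictionary."""
    return {
        "title": title,
        "summary": summary,
        "sentiment": sentiment,  # label supplied by whatever classifier is used
        "key_topics": key_topics(summary),
    }
```

The sentiment label is passed in as an argument because this README does not say which classifier the project uses.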
git clone https://github.com/Divyateja2709/akaike.git
cd akaike
# Windows
python -m venv venv
venv\Scripts\activate
# Mac/Linux
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python app.py
- Open your browser and go to:
http://127.0.0.1:5000
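Besides the local web page, the hosted API can be called from a script. As a small sketch, the request body can be built and validated before it is sent. `build_payload` and `SUPPORTED_MODELS` are hypothetical names, and the accepted model names are assumed to be exactly `BART` and `T5`, as stated in the API description earlier.

```python
from urllib.parse import urlparse

# Assumption: these are the only model names the endpoint accepts.
SUPPORTED_MODELS = ("BART", "T5")


def build_payload(news_url, model="BART"):
    """Return the JSON body for the predict endpoint, or raise ValueError."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"model must be one of {SUPPORTED_MODELS}, got {model!r}")
    parts = urlparse(news_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not a valid http(s) URL: {news_url!r}")
    # The endpoint expects the URL and the model name as a two-item list.
    return {"data": [news_url, model]}
```

The returned dictionary can be passed as the `json=` argument of `requests.post`.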
import requests
# API endpoint
api_endpoint = "https://hf.space/embed/d0r1h/Hindi_News_Summarizer/+/api/predict/"
# News article URL
news_url = "https://www.amarujala.com/uttar-pradesh/shamli/up-news-heroin-caught-in-shaheen-bagh-of-delhi-is-connection-to-kairana-and-muzaffarnagar?src=tlh&position=3"
# API Request
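response = requests.post(
    url=api_endpoint,
    json={"data": [news_url, "BART"]},
    timeout=60,  # avoid waiting forever if the hosted API is slow or offline
)
response.raise_for_status()  # stop here on an HTTP error
# Get the summarized output
summary = response.json()['data'][0]
print(summary)
To generate a WordCloud for summarized Hindi text: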
from wordcloud import WordCloud
import matplotlib.pyplot as plt
def plot_wordcloud(text):
    # font_path is a placeholder; point it at a font that supports Devanagari
    wordcloud = WordCloud(
        font_path='path_to_hindi_font.ttf',
        width=800,
        height=400,
        background_color='white',
    ).generate(text)
    plt.figure(figsize=(10, 5))
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.axis('off')
    plt.show()
# Generate WordCloud for the summarized text
plot_wordcloud(summary)
- Make sure Python version >= 3.8 is installed.
- Verify that all dependencies from `requirements.txt` are installed properly.
- If you run into issues, deactivate the virtual environment and reactivate it:
# Deactivate
deactivate
# Activate again (Windows; on Mac/Linux use: source venv/bin/activate)
venv\Scripts\activate
Now the application is running locally, and you can start summarizing Hindi news articles. 🚀
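If the app still fails to start, the troubleshooting checks above (Python version and installed dependencies) can be scripted. The sketch below is an assumption-laden helper, not part of the project: `python_is_supported` and `missing_modules` are hypothetical names, and the module list covers only the packages imported in this README's examples, since the contents of `requirements.txt` are not shown here.

```python
import importlib.util
import sys

# Assumption: only the packages imported in the examples above are checked.
EXAMPLE_MODULES = ["requests", "wordcloud", "matplotlib"]


def python_is_supported(min_version=(3, 8)):
    """Return True if the running interpreter is at least min_version."""
    return sys.version_info >= min_version


def missing_modules(module_names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python >= 3.8:", python_is_supported())
    print("Missing modules:", missing_modules(EXAMPLE_MODULES))
```

Run it inside the activated virtual environment so that it inspects the same interpreter and packages that `python app.py` uses.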