package for 15 Aug speeches

License

Unknown (LICENSE), CC-BY-4.0 (LICENSE.md)
seanangio/aug15

aug15 - Data set of Indian Independence Day Speeches

This package includes a data set of full-text English renderings of Indian Independence Day speeches, delivered annually on 15 August since 1947.

Recent speeches are easy to find online from the Press Information Bureau. I found older speeches in volumes of collected speeches held in the libraries of Jawaharlal Nehru University and the Nehru Memorial Museum. I digitized the speeches in those volumes by uploading page images to Google Drive and using its native OCR feature.

The data set is missing only the speeches from 1962 and 1995. Please contact me if you can find the speech for either year, or evidence that one did not take place!

Installation

You can access the data set by installing the package from GitHub.

# install.packages("devtools")
devtools::install_github("seanangio/aug15")

The data set is called corpus. To preview it, run something like:

library(dplyr)
library(aug15)
glimpse(corpus)
#> Rows: 79
#> Columns: 8
#> $ year     <dbl> 2025, 2024, 2023, 2022, 2021, 2020, 2019, 2018, 2017, 2016, 2…
#> $ pm       <chr> "Narendra Modi", "Narendra Modi", "Narendra Modi", "Narendra …
#> $ party    <chr> "BJP", "BJP", "BJP", "BJP", "BJP", "BJP", "BJP", "BJP", "BJP"…
#> $ title    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ footnote <chr> "English rendering of Prime Minister Shri Narendra Modi's add…
#> $ source   <chr> "Press Information Bureau", "Press Information Bureau", "Pres…
#> $ url      <chr> "https://pib.gov.in/", "https://pib.gov.in/", "https://pib.go…
#> $ text     <chr> "My dear countrymen,\n\nThis grand festival of Independence i…
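
Once loaded, corpus behaves like any other data frame with one row per speech. The sketch below is only an illustration: it uses a small made-up stand-in called toy_corpus (a hypothetical name, with invented text and placeholder prime ministers) that copies three of the column names shown above, so it runs without the package installed. With the package loaded, the same base R expressions should work on corpus itself.

```r
# Hypothetical stand-in for `corpus`: same column names, invented content.
toy_corpus <- data.frame(
  year = c(2001, 2000, 1999),
  pm   = c("PM A", "PM A", "PM B"),
  text = c("My dear countrymen, greetings to you all.",
           "Friends, today we celebrate our freedom.",
           "Sisters and brothers, the nation stands united."),
  stringsAsFactors = FALSE
)

# Pull the full text of a single year's speech.
speech_2000 <- toy_corpus$text[toy_corpus$year == 2000]

# Count how many speeches each prime minister delivered.
speeches_per_pm <- table(toy_corpus$pm)

# Preview the first 20 characters of each speech, oldest first.
ordered  <- toy_corpus[order(toy_corpus$year), ]
previews <- substr(ordered$text, 1, 20)
```

Because the text column holds whole speeches, printing it directly is unwieldy; subsetting by year or truncating with substr() keeps the output readable.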

Alternatively, you can directly download the CSV file or browse any of the speeches in this folder.

Investigation

For a brief investigation into the data set, this package includes a Shiny app that draws basic visualizations, including plots of:

  • speech length

Plot of speech word count
  • most ‘important’ words (TF-IDF)

Plot of TF-IDF for recent years
  • most frequent positive and negative words (according to the Bing lexicon)

Plot of most frequent positive and negative words
  • net sentiment (according to the Bing lexicon)

Plot of net sentiment
  • and the frequency of any word supplied by the user

Plot of frequency of the term Kashmir
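
The quantities behind three of these plots (speech length, the frequency of a chosen word, and TF-IDF) can be approximated in a few lines of base R. This is a rough sketch and not the app's actual code, which this README does not show. It makes several assumptions: toy_corpus is a made-up stand-in with invented text, tokenize() and term_count() are hypothetical helpers, words are split on any non-letter character, and TF-IDF uses one common definition (term frequency within a speech multiplied by the natural log of the number of speeches divided by the number of speeches containing the term).

```r
# Hypothetical stand-in for `corpus`: invented text, same `year`/`text` columns.
toy_corpus <- data.frame(
  year = c(1999, 2000, 2001),
  text = c("India won freedom at midnight. Freedom endures.",
           "Peace in Kashmir and peace across India.",
           "Growth, peace and freedom for India."),
  stringsAsFactors = FALSE
)

# Assumed tokenizer: lower-case, then split on anything that is not a letter.
tokenize <- function(x) {
  words <- strsplit(tolower(x), "[^a-z]+")[[1]]
  words[nzchar(words)]
}

tokens <- lapply(toy_corpus$text, tokenize)
names(tokens) <- toy_corpus$year

# Speech length: number of words in each speech.
word_count <- vapply(tokens, length, integer(1))

# Frequency of one user-supplied word in each speech (case-insensitive).
term_count <- function(term) {
  vapply(tokens, function(w) sum(w == tolower(term)), integer(1))
}
kashmir <- term_count("Kashmir")

# Term frequency matrix: one row per speech, one column per vocabulary word.
vocab <- sort(unique(unlist(tokens)))
tf <- t(vapply(
  tokens,
  function(w) as.numeric(table(factor(w, levels = vocab))) / length(w),
  numeric(length(vocab))
))
colnames(tf) <- vocab

# Inverse document frequency, then TF-IDF (scale each column by its idf).
idf    <- log(length(tokens) / colSums(tf > 0))
tf_idf <- sweep(tf, 2, idf, `*`)

# Highest-scoring words for one speech.
top_2000 <- sort(tf_idf["2000", ], decreasing = TRUE)[1:3]
```

Under this definition a word that appears in every speech (here, "india") scores zero, which is why TF-IDF tends to surface words that are distinctive to particular years. The sentiment plots are left out of the sketch because they depend on the Bing lexicon, which base R does not include.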

Python Port

A Python implementation of the analysis app is also available, built with Streamlit. It provides the same visualizations and functionality as the R Shiny app. For installation and usage instructions, see the Python README.
