This package includes a data set of full-text English renderings of Indian Independence Day speeches, delivered annually on 15 August since 1947.
Recent speeches are easy to find online through the Press Information Bureau. I found older speeches in volumes of collected speeches held by the libraries of Jawaharlal Nehru University and the Nehru Memorial Museum, and digitized them by uploading page images to Google Drive and using its native OCR feature.
The data set is missing only the speeches from 1962 and 1995. Please contact me if you can find the speech for either year, or evidence that one did not take place!
You can access the data set by installing the package from GitHub.
# install.packages("devtools")
devtools::install_github("seanangio/aug15")
The data set is called corpus. To preview it, run something like:
library(dplyr)
library(aug15)
glimpse(corpus)
#> Rows: 79
#> Columns: 8
#> $ year <dbl> 2025, 2024, 2023, 2022, 2021, 2020, 2019, 2018, 2017, 2016, 2…
#> $ pm <chr> "Narendra Modi", "Narendra Modi", "Narendra Modi", "Narendra …
#> $ party <chr> "BJP", "BJP", "BJP", "BJP", "BJP", "BJP", "BJP", "BJP", "BJP"…
#> $ title <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ footnote <chr> "English rendering of Prime Minister Shri Narendra Modi's add…
#> $ source <chr> "Press Information Bureau", "Press Information Bureau", "Pres…
#> $ url <chr> "https://pib.gov.in/", "https://pib.gov.in/", "https://pib.go…
#> $ text <chr> "My dear countrymen,\n\nThis grand festival of Independence i…
Alternatively, you can download the CSV file directly or browse any of the speeches in this folder.
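Once the data set is loaded, ordinary data-frame operations apply. As a minimal sketch in base R, the example below counts words per speech and averages them by prime minister. So that it runs without the package installed, it uses a hypothetical three-row stand-in named toy_corpus with invented text; toy_corpus and count_words are illustrative names, not part of aug15.

```r
# Hypothetical stand-in for aug15::corpus, with invented text. The real data
# set uses the same column names, plus party, title, footnote, source, and url.
toy_corpus <- data.frame(
  year = c(2001, 2002, 2003),
  pm   = c("PM A", "PM A", "PM B"),
  text = c(
    "My dear countrymen, we celebrate freedom today.",
    "Freedom brings duty.\n\nWe must work together.",
    "Today we remember progress and peace."
  ),
  stringsAsFactors = FALSE
)

# Count whitespace-separated tokens in each speech (newlines count as whitespace).
count_words <- function(text) {
  lengths(strsplit(trimws(text), "\\s+"))
}

toy_corpus$n_words <- count_words(toy_corpus$text)

# Average speech length per prime minister.
avg_length <- aggregate(n_words ~ pm, data = toy_corpus, FUN = mean)
print(avg_length)
```

With the package loaded, the same steps should work on corpus in place of toy_corpus, since it has year, pm, and text columns.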
To explore the data set, the package includes a Shiny app that produces basic visualizations, including plots of:
- speech length
- most ‘important’ words (TF-IDF)
- most frequent positive and negative words (according to the Bing lexicon)
- net sentiment (according to the Bing lexicon)
- the frequency of any word supplied by the user
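The word-frequency and net-sentiment plots rest on simple token counts. The sketch below shows one way to compute both in base R; it is not the app's actual code. The speeches data frame and the tokenize, word_count, and net_sentiment helpers are hypothetical, and the two short word lists are invented stand-ins for the Bing lexicon.

```r
# Hypothetical stand-in speeches; the real text lives in corpus$text.
speeches <- data.frame(
  year = c(2001, 2002),
  text = c(
    "Freedom is good. Progress is good, but poverty is bad.",
    "Peace and freedom. Freedom wins over fear."
  ),
  stringsAsFactors = FALSE
)

# Invented word lists standing in for the Bing lexicon (assumption).
positive <- c("good", "peace", "progress")
negative <- c("bad", "poverty", "fear")

# Lowercase one speech, replace everything except letters, apostrophes,
# and spaces with a space, then split on whitespace.
tokenize <- function(text) {
  cleaned <- gsub("[^a-z' ]", " ", tolower(text))
  tokens <- strsplit(trimws(cleaned), "\\s+")[[1]]
  tokens[nzchar(tokens)]
}

tokens_by_year <- lapply(speeches$text, tokenize)
names(tokens_by_year) <- speeches$year

# Number of times a user-supplied word appears in each speech.
word_count <- function(tokens_list, word) {
  vapply(tokens_list, function(tokens) sum(tokens == tolower(word)), integer(1))
}

# Net sentiment: positive word count minus negative word count.
net_sentiment <- function(tokens) {
  sum(tokens %in% positive) - sum(tokens %in% negative)
}

freedom_counts <- word_count(tokens_by_year, "freedom")
net_by_year <- vapply(tokens_by_year, net_sentiment, integer(1))

print(freedom_counts)
print(net_by_year)
```

A real analysis would swap the invented word lists for an actual sentiment lexicon and run the same counts over every row of corpus.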
A Python implementation of the analysis app is also available, built with Streamlit. It provides the same visualizations and functionality as the R Shiny app. For installation and usage instructions, see the Python README.




