Local AI Agent that Creates Automated Short Informative Slideshows

This project is a proof-of-concept (POC) AI agent that generates short, informative, and engaging slideshows (videos) from user-provided topics. It automates the entire video creation process, including generating narration, selecting images, and synchronizing them into a cohesive video, all using local ML models (plus some online image search).

At a high level, here's how it works:

Accepts a prompt as initial input (i.e., a topic). Currently, it chooses one at random from a set of given topics.
A local LLM served through Ollama generates the text content for the slideshow based on the instructions given in the prompt.
The generated text is chunked and converted to speech using a TTS model.
For each text chunk, search keywords are identified and used to find and download relevant images from DuckDuckGo.
These images and the relevant audio segments are then stitched together to form the final slideshow.
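
The steps above can be sketched as a chain of small functions that pass a shared state dictionary along. This is only an illustration of the flow: every function name, the topic list, and the stand-in bodies are hypothetical. The real project performs the equivalent steps with Ollama, a TTS model, and DuckDuckGo search.

```python
import random

# Hypothetical topic list; the project's actual topics are not shown here.
TOPICS = ["How volcanoes form", "Why the sky is blue", "How vaccines work"]


def pick_topic(state, rng=random):
    """Step 1: choose one topic at random."""
    return {**state, "topic": rng.choice(TOPICS)}


def generate_text(state):
    """Step 2: stand-in for the call to the local LLM."""
    text = f"{state['topic']}. First fact. Second fact."
    return {**state, "text": text}


def chunk_text(state):
    """Step 3a: stand-in chunker that splits on full stops."""
    chunks = [s.strip() for s in state["text"].split(".") if s.strip()]
    return {**state, "chunks": chunks}


def synthesize_audio(state):
    """Step 3b: stand-in for TTS; records one audio file name per chunk."""
    audio = [f"audio_{i}.wav" for i, _ in enumerate(state["chunks"])]
    return {**state, "audio": audio}


def fetch_images(state):
    """Step 4: stand-in for keyword extraction plus image search."""
    images = [f"image_{i}.jpg" for i, _ in enumerate(state["chunks"])]
    return {**state, "images": images}


def assemble(state):
    """Step 5: pair each image with its audio segment, in order."""
    return {**state, "slides": list(zip(state["images"], state["audio"]))}


PIPELINE = [pick_topic, generate_text, chunk_text,
            synthesize_audio, fetch_images, assemble]


def run_pipeline():
    """Run every step in order, threading the state through each one."""
    state = {}
    for step in PIPELINE:
        state = step(state)
    return state


if __name__ == "__main__":
    print(run_pipeline()["slides"])
```

Keeping each step a function of the state makes it easy to swap a stand-in for the real model call without touching the other steps.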

Installation

Set up a virtual environment:

```
python -m venv .venv
source .venv/bin/activate
```

Install dependencies:
```
pip install -r requirements.txt
```
Run:
```
python main.py
```

Note that ffmpeg and Ollama are required to run this project. Ollama can be downloaded from https://ollama.com/. For ffmpeg, see https://ffmpeg.org/download.html.
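
Before running the pipeline, it can help to confirm that both tools are reachable. The following is a minimal sketch, not part of the project: the helper name `missing_tools` is hypothetical, and it assumes the executables are installed under their usual names, `ffmpeg` and `ollama`.

```python
import shutil


def missing_tools(required=("ffmpeg", "ollama"), which=shutil.which):
    """Return the names of required executables that are not on PATH."""
    return [name for name in required if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing required tools:", ", ".join(missing))
    else:
        print("ffmpeg and Ollama were found on PATH.")
```

The check only looks for the executables on PATH; it does not verify that the Ollama server is running or that a model has been downloaded.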

Output

By default, the final output is stored in the output/{timestamp} folder, with the associated images and audio files in output/{timestamp}/images and output/{timestamp}/audio, respectively.
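
As an illustration of this layout, the sketch below creates the three folders with `pathlib`. The function name `make_output_dirs` is hypothetical, and the timestamp format is an assumption; the project may format it differently.

```python
from datetime import datetime
from pathlib import Path


def make_output_dirs(base="output", timestamp=None):
    """Create <base>/<timestamp>/images and <base>/<timestamp>/audio."""
    # Assumed timestamp format; the project's actual format is not documented.
    stamp = timestamp or datetime.now().strftime("%Y%m%d_%H%M%S")
    run_dir = Path(base) / stamp
    images_dir = run_dir / "images"
    audio_dir = run_dir / "audio"
    for folder in (images_dir, audio_dir):
        # parents=True also creates the base and run folders when missing.
        folder.mkdir(parents=True, exist_ok=True)
    return run_dir, images_dir, audio_dir
```

Calling `make_output_dirs()` once per run gives each slideshow its own folder, so earlier results are never overwritten.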

Source Code Files Overview

main.py: Orchestrates the entire pipeline.
speech_generation.py: Handles text generation and text-to-speech conversion.
image_fetcher.py: Fetches images for video content.
utilities.py: Utility functions for keyword extraction, text chunking, and video assembly.

Tools and Technologies

Text Generation: The text content for the slideshow is generated using Mistral 7B (4-bit quantized version). Ollama serves the LLM locally; it was chosen because it makes accessing and running smaller LLMs locally simple.
Text to Speech: The "tts_models/en/ljspeech/tacotron2-DDC" TTS model is used for text-to-speech conversion.
Search Keyword Extraction: The KeyBERT Python library generates the search keywords. It uses the "all-MiniLM-L6-v2" Sentence Transformer model by default, although this can be customized.
Image: Images are searched for and downloaded from DuckDuckGo using the "duckduckgo-search" Python library. Rate-limiting logic and a fallback to a plain image round out the image selection process.
Video Assembly: The "moviepy" library stitches together the audio segments and the relevant images.
Text Chunking: Text chunking is done using LangChain's RecursiveCharacterTextSplitter, which tries to break text at the largest meaningful unit first (like paragraphs), and only breaks at smaller units (e.g., characters) if necessary.
Orchestration: LangGraph is used to orchestrate the end-to-end slideshow generation workflow.
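
The chunking strategy described under Text Chunking can be illustrated with a small standalone function. This is a simplified sketch of the idea, not LangChain's implementation: the function name `recursive_split`, the separator list, and the merging rule are assumptions, and separators that fall on a chunk boundary are simply dropped.

```python
def recursive_split(text, max_len, separators=("\n\n", "\n", ". ", " ")):
    """Split text into chunks of at most max_len characters.

    Tries the coarsest separator first and recurses with finer ones only
    for pieces that are still too long.
    """
    if len(text) <= max_len:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to a hard cut by character count.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= max_len:
            current = candidate  # keep merging small pieces together
            continue
        if current:
            chunks.append(current)
        if len(piece) <= max_len:
            current = piece
        else:
            # This piece alone is too long: split it with finer separators.
            chunks.extend(recursive_split(piece, max_len, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = ("First paragraph here.\n\n"
              "Second one is a bit longer than the limit allows. "
              "It has two sentences.")
    for chunk in recursive_split(sample, 40):
        print(repr(chunk))
```

In this sketch the short first paragraph survives as one chunk, while the longer second paragraph is broken at sentence and then word boundaries, so each narration segment stays within the length limit.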

