Open-source alternative to OpusClip - Transform long-form videos into engaging YouTube Shorts automatically using AI-powered transcription, clip detection, and viral title generation. Built on the powerful clipsai library.
- Smart Clip Detection: AI identifies the most engaging moments in your videos
- Auto-Resize: Automatically crops videos to 9:16 aspect ratio for YouTube Shorts
- Transparent Logo Overlay: Adds your logo at the top of the exported clip with adjustable opacity
- Animated Subtitles: Clean, bold subtitles with smart styling (white text, yellow for numbers/currency)
- Viral Title Generation: AI generates catchy titles optimized for engagement
- Transcription Caching: Save time by reusing existing transcriptions
- Multiple Video Support: Process multiple videos in one session
- Engagement Scoring: Intelligent clip selection based on content engagement metrics
| Feature | ClippedAI | OpusClip |
|---|---|---|
| Cost | 100% Free | $39/month |
| Privacy | Local processing | Cloud-based |
| Customization | Fully customizable | Limited options |
| API Keys | Free (HuggingFace + Groq) | Paid subscriptions |
| Offline Use | Works offline (no auto-generated titles) | Requires internet |
| Source Code | Open source | Proprietary |
| Model Control | Choose your own models | Fixed models |
| Transcription Caching | Save time & money | No caching |
Perfect for: Content creators, developers, and anyone who wants professional video editing capabilities without the monthly subscription costs!
- Python 3.8+ (Tested on 3.11)
- FFmpeg installed and available in PATH
- 8GB+ RAM (16GB+ recommended for large models)
- GPU (optional but recommended for faster processing)
1. Clone the repository

   ```bash
   git clone https://github.com/Shaarav4795/ClippedAI.git
   cd ClippedAI
   ```

2. Create and activate virtual environment

   ```bash
   # On macOS/Linux
   python3 -m venv env
   source env/bin/activate

   # On Windows
   python -m venv env
   env\Scripts\activate
   ```

3. Install dependencies

   ```bash
   pip install -r requirements.txt
   ```

4. Install FFmpeg

   ```bash
   # macOS (using Homebrew)
   brew install ffmpeg

   # Ubuntu/Debian
   sudo apt update && sudo apt install ffmpeg

   # Windows (using Chocolatey)
   choco install ffmpeg

   # Or download from https://ffmpeg.org/download.html
   ```

5. Create environment file

   ```bash
   # Copy the example environment file
   cp .env.example .env

   # Edit the .env file with your API keys
   nano .env
   ```
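For reference, a minimal `.env` might look like this (variable names come from the configuration table later in this README; replace the placeholders with your real values):

```
HUGGINGFACE_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
LLM_PROVIDER=groq
GROQ_API_KEY=your_groq_api_key_here
TRANSCRIPTION_MODEL=medium
```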
1. Sign up for HuggingFace
   - Go to HuggingFace and create a free account

2. Request access to Pyannote models
   - Visit pyannote/speaker-diarization
   - Click "Access repository" and accept the terms
   - Visit pyannote/speaker-diarization-3.1
   - Click "Access repository" and accept the terms
   - Visit pyannote/segmentation
   - Click "Access repository" and accept the terms

3. Create your API token
   - Go to HuggingFace Settings > Access Tokens
   - Click "New token"
   - Give it a name (e.g., "ClippedAI")
   - Select "Read" role (minimum required)
   - Click "Generate token"
   - Copy the token immediately (you won't see it again)

4. Add the token to your environment file
   - Edit the `.env` file and replace `your_huggingface_token_here` with your actual token
   - Example: `HUGGINGFACE_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx`
Note: The first time you run the script, it will download the Pyannote models (~2GB). This may take several minutes depending on your internet connection.
- Sign up at Groq (free tier available)
- Get your API key from the dashboard
- Add your API key to the `.env` file: `GROQ_API_KEY=your_groq_api_key_here`
- Sign up at OpenAI
- Create an API key from API Keys
- Add your API key to the `.env` file: `OPENAI_API_KEY=your_openai_api_key_here`
Use `LLM_PROVIDER` to choose which model provider generates the title/description/tags:

```
LLM_PROVIDER=groq     # free option
# LLM_PROVIDER=openai # paid option
```

If the selected provider's key is missing, the script safely falls back to default metadata.
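As a rough illustration of that fallback, the provider selection amounts to something like the sketch below (a hypothetical helper, not the exact code in `main.py`):

```python
import os
from typing import Optional

def pick_metadata_provider() -> Optional[str]:
    """Return the configured provider, or None to fall back to default metadata."""
    provider = os.getenv("LLM_PROVIDER", "groq").lower()
    key_vars = {"groq": "GROQ_API_KEY", "openai": "OPENAI_API_KEY"}
    key = os.getenv(key_vars.get(provider, ""), "")
    # Treat a missing key (or an untouched placeholder) as "no provider"
    if not key or key.startswith("your_"):
        return None
    return provider
```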
The script uses Whisper models via clipsai. Choose based on your hardware:
| Model | Size | Speed | Accuracy | RAM Usage | Best For |
|---|---|---|---|---|---|
| `tiny` | 39MB | Very Fast | Low | 1GB | Quick testing, basic accuracy |
| `base` | 74MB | Fast | Medium | 1GB | Good balance, most users |
| `small` | 244MB | Moderate | High | 2GB | Better accuracy, recommended |
| `medium` | 769MB | Slow | Very High | 4GB | High accuracy, good hardware |
| `large-v1` | 1550MB | Very Slow | Excellent | 8GB | Best accuracy, powerful hardware |
| `large-v2` | 1550MB | Very Slow | Excellent | 8GB | Latest model, best results |
For CPU-only systems:
- 4GB RAM: Use `tiny` or `base`
- 8GB RAM: Use `small` or `medium`
- 16GB+ RAM: Use `large-v1` or `large-v2`
For GPU systems:
- Any GPU with 4GB+ VRAM: Use `large-v2` (best results)
- GPU with 2GB VRAM: Use `medium` or `large-v1`
The transcription model can be configured via the `TRANSCRIPTION_MODEL` environment variable in your `.env` file:

```
TRANSCRIPTION_MODEL=large-v1  # Options: tiny, base, small, medium, large-v1, large-v2
```
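That value ends up driving clipsai's transcriber. A minimal sketch of the wiring (the `model_size` argument is assumed to match clipsai's `Transcriber` API, and the path is illustrative):

```python
import os
from clipsai import Transcriber

# Read the model choice from the environment, defaulting to "medium"
# as the configuration table below does
model = os.getenv("TRANSCRIPTION_MODEL", "medium")
transcriber = Transcriber(model_size=model)
transcription = transcriber.transcribe(audio_file_path="/abs/path/to/video.mp4")
```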
```
ClippedAI/
├── main.py                  # Main application script
├── requirements.txt         # Python dependencies
├── README.md                # This file
├── input/                   # Place your videos here
│   ├── video1.mp4
│   ├── video2.mp4
│   └── *_transcription.pkl  # Cached transcriptions (auto-generated)
├── output/                  # Generated YouTube Shorts
│   ├── clip1.mp4
│   ├── clip2.mp4
│   └── ...
└── env/                     # Virtual environment (created during setup)
```
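The `*_transcription.pkl` files are the transcription cache: once a video has been transcribed, the result is pickled next to it and reused on later runs. A minimal sketch of that pattern (hypothetical helper; the real logic lives in `main.py`):

```python
import os
import pickle

def load_or_transcribe(video_path, transcribe_fn):
    """Reuse a cached transcription if present, otherwise create and cache one."""
    cache_path = os.path.splitext(video_path)[0] + "_transcription.pkl"
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    transcription = transcribe_fn(video_path)  # e.g. clipsai's Transcriber
    with open(cache_path, "wb") as f:
        pickle.dump(transcription, f)
    return transcription
```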
All key settings can be configured through the `.env` file; subtitle styling is configured in `main.py`.
1. Add your videos to the `input/` folder

   ```bash
   cp /path/to/your/video.mp4 input/
   ```

2. Optional: provide a remote logo URL

   If you want a logo overlay, pass it at runtime with `--logo-url`. This is intended for hosted assets such as S3, Cloudflare R2, or a CDN. You can also set the position with `--logo-position`.

3. Run the script

   ```bash
   python main.py --logo-url "https://your-bucket.s3.amazonaws.com/brand/logo.png" --logo-position center
   ```

   If `--logo-url` is omitted, the logo overlay is skipped.

4. Follow the prompts to:
   - Match videos with existing transcriptions (if any)
   - Choose how many clips to generate per video
   - Let AI process and create your YouTube Shorts

5. Find your results in the `output/` folder
The script uses Montserrat Extra Bold for subtitles (from Google Fonts). To change fonts:
1. Place your preferred font file in the `fonts/` directory
2. Edit the font name in `main.py` line 158: `SUBTITLE_FONT = "Your-Font-Name"`
3. Update the ASS style definitions in the `create_animated_subtitles` function to reference the new font (see the sketch below)
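In an ASS style line the font name is the second field, so the update touches a string like the following (illustrative values only; the actual style string in `create_animated_subtitles` will differ):

```python
# Illustrative ASS "Style:" line; field order follows the ASS v4+ Styles format.
# The second field (Fontname) must match the font you placed in fonts/.
ASS_STYLE = (
    "Style: Default,Your-Font-Name,64,"        # name, font, size
    "&H00FFFFFF,&H0000FFFF,"                   # primary (white), secondary (yellow)
    "&H00000000,&H00000000,"                   # outline and back colours
    "-1,0,0,0,100,100,0,0,1,3,0,2,10,10,40,1"  # bold on, border, alignment, margins
)
```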
All key settings can be configured through the `.env` file:
| Variable | Default | Description |
|---|---|---|
| `HUGGINGFACE_TOKEN` | `your_huggingface_token_here` | HuggingFace API token for speaker diarization |
| `LLM_PROVIDER` | `groq` | Metadata provider (`groq` or `openai`) |
| `GROQ_API_KEY` | `your_groq_api_key_here` | Groq API key (used when `LLM_PROVIDER=groq`) |
| `GROQ_MODEL` | `llama-3.1-8b-instant` | Groq model for metadata generation |
| `OPENAI_API_KEY` | `your_openai_api_key_here` | OpenAI API key (used when `LLM_PROVIDER=openai`) |
| `OPENAI_MODEL` | `gpt-4o-mini` | OpenAI model for metadata generation |
| `MIN_CLIP_DURATION` | `45` | Minimum duration in seconds for YouTube Shorts |
| `MAX_CLIP_DURATION` | `120` | Maximum duration in seconds for YouTube Shorts |
| `TRANSCRIPTION_MODEL` | `medium` | Whisper model to use (`tiny`, `base`, `small`, `medium`, `large-v1`, `large-v2`) |
| `ASPECT_RATIO_WIDTH` | `9` | Width for aspect ratio (used with height for video resizing) |
| `ASPECT_RATIO_HEIGHT` | `16` | Height for aspect ratio (used with width for video resizing) |
| `ENABLE_GPU_VIDEO_EDITING` | `true` | Try FFmpeg `h264_nvenc` for GPU video encoding; auto-fallback to CPU `libx264` if unavailable |
| `LOGO_OPACITY` | `0.55` | Logo transparency from 0.0 to 1.0 |
| `LOGO_WIDTH_RATIO` | `0.50` | Logo width relative to the video width (0.50 = 540px on 1080x1920 output) |
| `LOGO_EDGE_MARGIN` | `70` | Edge distance in pixels used for top-center and bottom-center logo positions |
ClippedAI can use your NVIDIA GPU for the FFmpeg video encoding steps (trim/resize/normalize/logo overlay) through `h264_nvenc`.

- Set `ENABLE_GPU_VIDEO_EDITING=true` in `.env`
- Ensure your FFmpeg build includes NVENC support
- If NVENC is not available, the script falls back automatically to CPU (`libx264`)
At startup, the script prints the selected encoder so you can confirm whether GPU is active.
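A common way to implement that check is to probe FFmpeg's encoder list before picking a codec; a sketch of the idea (not necessarily how `main.py` does it):

```python
import subprocess

def pick_video_encoder(enable_gpu: bool = True) -> str:
    """Return h264_nvenc if this FFmpeg build advertises it, else libx264."""
    if enable_gpu:
        try:
            # `ffmpeg -encoders` lists every encoder compiled into the build
            out = subprocess.run(
                ["ffmpeg", "-hide_banner", "-encoders"],
                capture_output=True, text=True, check=True,
            ).stdout
            if "h264_nvenc" in out:
                return "h264_nvenc"
        except (FileNotFoundError, subprocess.CalledProcessError):
            pass  # FFmpeg missing or failed; fall back to CPU encoding
    return "libx264"
```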
If `--logo-url` is provided, ClippedAI downloads that image for the current run and overlays it at the center of every exported clip. If `--logo-url` is not provided, no logo overlay is applied.

Supported `--logo-position` values:

- `center` (or `centro`)
- `top-center` (or `centro-alto`)
- `bottom-center` (or `centro-basso`)

Logo defaults:

- Best format: transparent PNG
- Default position: centered on the screen
- Default opacity: `0.55`
- Default size: `50%` of the video width (recommended logo asset: `540x540` PNG)
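Under the hood this kind of overlay is a single FFmpeg filter graph. A hedged sketch of the centered case, assuming the 1080x1920 output the examples above use (hypothetical helper and an illustrative command, not ClippedAI's exact invocation):

```python
import subprocess

def overlay_logo(video, logo, output, opacity=0.55, width_ratio=0.50):
    """Scale the logo against a 1080px-wide frame, fade it, and center it."""
    logo_w = int(1080 * width_ratio)  # 0.50 -> 540px, matching the README example
    filter_graph = (
        f"[1:v]scale={logo_w}:-1,format=rgba,"
        f"colorchannelmixer=aa={opacity}[logo];"  # multiply alpha for opacity
        "[0:v][logo]overlay=(W-w)/2:(H-h)/2"      # center on the frame
    )
    subprocess.run(
        ["ffmpeg", "-y", "-i", video, "-i", logo,
         "-filter_complex", filter_graph, "-c:a", "copy", output],
        check=True,
    )
```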
Example .env configuration:
```
LOGO_OPACITY=0.45
LOGO_WIDTH_RATIO=0.50
```

Example run command:

```bash
python main.py --url "https://www.youtube.com/watch?v=..." --logo-url "https://your-cdn.example.com/brands/acme/logo.png" --logo-position top-center
```

The AI uses multiple factors to select the best clips:
- Word density (45% weight)
- Engagement words ratio (30% weight)
- Duration balance (25% weight)
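Each candidate clip gets a weighted score from those factors. Conceptually, it reduces to a blend like the sketch below (using the documented weights, with the three inputs assumed pre-normalized to 0..1; not the exact formula in `main.py`):

```python
def engagement_score(word_density: float,
                     engagement_ratio: float,
                     duration_balance: float) -> float:
    """Weighted blend of the three documented clip-selection factors."""
    return (0.45 * word_density
            + 0.30 * engagement_ratio
            + 0.25 * duration_balance)
```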
"No module named 'clipsai'"
```bash
pip install clipsai
```

"FFmpeg not found"
- Ensure FFmpeg is installed and in your system PATH
- Restart your terminal after installation
"CUDA out of memory"
- Use a smaller transcription model
- Close other GPU-intensive applications
- Reduce batch size if applicable
"Font not found"
- Install the required font system-wide
- Or change to a system font in the code
"API key errors"
- Verify your API keys are correct
- Check your internet connection
- Ensure you have sufficient API credits
"HuggingFace access denied"
- Make sure you've requested access to all three Pyannote repositories
- Wait a few minutes after requesting access before running the script
- Verify your HuggingFace token has "read" permissions
- Use SSD storage for faster video processing
- Close unnecessary applications to free up RAM
- Use GPU acceleration if available
- Process videos in smaller batches for large files
- Cache transcriptions to avoid re-processing while testing
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license - see the LICENSE file for details.
- clipsai - Core video processing library
- Whisper - Speech recognition
- FFmpeg - Video processing
- Groq - AI title generation
- Bug Reports: GitHub Issues
- Discord: .shaarav4795.
Star this repository if you find it helpful!