Vigintil/telegram-sticker-scraper

telegram-sticker-scraper

CLI tool that finds and downloads Telegram sticker sets by checking each word in a wordlist against the Telegram API.

Downloaded stickers are stored in imgs/{stickerset_name} folders.
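The word check described above can be sketched in Python. This is only an illustration, not the tool's actual code: it assumes the lookup goes through the Bot API `getStickerSet` method (the README does not say which method is used), and the helper names `get_sticker_set_url`, `set_exists`, and `sticker_dir` are hypothetical.

```python
from urllib.parse import urlencode

API_BASE = "https://api.telegram.org"


def get_sticker_set_url(token: str, word: str) -> str:
    """Build a Bot API URL that asks for the sticker set named `word`.

    Assumption: the scraper treats each wordlist entry as a candidate
    sticker set short name and queries getStickerSet with it.
    """
    query = urlencode({"name": word})
    return f"{API_BASE}/bot{token}/getStickerSet?{query}"


def set_exists(response: dict) -> bool:
    """Bot API responses carry a boolean "ok" field; True means success."""
    return response.get("ok") is True


def sticker_dir(set_name: str) -> str:
    """Folder where a set's stickers would be stored (imgs/{stickerset_name})."""
    return f"imgs/{set_name}"
```

A word that matches no sticker set would come back with `"ok": false`, which `set_exists` reports as `False`.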

Setup

  1. Create a bot via @BotFather on Telegram and get the token
  2. Clone repository
  3. Edit config.json
  4. Rename .env.example to .env and fill in your bot token and, if you selected the "db" storage method, your database credentials
  5. Fill wordlist.txt with search words

Installation

Docker

To run the application with Docker, first make sure .env and config.json are configured.

Start the containers with:

docker-compose up -d

This will launch the main application and the MariaDB database containers.

Manual

Requirements

  • MariaDB
  • json-c library
  • curl library

Build from source and run:

cd telegram-sticker-scraper
make
./build/sticker_scraper

Alternatively, download a pre-built binary from Releases.

Configuration

Edit config.json to configure the program.

Required parameters:

  • check_interval - seconds to wait between API requests
  • storage_method - data storage method ("file" or "db")

Optional parameters:

  • send links - if enabled, sends links to found sticker sets to a Telegram chat
    • chat_id - ID of the Telegram chat that receives the links (note: bots can't initiate conversations, so send at least one message to the bot first)
  • log - if enabled, writes program output to a log file
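As a rough illustration of the required parameters listed above, the following Python sketch checks a parsed config.json. The function name `validate_config` is hypothetical, and the rule that check_interval must be a non-negative number is an assumption; the README only states that it is a number of seconds.

```python
import json

# The two storage methods named in this README.
VALID_STORAGE_METHODS = ("file", "db")


def validate_config(config: dict) -> list[str]:
    """Return a list of problems found in the required parameters."""
    errors = []

    interval = config.get("check_interval")
    # bool is a subclass of int in Python, so reject it explicitly.
    if (
        isinstance(interval, bool)
        or not isinstance(interval, (int, float))
        or interval < 0  # assumption: negative waits are invalid
    ):
        errors.append("check_interval must be a non-negative number of seconds")

    if config.get("storage_method") not in VALID_STORAGE_METHODS:
        errors.append('storage_method must be "file" or "db"')

    return errors


# Minimal config containing only the required parameters.
sample = json.loads('{"check_interval": 2, "storage_method": "file"}')
problems = validate_config(sample)
```

An empty list means both required parameters are present and well-formed; optional parameters are not checked here.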

Storage methods

Choose the storage method by setting storage_method in config.json:

File

Sticker set data is saved to sticker_sets.json in the following format:

{
  "contains_masks": false,
  "size": 1,
  "title": "STICKER SET TITLE",
  "word": "STICKER SET SHORT NAME",
  "link": "https:\/\/telegram.me\/addstickers\/stickerset_name",
  "stickers": [
    {
      "is_video": false,
      "type": 2,
      "emoji": "💟",
      "file_id": "CAACAgQAAxUAAWlK0GeZC4WAa8o4OMbZa0wWdmjiAAJlCgACyh7pUsq-hBZYPpYlNgQ",
      "file_name": "0.webp"
    }
  ]
},
{...},
...

Checked words are saved to wordlist.txt
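A record in the format shown above can be read with a few lines of Python. This is a minimal sketch: the sample is an abbreviated copy of the README example, the helper name `local_paths` is hypothetical, and it assumes the "word" field is the set's short name and doubles as the folder name under imgs/.

```python
import json

# Abbreviated copy of the example record from this README.
SAMPLE_RECORD = json.loads("""
{
  "contains_masks": false,
  "size": 1,
  "title": "STICKER SET TITLE",
  "word": "stickerset_name",
  "stickers": [
    {"is_video": false, "type": 2, "emoji": "\\ud83d\\udc9f", "file_name": "0.webp"}
  ]
}
""")


def local_paths(record: dict) -> list[str]:
    """List where each sticker of a set would be stored on disk.

    Assumption: files live under imgs/{word}/{file_name}.
    """
    folder = f"imgs/{record['word']}"
    return [f"{folder}/{s['file_name']}" for s in record.get("stickers", [])]


paths = local_paths(SAMPLE_RECORD)
```

Comparing `len(record["stickers"])` with `record["size"]` is a quick way to spot a set that was saved incompletely.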

Database

The program connects to MariaDB using the credentials from .env, creates the required tables, and saves sticker set data along with the checked words.

Tables

  • sticker_sets - Sticker set data
  • stickers - Individual sticker data
  • not_found_words - Words that were checked but matched no sticker set

Filter Script

A script in scripts/wordlist cleans the wordlist. It removes:

  • Duplicate words
  • Words that are too short (< 5 characters)
  • Words that are too long (> 62 characters)
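The three rules above can be sketched in Python as follows. This is not the script from scripts/wordlist: the function name `filter_words` is hypothetical, and the choices to strip surrounding whitespace, compare words case-sensitively, and keep the first occurrence in its original order are assumptions.

```python
def filter_words(words, min_len=5, max_len=62):
    """Drop duplicates and words shorter than min_len or longer than max_len."""
    seen = set()
    kept = []
    for raw in words:
        word = raw.strip()  # assumption: ignore surrounding whitespace/newlines
        if not (min_len <= len(word) <= max_len):
            continue  # too short (< 5) or too long (> 62)
        if word in seen:
            continue  # duplicate of an earlier word
        seen.add(word)
        kept.append(word)
    return kept


cleaned = filter_words(["kitty", "kitty", "cat", "puppy\n"])
```

Words of exactly 5 or exactly 62 characters are kept, matching the strict `<` and `>` bounds in the list above.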
