dbkarashev/diepoor

diepoor

Personal job aggregator. Polls hh.ru / hh.kz, career.habr.com and public Telegram channels for QA Automation, Manual QA and ML/Data Analyst openings. Deduplicates, writes results to a Google Sheet and sends Telegram notifications with an inline "hide" button.
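
As an illustration of the deduplication step, here is a minimal sketch that assumes each vacancy is identified by a normalized URL. The function names and the normalization rule are hypothetical and not taken from the repository, which may key on something else.

```python
from urllib.parse import urlsplit, urlunsplit


def dedup_key(url: str) -> str:
    """Normalize a vacancy URL: lowercase host, drop query, fragment and trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


def filter_new(urls, seen):
    """Return the URLs whose key is not yet in `seen`, recording them as seen."""
    fresh = []
    for url in urls:
        key = dedup_key(url)
        if key not in seen:
            seen.add(key)
            fresh.append(url)
    return fresh
```

With this rule, `https://hh.ru/vacancy/1?from=search` and `https://HH.ru/vacancy/1/` collapse to the same key, so only the first one would trigger a notification.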

Layout

diepoor/
├── diepoor/                      source package
│   ├── sources/                  hh, habr_career, getmatch, telegram_channels
│   └── sinks/                    telegram_bot, google_sheets
├── scripts/                      seed_dedup, prepare_telethon_session, cleanup_sheets
├── cloudflare-worker/            inline-button handler (JS)
├── .github/workflows/            run.yml (poll), cleanup.yml (nightly)
├── config.yaml                   profiles, queries, filters
└── channels.txt                  Telegram channels list

Quick start (local)

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env                 # fill in tokens
./run.sh --dry-run                   # validate sources without writing
./run.sh                             # real run

GitHub Actions

  1. Fork / clone.
  2. Add repository secrets (Settings → Secrets and variables → Actions):

     Secret                                                 Source
     TELEGRAM_BOT_TOKEN                                     @BotFather /newbot
     TELEGRAM_CHAT_ID                                       Your numeric chat id (see Telegram bot setup below)
     GOOGLE_SHEET_URL                                       Full URL of the target spreadsheet
     GOOGLE_CREDENTIALS_JSON                                Entire JSON of a Google service account
     TELEGRAM_API_ID / TELEGRAM_API_HASH / TELEGRAM_PHONE   Optional, for Telegram channels source
     TELETHON_SESSION_B64                                   Optional, see scripts/prepare_telethon_session.py

  3. Actions → diepoor → Run workflow. Polling runs every 15 minutes; cleanup runs nightly at 22:00 UTC.

Google Sheets setup

  1. Create a project at console.cloud.google.com.
  2. Enable Google Sheets API and Google Drive API.
  3. IAM → Service Accounts → Create service account → Keys → Add key → JSON.
  4. Save the file as credentials/gcp-service-account.json.
  5. Share the spreadsheet with the client_email from the JSON as Editor.
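
Step 5 needs the client_email value from the key file. A small sketch for pulling it out, assuming the standard service-account key format (a JSON object with "type" and "client_email" fields); the helper name is hypothetical.

```python
import json


def service_account_email(credentials_json: str) -> str:
    """Return the client_email field from the text of a service-account key file."""
    data = json.loads(credentials_json)
    if data.get("type") != "service_account":
        raise ValueError("not a service-account key file")
    return data["client_email"]
```

Pass it the contents of credentials/gcp-service-account.json and share the spreadsheet with the address it returns.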

Telegram bot setup

  1. Send /newbot to @BotFather and store the token.
  2. Open the bot chat and press Start (otherwise the bot cannot send you messages).
  3. Open https://api.telegram.org/bot<TOKEN>/getUpdates and find your chat.id in the response. Do this before registering a webhook: getUpdates does not work while a webhook is set.
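
If the getUpdates response is long, the chat id can be picked out programmatically. A minimal sketch that works on the already-decoded JSON; the function name is hypothetical, and it only looks at regular and edited messages.

```python
def chat_ids(get_updates_response: dict) -> list[int]:
    """Collect distinct chat ids from a decoded getUpdates JSON response."""
    ids = []
    for update in get_updates_response.get("result", []):
        # Each update carries at most one of these message-like fields.
        message = update.get("message") or update.get("edited_message") or {}
        chat = message.get("chat", {})
        if "id" in chat and chat["id"] not in ids:
            ids.append(chat["id"])
    return ids
```

An empty list usually means the bot has not received any message yet, so press Start in the bot chat and try again.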

Inline "hide" button

Each notification has a 🗑 button. Handling it requires a small webhook — the Cloudflare Worker in cloudflare-worker/. Deploy:

cd cloudflare-worker
npx wrangler login
npx wrangler secret put TELEGRAM_BOT_TOKEN
npx wrangler secret put ALLOWED_CHAT_ID
npx wrangler deploy

Register the webhook with Telegram:

curl "https://api.telegram.org/bot<TOKEN>/setWebhook?url=<WORKER_URL>"

Without the webhook the button is visible but inert.
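
The worker itself is JavaScript; the Python below only illustrates the kind of decision logic such a handler might apply. It assumes that "hide" means deleting the message, that the button sends the callback data "hide", and that presses from any chat other than ALLOWED_CHAT_ID are ignored. None of this is confirmed by the repository; only the Bot API method names (deleteMessage, answerCallbackQuery) are real.

```python
def plan_hide(update: dict, allowed_chat_id: int) -> list[tuple[str, dict]]:
    """Return the Bot API calls (method, params) to make for one webhook update."""
    query = update.get("callback_query")
    if not query or query.get("data") != "hide":  # "hide" payload is an assumption
        return []
    message = query.get("message") or {}
    chat_id = message.get("chat", {}).get("id")
    if chat_id != allowed_chat_id:
        return []  # ignore button presses from any other chat
    return [
        ("deleteMessage", {"chat_id": chat_id, "message_id": message["message_id"]}),
        ("answerCallbackQuery", {"callback_query_id": query["id"]}),
    ]
```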

Configuration

Profiles and filters live in config.yaml:

profiles:
  - name: qa_auto_junior
    queries: ["QA automation junior", "автотестировщик junior"]
    experience: [noExperience, between1And3]
    keywords_must_any: [automation, selenium, playwright, pytest, sdet]
    keywords_exclude: [senior, lead, manager]

Source             Notes
hh                 HTML scraping of hh.ru/search/vacancy via data-qa selectors. Multiple hosts via area codes (113 RU, 40 KZ).
habr_career        HTML scraping of career.habr.com/vacancies. Extracts grade and work mode from the meta block.
getmatch           Disabled: listings are rendered only for authenticated users.
telegram_channels  Telethon reader, channel list in channels.txt.
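
To show how keywords_must_any and keywords_exclude from a profile could interact, here is a minimal sketch. It assumes case-insensitive substring matching against the vacancy title; the function name and the matching rule are illustrative, and the actual filter may work differently.

```python
def matches_profile(title: str, must_any: list[str], exclude: list[str]) -> bool:
    """True if the title has at least one must_any keyword and no excluded keyword."""
    text = title.casefold()
    if any(word.casefold() in text for word in exclude):
        return False
    return any(word.casefold() in text for word in must_any)
```

With the qa_auto_junior profile, "Junior QA Automation Engineer" would pass, while "Senior SDET" and "Manual QA" would not. Plain substring matching is deliberately simple: "lead" would also reject a title containing "leading".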

Sheet layout

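One worksheet, Вакансии (Vacancies), with columns: Дата найдена (Date found), Источник (Source), Профиль (Profile), Компания (Company), Должность (Position), Локация (Location), Формат (Work format), Зарплата (Salary), Опыт (Experience), Ссылка (Link), Дедлайн (Deadline), Статус (Status), Заметки (Notes).

Two columns, Статус and Заметки, are yours to fill in (applied, interview, rejected, etc.).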

Nightly cleanup:

  • rows older than 7 days move to the Архив (Archive) sheet
  • rows in Архив older than 30 days with empty Статус are dropped
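
The two cleanup rules can be sketched as a pure function over in-memory rows. This assumes age is measured from the Дата найдена column, that a row exactly 7 (or 30) days old does not yet count as "older", and that rows moved tonight are not pruned until a later run. The row shape and names are hypothetical.

```python
from datetime import date, timedelta


def plan_cleanup(main_rows, archive_rows, today,
                 archive_after_days=7, drop_after_days=30):
    """Return (keep_in_main, new_archive) after applying both cleanup rules.

    Each row is a dict with a `found` date and a `status` string.
    """
    archive_cutoff = today - timedelta(days=archive_after_days)
    drop_cutoff = today - timedelta(days=drop_after_days)

    # Rule 1: rows older than the archive cutoff leave the main sheet.
    keep_in_main = [r for r in main_rows if r["found"] >= archive_cutoff]
    moved = [r for r in main_rows if r["found"] < archive_cutoff]

    # Rule 2: old archive rows with an empty status are dropped.
    kept_archive = [
        r for r in archive_rows
        if not (r["found"] < drop_cutoff and not r["status"].strip())
    ]
    return keep_in_main, kept_archive + moved
```

Rows with any status filled in stay in the archive indefinitely, which keeps a record of every application.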

CLI

python main.py [--dry-run] [--verbose]
python scripts/seed_dedup.py                # seed dedup DB from existing Sheet
python scripts/cleanup_sheets.py            # archive + prune
python scripts/prepare_telethon_session.py  # print base64 Telethon session
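
For reference, the two main.py flags could be declared with argparse roughly as follows. This is a sketch, not the project's actual parser; the help texts are assumptions based on the descriptions in this README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser accepting the --dry-run and --verbose flags."""
    parser = argparse.ArgumentParser(prog="main.py", description="Poll job sources.")
    parser.add_argument("--dry-run", action="store_true",
                        help="validate sources without writing")
    parser.add_argument("--verbose", action="store_true",
                        help="more detailed logging (assumed meaning)")
    return parser
```

Both flags default to False, so a bare `python main.py` performs a real run.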

launchd (macOS alternative)

If you prefer running on your own machine instead of GitHub Actions:

cp launchd/com.dbkarashev.diepoor.plist ~/Library/LaunchAgents/
launchctl load ~/Library/LaunchAgents/com.dbkarashev.diepoor.plist

License

This project is licensed under the MIT License.
