Chat with an AI clone of Alisher Sadullaev — grounded in his public Telegram posts, interviews, talks, official pages, and other public-source material.
ask-alisher is a retrieval-augmented chat app built from the same architecture as ask-akmal, but adapted for Alisher Sadullaev.
The bot is designed to answer questions in Alisher's public voice using only retrieved context from:
- the public @alisher_sadullaev Telegram archive
- public YouTube interviews and talks
- official public-source briefs from gov.uz and other government / agency pages
- short public bio/profile material
It is best suited for questions about youth development, education, entrepreneurship, volunteering, regional initiatives, reading culture, and chess.
User question -> Gemini embedding -> pgvector similarity search -> date-aware + source-aware retrieval -> Gemini 2.5 Flash -> streamed response with source cards
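To make the similarity-search step in the pipeline concrete, here is a minimal in-memory sketch of ranking chunks against a query embedding. In the app this work happens inside Postgres via pgvector; the `Chunk` shape, the `topK` helper, and the choice of cosine similarity as the metric are assumptions made for illustration, not a description of the repo's actual RPC.

```typescript
// Hypothetical shape of a stored chunk; the real table columns may differ.
interface Chunk {
  id: string;
  embedding: number[];
}

// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as unrelated to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks most similar to the query embedding, best first.
function topK(
  query: number[],
  chunks: Chunk[],
  k: number
): Array<{ id: string; score: number }> {
  return chunks
    .map((c) => ({ id: c.id, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

A database-side search returns the same kind of ordered list, but can use an index instead of scanning every row.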
- Public-source documents are chunked and embedded into documents_alisher in Supabase with richer metadata like source_domain, source_authority, domain_tags, published_at, and is_official.
- The API route retrieves semantically similar chunks, with extra Telegram handling for recent/date-specific prompts and extra boosting for official/public sources when the question is about programs, policy, regions, or institutional roles.
- Gemini 2.5 Flash generates the answer using only retrieved context.
- The client streams the answer and renders source cards separately, with snippets, topic labels, and more human-readable source titles.
- Low-signal Telegram export artifacts are filtered out during ingestion so the app prefers substantive posts over pinned-photo or pinned-voice placeholders.
- Durable rate limiting now runs through Supabase RPC instead of per-instance memory.
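The source-aware boosting described above can be pictured as a small re-ranking pass over the retrieved chunks. The sketch below is only an illustration: the keyword list, the boost size, and the `rerank` helper are hypothetical, and only the `is_official` field name comes from the metadata listed in this README.

```typescript
// Hypothetical retrieved-chunk shape; is_official mirrors the metadata field named above.
interface RetrievedChunk {
  id: string;
  similarity: number; // raw semantic score from vector search
  is_official: boolean;
}

// Assumed keyword hints and boost size; the app's real heuristics are not documented here.
const INSTITUTIONAL_HINTS = ["policy", "program", "region", "agency", "ministry"];
const OFFICIAL_BOOST = 0.1;

// Crude check for whether a question is about programs, policy, or institutions.
function isInstitutionalQuestion(question: string): boolean {
  const q = question.toLowerCase();
  return INSTITUTIONAL_HINTS.some((hint) => q.includes(hint));
}

// Re-rank chunks, nudging official sources upward only for institutional questions.
function rerank(question: string, chunks: RetrievedChunk[]): RetrievedChunk[] {
  const boost = isInstitutionalQuestion(question) ? OFFICIAL_BOOST : 0;
  const score = (c: RetrievedChunk): number =>
    c.similarity + (c.is_official ? boost : 0);
  // Copy before sorting so the caller's array is left untouched.
  return [...chunks].sort((a, b) => score(b) - score(a));
}
```

Keeping the boost additive and small means a clearly more relevant Telegram post can still win over a weakly related official page.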
- Next.js 16
- Vercel AI SDK v6
- Gemini 2.5 Flash
- Gemini gemini-embedding-001 embeddings
- Supabase + pgvector
- Tailwind CSS v4
- TypeScript
git clone <your-new-repo-url>
cd ask-alisher
npm install
Create .env.local:
GOOGLE_GENERATIVE_AI_API_KEY=your_gemini_api_key
SUPABASE_URL=your_supabase_url
SUPABASE_SERVICE_ROLE_KEY=your_service_role_key
NEXT_PUBLIC_SUPABASE_URL=your_supabase_url
NEXT_PUBLIC_SUPABASE_ANON_KEY=your_anon_key
NEXT_PUBLIC_GTM_ID=GTM-N3M3DLLG
NEXT_PUBLIC_GA_MEASUREMENT_IDS=G-BWTQB4SFP4,G-2XNF6BSJG8
NEXT_PUBLIC_TURNSTILE_SITE_KEY=your_turnstile_site_key
TURNSTILE_SECRET_KEY=your_turnstile_secret_key
ANALYTICS_DASHBOARD_KEY=your_private_dashboard_key
SITE_URL=https://askalishersadullaev.netlify.app
TELEGRAM_BOT_TOKEN=your_telegram_bot_token
TELEGRAM_WEBHOOK_SECRET=your_telegram_webhook_secret
INTERNAL_API_SECRET=your_internal_api_secret
Important:
- This repo is already isolated to the documents_alisher table and match_alisher_documents() RPC.
- It can share the same Supabase project as ask-akmal without overwriting the documents table.
- scripts/chunk-and-embed.ts rebuilds only the documents_alisher corpus.
- scripts/backfill-topic-metadata.ts can enrich the existing corpus with topics, is_first_person, and is_low_signal metadata without rebuilding embeddings.
- The current corpus now includes Telegram, YouTube, bio material, and versioned official public-source briefs under data/official/.
- Official/public chunks now carry richer metadata like source_domain, source_authority, domain_tags, published_at, and is_official so retrieval can prefer them more intelligently.
- GTM is wired through NEXT_PUBLIC_GTM_ID, defaulting to GTM-N3M3DLLG for the Alisher site.
- GA4 is wired through NEXT_PUBLIC_GA_MEASUREMENT_IDS, defaulting to G-BWTQB4SFP4,G-2XNF6BSJG8.
- Soft abuse protection is wired for Cloudflare Turnstile. Set NEXT_PUBLIC_TURNSTILE_SITE_KEY and TURNSTILE_SECRET_KEY to enable it on the web chat route.
- The app now writes a small first-party analytics stream into Supabase for local reporting scripts.
- The protected first-party analytics dashboard reads ANALYTICS_DASHBOARD_KEY from the server environment.
- The dashboard now includes sparkline KPI cards, anomaly flags, trend charts, conversion flow, prompt splits by language, a traffic health strip, top prompts, CSV export, citation-click analytics, a prompt explorer, a knowledge-base freshness panel, and a recent-events stream at /admin/analytics.
- Telegram bot webhook traffic is handled at /api/telegram/webhook and reuses the same Ask Alisher answer engine.
- TELEGRAM_WEBHOOK_SECRET protects the webhook endpoint, and INTERNAL_API_SECRET is used only for the internal bot-to-chat API call.
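As a sketch of how a webhook secret can protect an endpoint: when a `secret_token` is supplied to Telegram's `setWebhook`, Telegram sends it back on every update in the `X-Telegram-Bot-Api-Secret-Token` header. Whether this repo checks that header, and the `isValidWebhookSecret` helper name, are assumptions; the code only shows one reasonable way to compare the value in constant time.

```typescript
import { timingSafeEqual } from "node:crypto";

// Compare the header value sent with the request against the configured secret.
// Returns false for missing values instead of throwing.
function isValidWebhookSecret(
  headerValue: string | null | undefined,
  expected: string
): boolean {
  if (!headerValue || !expected) return false;
  const received = Buffer.from(headerValue, "utf8");
  const configured = Buffer.from(expected, "utf8");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  if (received.length !== configured.length) return false;
  return timingSafeEqual(received, configured);
}
```

In a route handler this might be called with `request.headers.get("x-telegram-bot-api-secret-token")` and `process.env.TELEGRAM_WEBHOOK_SECRET`, rejecting the request with a 401 when it returns false.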
Push the included Supabase migrations:
npx supabase db push
Fetch the Telegram archive snapshot:
npm run sync:telegram
Fetch the full public Telegram history:
npm run sync:telegram:all
Download the curated public YouTube manifest:
python3 scripts/download-remaining-yt.py
The repo also includes curated official public-source briefs under data/official/. They are included automatically in the full rebuild command below.
Ingest only the updated YouTube transcripts without rebuilding the whole corpus:
source <(grep -v '^#' .env.local | grep '=' | sed 's/^/export /') && npx tsx scripts/chunk-and-embed.ts --prefix=youtube/ --skip-clear
Remove already-ingested low-signal Telegram rows from Supabase:
source <(grep -v '^#' .env.local | grep '=' | sed 's/^/export /') && npm run prune:low-signal
Or rebuild everything under data/:
source <(grep -v '^#' .env.local | grep '=' | sed 's/^/export /') && npx tsx scripts/chunk-and-embed.ts
Backfill topic and first-person metadata onto the existing Supabase corpus:
source <(grep -v '^#' .env.local | grep '=' | sed 's/^/export /') && npx tsx scripts/backfill-topic-metadata.ts
Run the starter regression eval set against a local or deployed app:
npm run evals:core -- --base-url=http://localhost:3000
Or against production:
EVAL_BASE_URL=https://askalishersadullaev.netlify.app npm run evals:core
Print a first-party analytics summary from Supabase:
npm run analytics:summary -- --days=7
Open the protected analytics dashboard:
/admin/analytics?key=YOUR_ANALYTICS_DASHBOARD_KEY
Configure the Telegram bot webhook, descriptions, commands, menu button, and profile photo:
npm run telegram:setup
Start the development server:
npm run dev
Open http://localhost:3000.
- Telegram sync defaults are already pointed at @alisher_sadullaev.
- The UI, prompt pack, and metadata are adapted for Alisher Sadullaev.
- A small public bio seed file is included in data/.
- The YouTube downloader now reads from scripts/alisher-video-manifest.json.
- The YouTube manifest now includes extra long-form youth policy, girls' education, collaboration, and interview coverage.
- The repo now includes versioned official public-source briefs in data/official/, including Youth Affairs Agency and other gov.uz material.
- Retrieval now uses richer domain-aware metadata so official/public sources can outrank Telegram when the question is really about policy, programs, regional work, or institutional context.
- Source cards now show a supporting snippet plus inferred topic tags.
- Telegram ingestion now skips low-signal export artifacts, and npm run prune:low-signal removes any already stored leftovers.
- Durable rate limiting uses the consume_ask_alisher_rate_limit() Supabase RPC.
- First-party analytics events are stored in ask_alisher_analytics_events for local reporting.
- A protected /admin/analytics dashboard is available on top of the same first-party analytics table.
- Telegram bot conversation turns are also stored in ask_alisher_analytics_events so the bot can keep short per-chat context.
- The repo includes a starter regression suite in evals/alisher-core.json.
- The current roadmap is tracked in docs/roadmap.md.
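The SQL behind consume_ask_alisher_rate_limit() is not shown in this README. As a rough illustration of the counting logic a rate limiter of this kind might apply, here is a fixed-window sketch. It keeps state in memory purely so the example is self-contained, which is exactly the per-instance limitation the durable RPC avoids; the class name, the fixed-window strategy, and the parameters are all assumptions.

```typescript
// Per-key counter for the current window.
interface WindowState {
  windowStart: number; // epoch milliseconds when the window opened
  count: number; // requests recorded in this window
}

// Illustrative fixed-window limiter: at most `limit` requests per `windowMs` per key.
class FixedWindowLimiter {
  private readonly windows = new Map<string, WindowState>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true and records the request if allowed; false if the key is over its limit.
  // `now` is injectable so the behavior can be tested deterministically.
  consume(key: string, now: number = Date.now()): boolean {
    if (this.limit < 1) return false;
    const state = this.windows.get(key);
    // Open a fresh window when none exists or the previous one has expired.
    if (!state || now - state.windowStart >= this.windowMs) {
      this.windows.set(key, { windowStart: now, count: 1 });
      return true;
    }
    if (state.count >= this.limit) return false;
    state.count += 1;
    return true;
  }
}
```

A database-backed version would perform the same check-and-increment atomically in one call, so every server instance sees the same counters.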
src/
  app/
  components/
  lib/
scripts/
  alisher-video-manifest.json
  analytics-report.ts
  backfill-topic-metadata.ts
  chunk-and-embed.ts
  fetch-telegram-channel.ts
  import-telegram-posts.ts
  prune-low-signal.ts
  run-evals.ts
  setup-telegram-bot.ts
  sync-telegram.ts
  download-remaining-yt.py
docs/
  roadmap.md
evals/
  alisher-core.json
data/
  bio_alisher_sadullaev.txt
  official/
  telegram_posts/
  youtube/
supabase/
  schema.sql
  migrations/
MIT