TheSkyBiz/README.md

Aakash Biswas

ML Systems Engineering | Adaptive LLM Routing | Model Reliability

I build adaptive AI systems and study confidence estimation, calibration, and cost–accuracy tradeoffs between small and large language models.


Research Interests

  • Adaptive routing between small and large language models
  • Confidence estimation and uncertainty-aware inference
  • Calibration and overconfidence analysis in transformers
  • Failure mode analysis in NLP systems
  • Efficient inference under resource constraints

Selected Work

Adaptive LLM Router

A task-adaptive routing framework that dynamically escalates queries between small and large models using uncertainty heuristics and systematic cost–accuracy benchmarking.
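As a rough illustration of the escalation idea, here is a minimal sketch, not the project's actual code. It assumes each model returns an answer plus a probability distribution over candidate outputs, uses normalized entropy as the uncertainty heuristic, and escalates to the large model when uncertainty exceeds a threshold. The names `route`, `normalized_entropy`, `RouteResult`, the threshold, and the cost values are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# Hypothetical model interface: query -> (answer, probabilities over candidates).
ModelFn = Callable[[str], Tuple[str, Sequence[float]]]


def normalized_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy scaled to [0, 1]: 0 is fully confident, 1 is uniform."""
    if len(probs) < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(len(probs))


@dataclass
class RouteResult:
    answer: str
    model: str          # which model produced the final answer
    uncertainty: float  # small-model uncertainty that drove the decision
    cost: float         # total cost paid for this query (arbitrary units)


def route(
    query: str,
    small: ModelFn,
    large: ModelFn,
    threshold: float = 0.5,   # assumed value; would be tuned per task
    small_cost: float = 1.0,  # assumed relative costs
    large_cost: float = 10.0,
) -> RouteResult:
    """Answer with the small model; escalate when it looks uncertain."""
    answer, probs = small(query)
    uncertainty = normalized_entropy(probs)
    if uncertainty <= threshold:
        return RouteResult(answer, "small", uncertainty, small_cost)
    # Escalation pays for both calls, since the small model already ran.
    answer, _ = large(query)
    return RouteResult(answer, "large", uncertainty, small_cost + large_cost)
```

Sweeping `threshold` over a labelled evaluation set and recording mean cost against accuracy gives one cost–accuracy curve per task, which is one plausible way to benchmark such a router.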

Confidence–Correctness Analyzer

An empirical reliability study measuring calibration error (ECE), overconfidence rates, and temperature scaling effects across transformer models.
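For readers unfamiliar with the metrics, the sketch below shows one common way to compute them, assuming NumPy, equal-width confidence bins, and sample-weighted ECE. It is an illustration under those assumptions and may differ from the binning and definitions used in the study itself; the function names are hypothetical.

```python
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted ECE: sum over bins of (bin fraction) * |accuracy - mean confidence|."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges give bin indices 0..n_bins-1; bins are right-inclusive.
    bin_ids = np.digitize(confidences, edges[1:-1], right=True)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # mask.mean() is the fraction of samples in the bin
    return float(ece)


def softmax_with_temperature(logits, temperature=1.0):
    """Temperature scaling: divide logits by T before softmax (T > 1 softens)."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum(axis=-1, keepdims=True)
```

In a typical workflow, the temperature is fitted on a held-out set and ECE is compared before and after scaling. Scaling changes confidence values but not the predicted class, so accuracy stays the same.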

Early-Warning Financial Stress System

An interpretable classical ML pipeline for time-aware financial risk prediction with lead-time evaluation.

More details in pinned repositories.


Technical Stack

Core ML — PyTorch · HuggingFace · scikit-learn
LLM Systems — Ollama · Transformers · Prompt Engineering
Evaluation — Calibration Metrics · Reliability Analysis
Data & Infra — Pandas · NumPy · SQLite · FAISS
Applications — Streamlit · FastAPI


Beyond Research

Outside machine learning, I follow Formula 1, watch football, play competitive and narrative-driven games, unwind with music, and occasionally revisit anime.


Connect

LinkedIn: https://linkedin.com/in/theskybiz141
Email: aakashjsrindia@gmail.com


Precision matters — whether in model calibration or a last-lap overtake.

Pinned

  1. llm-persona-drift-evaluation Public

    945-generation adversarial evaluation of 3 open LLMs across 3 personas and 20 attack types, measuring semantic drift, override rates, and distributional instability.

    Python

  2. confidence-correctness-mismatch-analyser Public

    Behavioral reliability study of extractive QA models on 500 SQuAD samples. Compares DistilBERT (70% accuracy) and RoBERTa (92.6% accuracy), analyzing weighted ECE, overconfidence (0.112 vs 0.016), …

    Jupyter Notebook

  3. instruction-fine-tuning-sarvam-2B Public

    This project explores instruction fine-tuning of the Sarvam AI 2B model using LoRA to improve response efficiency without full retraining. Only ~0.13% of parameters are trained on a small synthetic…

    Python

  4. trust-agent Public

    TrustAgent is a lightweight LLM reliability pipeline that pairs a fast generator with a stronger critic to evaluate responses before they are trusted. It uses asymmetric model architecture, structu…

    Python

  5. genai-multi-agent-experiments Public

    Hands-on experiments with multi-agent systems powered by Gemini API and Generative AI.

    Jupyter Notebook

  6. yt-video-summariser Public

    YouTube Video Summarizer is a Streamlit web app that extracts transcripts from YouTube videos, generates concise and detailed AI summaries, and answers user questions using advanced semantic search…

    Python