kavitaj11/README.md

Developer × Researcher: Building production-grade AI Assurance

Kavita Jadhav

Trustworthy AI • LLM Testing • Full-Stack Automation

I am an AI Developer and Test Engineer with 12+ years of experience, working at the intersection of software engineering and AI assurance. My research-oriented work focuses on converting probabilistic model behavior into repeatable experiments and measurable evidence, so that AI-enabled systems remain reliable under change and auditable in production.

I design and build evaluation and automation systems that move beyond “does it work?” to answer: is it safe, stable, and defensible under real-world conditions?


Start here (research + build artifacts)

LLM Assurance & Evaluation

Enterprise Test Automation Frameworks

Full-Stack Engineering (portfolio)


Research interests

My current interests focus on reliability, safety, and governance-aligned evaluation for LLM-enabled systems:

  • LLM evaluation methodology & benchmarking
    • Reproducible harness design, multi-model comparisons, metric selection, regression protocols
  • Failure mode characterization
    • Hallucination patterns, prompt sensitivity, bias/fairness gaps, context-window limits, behavioral drift
  • Adversarial testing & red teaming
    • Prompt injection, jailbreak resilience, adversarial formatting/obfuscation, privacy/PII leakage, unsafe content pathways
  • Robustness & production monitoring
    • Drift detection signals, stability under distribution shift, reliability scorecards, audit-friendly evidence trails
  • Evaluation in CI/CD
    • Automated quality gates, traceable results, compliance-aligned test suites, reproducible reporting
  • AI integration in legacy test automation frameworks
    • Adding AI-assisted capabilities to existing Selenium/BDD/API frameworks (test generation, self-healing, intelligent assertions, flaky-test reduction) while maintaining determinism, traceability, and maintainable architecture
  • AI application development & end-to-end validation
    • Building and testing AI-enabled applications (RAG, agents/tool use, MCP-based integrations), with emphasis on observability, data/privacy controls, and evaluation-driven development

What I build

  • Evaluation harnesses and test suites for accuracy, faithfulness, safety, and compliance
  • Full-stack validation across UI, API, and DB layers, integrated into CI/CD pipelines
  • Reliability artifacts: scorecards, benchmark runs, failure-mode matrices, audit-friendly reporting
  • Frameworks that prioritize testability-by-design and production realism

Domains

Banking • Healthcare Insurance (ORMB & EDI workflows) • Financial Trading • SaaS Subscription Commerce (Offer-to-Cash) • iEN Infrastructure Service Assurance


Connect


Topics

llm-testing ai-assurance red-teaming prompt-injection jailbreaks evaluation test-automation full-stack ci-cd quality-engineering trustworthy-ai

Pinned

  1. K11-Software-Solutions/k11Techlab-micro-focus-uft-one-automation-testing (Public)

    TypeScript

  2. K11-Software-Solutions/k11TechLab-cucumber-bdd-java-fullstack-framework (Public)

    A BDD/MBT Hybrid Framework for Intelligent Test Automation in Modern Trading Platforms

    Java 1

  3. K11-Software-Solutions/llm-testing-hub (Public)

    A comprehensive framework for evaluating, red teaming, and assuring LLM-powered applications—combining prompt regression testing, risk audits, and enterprise AI safety tools using Promptfoo, LangTe…

    HTML 1

  4. K11-Software-Solutions/k11TechLab-selenium-java-fullstack-framework (Public)

    Java full-stack Test Automation Framework for Enterprise Applications

    Java 1