ML Intern backlog prioritization report - 2026-05-04 #223

@lewtun

ML Intern Backlog Prioritization

Generated: 2026-05-04T13:18:56.926357+00:00
Model: anthropic/claude-opus-4-6

Sources: github_issue=24, github_pr=50, hf_discussion=14

Summary

Critical security PRs (#96, #85, #83) and agent-quality fix (#88) are reviewed and ready to merge. Hosted Space has multiple reliability regressions (session zombies, Pro detection, onboarding CSS, blank pages) blocking paying users. Provider expansion (local models, Bedrock, Azure, Gemini) is the top feature theme with 6+ PRs. Cost guardrails and session durability are strategic priorities.

Can Be Closed

No high-confidence resolved-in-main candidates found.

Highest Impact Next

  1. Merge 3 security PRs: CVE patch, sandbox auth, SSRF fix (impact 5/5, effort 1/5, confidence 0.95)

  2. Merge thinking_blocks fix (PR #88, "fix: thread thinking_blocks through history so extended thinking survives tool turns") (impact 5/5, effort 1/5, confidence 0.95)

  3. Fix onboarding CSS overflow blocking Start button (impact 5/5, effort 1/5, confidence 0.95)

    • Recommendation: Change onboarding container overflow:hidden to overflow-y:auto — root cause fully documented with screenshots and confirmed fix
    • Rationale: Completely blocks first-time users on 1366x768 screens; users must open DevTools to use the product
    • Next action: Push 1-line CSS fix to main
    • Sources: hf_discussion#19
  4. Fix MCP tool stall (hub_repo_details hangs 5+ min) (impact 5/5, effort 2/5, confidence 0.9)

  5. Fix session zombie state and quota cleanup (impact 5/5, effort 3/5, confidence 0.85)

  6. Fix Pro plan detection and entitlement checks (impact 4/5, effort 2/5, confidence 0.85)

  7. Merge Bedrock region prefix fix (PR #185, "fix(research): inherit main model's Bedrock region prefix in sub-agent") (impact 4/5, effort 1/5, confidence 0.9)

  8. Add LICENSE file to unblock adoption (impact 4/5, effort 1/5, confidence 0.9)

    • Recommendation: Add Apache 2.0 LICENSE file — 5 thumbs-up, enterprise users explicitly blocked from testing
    • Rationale: No license = legal blocker for enterprises and external contributors; 5 reactions + multiple +1 comments
    • Next action: Add LICENSE file and README badge; use PR #178 ("License and Citation") as reference
    • Sources: github_issue#41, github_pr#178

Features

  1. Local model support (Ollama/vLLM/OpenAI-compat) (impact 5/5, effort 3/5, confidence 0.75)

  2. Provider adapter refactor + Bedrock/Azure/Gemini (impact 4/5, effort 3/5, confidence 0.75)

  3. Image, file, and dataset attachments for web+CLI (impact 4/5, effort 4/5, confidence 0.6)

  4. OpenRouter / custom OpenAI-compatible base URL (impact 4/5, effort 1/5, confidence 0.8)

    • Recommendation: Implement OPENAI_BASE_URL passthrough — minimal change, follows ecosystem convention, unlocks 300+ models
    • Next action: Accept a small PR respecting OPENAI_BASE_URL env var
    • Sources: github_issue#197, github_pr#188
  5. Cost guardrails: prompt caching, iteration cap, research concurrency (impact 5/5, effort 4/5, confidence 0.85)

    • Recommendation: Sprint on P0 items: add cache_control markers, lower max_iterations 300→40, cap concurrent research subagents, default cheaper research model
    • Next action: Internal sprint; start with max_iterations cap and prompt caching
    • Sources: github_issue#61
  6. Expand pre-flight checks for hf_jobs approval (impact 4/5, effort 2/5, confidence 0.7)

    • Recommendation: Add 4 static checks (timeout, hub_model_id, flash-attn, trackio) + credit pre-check to prevent wasted GPU spend
    • Next action: Accept PR for reliability_checks.py expansion; add test coverage
    • Sources: github_issue#203, github_issue#125
  7. Notify user when agent needs input/approval (impact 3/5, effort 1/5, confidence 0.7)

  8. Evaluation and benchmarking CLI (impact 4/5, effort 4/5, confidence 0.5)

  9. Background sessions on Mongo control plane (impact 5/5, effort 5/5, confidence 0.7)

    • Recommendation: Fix P0 blockers (_enforce_gated_model_quota crash, reconnect 404) before merge; PR has 83 tests and thorough design
    • Next action: Author to fix remaining P0s; re-trigger automated review
    • Sources: github_pr#206
  10. Claude Code project-mode / plugin support (impact 4/5, effort 4/5, confidence 0.5)

  11. Opt-in LangFuse observability callback (impact 3/5, effort 2/5, confidence 0.65)

  12. Frontend UX: example prompts, sidebar, copy/regenerate (impact 3/5, effort 3/5, confidence 0.8)
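
Two of the cost guardrails in item 5 (the max_iterations cap and the limit on concurrent research subagents) can be sketched roughly as follows. This is a minimal illustration, not the project's code: `run_agent_loop`, `run_research_tasks`, and `MAX_CONCURRENT_RESEARCH` are hypothetical names, and the concurrency limit of 2 is an assumed value since the report does not give one. Only the 300→40 iteration cap comes from the report.

```python
import asyncio

MAX_ITERATIONS = 40          # the report proposes lowering 300 -> 40
MAX_CONCURRENT_RESEARCH = 2  # assumed value; the report gives no number


async def run_agent_loop(step, max_iterations=MAX_ITERATIONS):
    """Call `step(i)` until it returns True (done) or the cap is reached.

    Returns the number of iterations actually used.
    """
    for i in range(max_iterations):
        if await step(i):
            return i + 1
    return max_iterations


async def run_research_tasks(factories, limit=MAX_CONCURRENT_RESEARCH):
    """Run coroutine factories with at most `limit` of them in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(factory):
        # Each subagent waits here until a slot is free.
        async with semaphore:
            return await factory()

    # gather preserves the input order of the results.
    return await asyncio.gather(*(guarded(f) for f in factories))
```

A hard iteration cap bounds the worst-case spend of a single session, while the semaphore bounds how many research calls can bill in parallel; prompt caching and the cheaper default research model would be separate changes.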

Fixes

  1. CVE-2026-27962 authlib upgrade (impact 5/5, effort 1/5, confidence 0.9)

    • Recommendation: Fix title mismatch (says 1.6.9, is 1.7.0), vet joserfc dep, merge urgently
    • Sources: github_pr#96
  2. Sandbox API unauthenticated RCE (impact 5/5, effort 1/5, confidence 0.95)

    • Recommendation: Rebase and merge — reviewed APPROVE, adds bearer-token auth to all sandbox endpoints
    • Sources: github_pr#85
  3. SSRF in fetch_hf_docs leaks HF token (impact 5/5, effort 1/5, confidence 0.95)

    • Recommendation: Merge — reviewed safe-to-merge, origin validation with good test coverage
    • Sources: github_pr#83
  4. Extended thinking drops after first tool call (impact 5/5, effort 1/5, confidence 0.95)

    • Recommendation: Merge — 51 lines, 1 file, 'SAFE TO MERGE, LOW RISK'
    • Sources: github_pr#88
  5. Start session button hidden by CSS overflow (impact 5/5, effort 1/5, confidence 0.95)

    • Recommendation: 1-line CSS fix: overflow:hidden → overflow-y:auto on onboarding container
    • Sources: hf_discussion#19
  6. Concurrent plan overwrites + CORS rejection on Spaces (impact 4/5, effort 1/5, confidence 0.8)

  7. Read tool crashes on string offset/limit (impact 4/5, effort 1/5, confidence 0.9)

    • Recommendation: Merge after also fixing bash_handler timeout (same bug); reviewed 'ready to merge'
    • Sources: github_pr#110
  8. HF Pro subscription not detected after upgrade (impact 4/5, effort 2/5, confidence 0.85)

  9. Stopped sessions don't free quota; catch-up stuck (impact 5/5, effort 3/5, confidence 0.85)

  10. Chat messages spontaneously disappear mid-session (impact 4/5, effort 3/5, confidence 0.65)

  11. Dotenv ignored when shell has stale env var (impact 3/5, effort 1/5, confidence 0.8)

    • Recommendation: Confirm maintainer intent (ambiguous close comment); merge if acceptable; reviewed LGTM
    • Sources: github_pr#120
  12. 401 Client Error on sandbox creation (impact 4/5, effort 3/5, confidence 0.55)
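
For context on fix 3, origin validation of the kind described for PR #83 might look like the sketch below. It is an assumption-laden illustration, not the code in that PR: `ALLOWED_ORIGINS`, `is_allowed_url`, and `auth_headers` are hypothetical names, and the single allowed origin is a guess at what a docs fetcher would need.

```python
from urllib.parse import urlsplit

# Assumption: the only origin the docs fetcher should send a token to.
ALLOWED_ORIGINS = {("https", "huggingface.co")}


def is_allowed_url(url, allowed=ALLOWED_ORIGINS):
    """Return True only if `url` points at an allow-listed origin."""
    try:
        parts = urlsplit(url)
        port = parts.port  # raises ValueError on a malformed port
    except ValueError:
        return False
    # Reject embedded credentials such as https://huggingface.co@evil.example/
    if parts.username or parts.password:
        return False
    # Only the default HTTPS port is accepted in this sketch.
    if port not in (None, 443):
        return False
    # Compare the exact scheme and host, never a prefix or substring.
    return (parts.scheme, parts.hostname) in allowed


def auth_headers(url, token):
    """Attach the bearer token only when the target origin is trusted."""
    if is_allowed_url(url):
        return {"Authorization": f"Bearer {token}"}
    return {}
```

Comparing the parsed scheme and hostname exactly, rather than testing whether the URL starts with a trusted string, is what blocks look-alike hosts. A real fetcher would presumably also need to repeat the check on every redirect hop.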

Other / Watchlist

  1. Add LICENSE file (Apache 2.0) (impact 4/5, effort 1/5)

  2. Add CONTRIBUTING.md (impact 2/5, effort 1/5)

  3. Update README model example + env var annotations (impact 2/5, effort 1/5)

  4. Fix CI review workflow for fork PRs (impact 3/5, effort 1/5)

    • Recommendation: Rebase and merge — blocks automated review for all external contributors
    • Sources: github_pr#107
  5. Restructure system prompt tone (impact 3/5, effort 2/5)

    • Recommendation: Re-add 3 dropped behavioral rules flagged in review before merging
    • Sources: github_pr#33
  6. Use huggingface_hub.get_token() as fallback (impact 3/5, effort 1/5)

Clusters

Metadata
