English | 简体中文
SkillForge is a repository distillation workbench that turns messy project materials into reusable AI skill packages.
It scans local folders, parses mixed document formats, extracts workflow evidence, clusters that evidence into capabilities, and compiles the result into previewable and exportable skill outputs through a built-in web UI.
Teams already have the raw knowledge they need: product docs, design notes, research reports, runbooks, SOPs, meeting notes, API references, and random markdown files scattered across repositories and shared folders.
The hard part is turning that fragmented, implicit knowledge into something reusable by an AI agent.
Doing that manually is slow and brittle:
- one-off scripts are fast to start but hard to maintain
- generic agent frameworks help orchestration, but they do not automatically distill repository knowledge into skill packages
- manual copy/paste into prompts does not scale across teams or projects
SkillForge focuses on that missing step:
evidence extraction → capability clustering → skill compilation
So instead of just "running an agent," you can systematically convert repository knowledge into reusable skill artifacts.
Ad-hoc scripts are useful for a single extraction pass, but they usually stop at file parsing or keyword search.
SkillForge gives you:
- a repeatable end-to-end pipeline
- structured job tracking
- a visual workflow dashboard
- preview and export for generated skill outputs
Agent frameworks such as LangChain and DSPy are great for orchestration, prompting, and model workflows. SkillForge solves a narrower but very practical problem: turning a real-world repository into reusable skill packages with a local-first review workflow.
In short:
- LangChain / DSPy help you build agent systems
- SkillForge helps you distill repository knowledge into agent-usable skills
They are complementary, not mutually exclusive.
Typical use cases:
- Distill internal documentation into reusable team skills
- Convert research folders into structured analyst workflows
- Extract SOPs and process knowledge from operations repositories
- Turn API docs and implementation notes into integration-oriented skills
- Build a reviewable bridge between messy source materials and AI-ready skill packages
SkillForge is designed as a visual, local-first workflow.
- Create a job — point SkillForge at a repository or document corpus
- Scan and parse — discover candidate files and parse supported formats
- Extract evidence — surface workflow-relevant excerpts and task signals
- Cluster capabilities — group evidence into reusable capability areas
- Compile skills — generate structured skill outputs for review
- Export to folder — write the generated skill set to disk
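The six stages above can be sketched as a chain of plain functions. This is an illustrative, self-contained sketch only: the function names, the keyword heuristics, and the in-memory corpus are hypothetical, not SkillForge's actual implementation (which lives under `backend/app/services/`).

```python
from collections import defaultdict

KEYWORDS = {"deploy", "review", "onboard"}  # hypothetical task signals


def scan(corpus: dict[str, str]) -> list[str]:
    """Discover candidate documents (here: supported text formats only)."""
    return [path for path in corpus if path.endswith((".md", ".txt"))]


def extract_evidence(corpus: dict[str, str], paths: list[str]) -> list[dict]:
    """Surface lines that carry workflow-relevant task signals."""
    evidence = []
    for path in paths:
        for line in corpus[path].splitlines():
            for kw in KEYWORDS:
                if kw in line.lower():
                    evidence.append({"source": path, "signal": kw, "text": line.strip()})
    return evidence


def cluster(evidence: list[dict]) -> dict[str, list[dict]]:
    """Group evidence into capability areas (here: simply by shared signal)."""
    clusters = defaultdict(list)
    for item in evidence:
        clusters[item["signal"]].append(item)
    return dict(clusters)


def compile_skills(clusters: dict[str, list[dict]]) -> list[dict]:
    """Turn each capability cluster into a reviewable skill draft."""
    return [
        {
            "name": f"{signal}-skill",
            "evidence_count": len(items),
            "sources": sorted({i["source"] for i in items}),
        }
        for signal, items in sorted(clusters.items())
    ]


corpus = {
    "docs/runbook.md": "How to deploy the service.\nReview the checklist first.",
    "notes/onboarding.txt": "Onboard new analysts with the intake form.",
    "assets/logo.png": "<binary>",
}
skills = compile_skills(cluster(extract_evidence(corpus, scan(corpus))))
```

Each stage only consumes the previous stage's output, which is what makes the pipeline repeatable and each intermediate result reviewable in the UI.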
The web UI is not just a thin shell over APIs. It provides:
- a dashboard with recent jobs and pipeline coverage
- a new job form for repository selection and goal definition
- a job detail page with:
  - stage timeline
  - progress tracking
  - parsed documents view
  - evidence workbench
  - capability cluster preview
  - skill plan preview
  - generated skill preview
  - export and overwrite review flow
- live status updates via Server-Sent Events
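The live updates arrive as a standard `text/event-stream` response from the job events endpoint. The sketch below is a minimal parser for that wire format; the `stage` event name and JSON payload in the sample stream are hypothetical, not SkillForge's documented event schema.

```python
def parse_sse(stream: str) -> list[dict]:
    """Split a raw Server-Sent Events stream into {'event', 'data'} records."""
    events = []
    current = {"event": "message", "data": []}
    for line in stream.splitlines():
        if line == "":  # a blank line terminates the current event
            if current["data"]:
                events.append(
                    {"event": current["event"], "data": "\n".join(current["data"])}
                )
            current = {"event": "message", "data": []}
        elif line.startswith("event:"):
            current["event"] = line[len("event:"):].strip()
        elif line.startswith("data:"):
            current["data"].append(line[len("data:"):].strip())
    return events


# Hypothetical sample of what a job event stream might carry.
sample = (
    "event: stage\n"
    'data: {"stage": "parsing", "progress": 40}\n'
    "\n"
    "event: stage\n"
    'data: {"stage": "extraction", "progress": 55}\n'
    "\n"
)
events = parse_sse(sample)
```

In the browser the built-in `EventSource` API does this parsing for you; the sketch just shows what travels over the wire.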
Click the model API entry in the upper-right corner to open the API configuration page.
- Python 3.11+
- Windows, macOS, or Linux
From the repository root:
```
pip install -r requirements.txt
```

Or install the backend package directly:

```
cd backend
pip install -e .
```

From the repository root:

```
python start_skillforge.py
```

Then open:
http://127.0.0.1:8000
The launcher will:
- verify that `backend/` exists
- create `backend/.env` from `backend/.env.example` if needed
- start the FastAPI app with auto reload
- Open `http://127.0.0.1:8000/jobs/new`
- Enter a job name
- Choose a local repository or document corpus
- Describe the goal you want SkillForge to extract
- Run the pipeline
- Review the generated evidence, capabilities, plans, and skills
- Export the generated skills to a local folder
A typical generated output looks like this:
```
exports/
└── customer-onboarding-skill/
    ├── SKILL.md
    ├── references/
    │   ├── decision-table.md
    │   ├── examples.md
    │   └── source-map.md
    ├── scripts/
    │   └── analyze_inputs.py
    └── assets/
        └── template.txt
```
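Writing a package with this layout is straightforward to reproduce. The sketch below builds the same folder structure with placeholder contents; the `export_skill` helper and its signature are hypothetical, while the real exporter lives in `backend/app/services/exporter.py`.

```python
import tempfile
from pathlib import Path


def export_skill(root: Path, name: str, skill_md: str) -> Path:
    """Write a minimal skill package (hypothetical helper, placeholder files)."""
    skill_dir = root / name
    for sub in ("references", "scripts", "assets"):
        (skill_dir / sub).mkdir(parents=True, exist_ok=True)
    (skill_dir / "SKILL.md").write_text(skill_md, encoding="utf-8")
    (skill_dir / "references" / "source-map.md").write_text(
        "# Source map\n", encoding="utf-8"
    )
    return skill_dir


with tempfile.TemporaryDirectory() as tmp:
    out = export_skill(Path(tmp) / "exports", "customer-onboarding-skill", "# Skill\n")
    created = sorted(p.relative_to(out).as_posix() for p in out.rglob("*") if p.is_file())
```

Because the export is plain files on disk, it can be versioned, diffed, and reviewed like any other artifact.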
Skill generation can take a while when the corpus contains many files; this is expected.
This makes the result reviewable, portable, and easy to iterate on.
Configuring an external LLM API is strongly recommended for the full agent experience. That said, the default behavior is local-first:
- repository scanning works without external models
- document parsing works without external models
- evidence extraction, clustering, planning, and compilation currently have heuristic/local implementations
- the web UI and export flow work without an OpenAI key or any other hosted model
SkillForge also includes an optional model API configuration UI for OpenAI-compatible or related providers. That configuration is useful for connection testing and future/extended model-backed workflows, but it is not required for the default local experience.
Today, the built-in configuration supports provider-style settings for:
- OpenAI-compatible APIs
- Azure OpenAI-style endpoints
- Anthropic-compatible endpoints
- custom compatible endpoints
To generate a complete skill set, you still need to configure an external LLM API.
- Python 3.11+
- FastAPI for API and server-side app delivery
- Jinja2 for server-rendered UI
- Uvicorn as the ASGI server
- Vanilla JavaScript for client-side interactions
- Server-Sent Events (SSE) for live job updates
- CSS in `backend/app/static/style.css`
- Pydantic v2 for schemas and validation
- pydantic-settings for environment-based configuration
- orjson for JSON handling
- python-multipart for form processing
- Local-first default mode with database persistence disabled
- SQLite file in `backend/data/skillforge.db` for local app data storage
- SQLAlchemy 2 for ORM/repository integration
- Alembic for migrations
- PostgreSQL + psycopg as optional relational persistence
- Celery for optional background execution
- Redis as optional broker/result backend
- python-docx for `.docx`
- pypdf for `.pdf`
- openpyxl for `.xlsx`
- native Python handling for `.md` and `.txt`
- configurable provider/base URL/API key/model settings
- connection testing from the settings page
- SSL, timeout, token, sampling, and streaming controls
Core modules live under `backend/app/`:
- `main.py` — app bootstrap and router registration
- `web.py` — server-rendered pages and form routes
- `api/routes/` — JSON API endpoints
- `services/jobs.py` — job orchestration
- `services/inventory.py` — repository scanning and candidate discovery
- `services/parsing.py` — document parsing
- `services/extraction.py` — evidence extraction
- `services/distillation.py` — capability clustering and skill planning
- `services/compiler.py` — skill compilation and validation
- `services/exporter.py` — export flow and overwrite review
- `services/model_client.py` — optional external model connectivity
- `tasks/` — Celery integration
- `templates/` — dashboard, job form, job detail, settings pages
```
.
├── backend/
│   ├── app/
│   │   ├── api/routes/
│   │   ├── core/
│   │   ├── db/
│   │   ├── schemas/
│   │   ├── services/
│   │   ├── static/
│   │   ├── tasks/
│   │   ├── templates/
│   │   ├── main.py
│   │   └── web.py
│   ├── alembic/
│   ├── data/
│   ├── exports/
│   ├── .env.example
│   ├── alembic.ini
│   └── pyproject.toml
├── requirements.txt
├── start_skillforge.py
├── start_skillforge.bat
└── README.md
```
- `GET /health` — health and runtime mode
- `GET /api/jobs`
- `POST /api/jobs`
- `GET /api/jobs/{job_id}`
- `GET /api/jobs/{job_id}/status`
- `GET /api/jobs/{job_id}/events`
- `POST /api/jobs/{job_id}/run`
- `POST /api/jobs/{job_id}/dispatch`
- `POST /api/jobs/{job_id}/retry`
- `GET /api/settings/model-api`
- `POST /api/settings/model-api`
- `POST /api/settings/model-api/test`
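A client can drive the pipeline through these endpoints. The sketch below only builds a create-job request without sending it; the payload fields (`name`, `repository_path`, `goal`) are assumptions about the create-job schema, not the documented API contract.

```python
import json
import urllib.request

BASE = "http://127.0.0.1:8000"


def build_create_job_request(name: str, repo: str, goal: str) -> urllib.request.Request:
    """Prepare a POST /api/jobs request (hypothetical payload fields)."""
    payload = {"name": name, "repository_path": repo, "goal": goal}
    return urllib.request.Request(
        f"{BASE}/api/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_create_job_request("onboarding", "/repos/docs", "extract onboarding SOPs")
# To execute against a running server:
#   with urllib.request.urlopen(req) as resp:
#       job = json.loads(resp.read())
```

A follow-up `POST /api/jobs/{job_id}/run` would then kick off the pipeline, with progress observable via the status and events endpoints listed above.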
- dashboard at `http://127.0.0.1:8000/`
- new job page at `http://127.0.0.1:8000/jobs/new`
- job detail page at `http://127.0.0.1:8000/jobs/{job_id}`
Environment variables are loaded from `backend/.env` with the `SKILLFORGE_` prefix.
Example defaults:
```
SKILLFORGE_CORS_ORIGINS=["http://localhost:3000"]
SKILLFORGE_DATABASE_URL=postgresql+psycopg://skillforge:skillforge@localhost:5432/skillforge
SKILLFORGE_REDIS_URL=redis://localhost:6379/0
SKILLFORGE_USE_ASYNC_PIPELINE=false
SKILLFORGE_USE_DATABASE_PERSISTENCE=false
```

Important runtime flags include:

- `SKILLFORGE_USE_ASYNC_PIPELINE`
- `SKILLFORGE_USE_DATABASE_PERSISTENCE`
- `SKILLFORGE_DATABASE_URL`
- `SKILLFORGE_REDIS_URL`
- model API related fields such as base URL, API key, model name, and timeout
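SkillForge itself uses pydantic-settings for this; the stdlib-only sketch below illustrates the same idea under simplifying assumptions: every `SKILLFORGE_`-prefixed variable becomes a lower-cased setting, and the two boolean flags are coerced from strings.

```python
BOOL_FLAGS = {"use_async_pipeline", "use_database_persistence"}


def load_settings(env: dict[str, str]) -> dict:
    """Collect SKILLFORGE_-prefixed variables into a settings dict (sketch)."""
    settings: dict = {}
    for key, value in env.items():
        if not key.startswith("SKILLFORGE_"):
            continue
        name = key[len("SKILLFORGE_"):].lower()
        settings[name] = value.lower() == "true" if name in BOOL_FLAGS else value
    return settings


settings = load_settings({
    "SKILLFORGE_USE_ASYNC_PIPELINE": "false",
    "SKILLFORGE_REDIS_URL": "redis://localhost:6379/0",
    "PATH": "/usr/bin",  # ignored: no SKILLFORGE_ prefix
})
```

pydantic-settings adds validation and typed fields on top of this pattern, so a malformed URL or flag fails at startup instead of deep inside the pipeline.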
There is currently no first-party automated test suite checked into this repository.
At the moment, validation is mainly manual:
- start the app locally
- create a job from the web UI
- run the pipeline
- verify stage progress, preview output, and export behavior
If you add tests later, pytest would be a natural choice, but it is not yet wired up in this repository.
Contributions are welcome.
A lightweight contributor workflow for now:
- fork the repository
- create a feature branch
- make focused changes
- verify the UI/API flow locally
- open a pull request with screenshots or reproduction notes when relevant
- extraction and clustering are still heuristic/local-first rather than fully model-driven
- scanned-image OCR is not included, so image-only PDFs are not fully supported
- very large repositories may require narrowing scope for practical review
- no committed automated test suite yet
- output quality still depends on source material quality and structure
- stronger model-backed extraction and clustering options
- richer preview and traceability UX
- better large-repository scaling and filtering
- automated test coverage
- more polished export and packaging workflows
This project is licensed under the Apache License 2.0. See LICENSE.