Install dependencies:

```bash
pip install -r requirements.txt
```

The megamodel provides descriptions of LLM-based agents, their associated tools, the underlying artifacts/models, and execution traces. It is organized in four parts: core artifacts (models, metamodels, transformation models), tooling artifacts (tools, servers), agent artifacts (agents, workflows, steps), and execution traces (traces, invocations).
- Implementation: `src/core/megamodel.py`
- Population: the megamodel is populated at agent initialization via `populate_registry()` (see `scripts/run_agent_versions.py`)
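The registry's four parts can be pictured roughly as follows. This is an illustrative sketch only; the class and field names are hypothetical and do not reflect the actual API of `src/core/megamodel.py`.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical structures; the real implementation lives in src/core/megamodel.py.

@dataclass
class CoreArtifact:            # models, metamodels, transformation models
    name: str
    kind: str                  # e.g. "model", "metamodel", "transformation"
    path: str

@dataclass
class ToolingArtifact:         # tools and the MCP servers that expose them
    tool_name: str
    server: str
    description: str

@dataclass
class AgentArtifact:           # agents, their workflows, and workflow steps
    agent_name: str
    workflow: List[str]        # ordered tool names

@dataclass
class ExecutionTrace:          # traces made up of individual tool invocations
    agent_name: str
    invocations: List[dict] = field(default_factory=list)

@dataclass
class Megamodel:
    core: Dict[str, CoreArtifact] = field(default_factory=dict)
    tooling: Dict[str, ToolingArtifact] = field(default_factory=dict)
    agents: Dict[str, AgentArtifact] = field(default_factory=dict)
    traces: List[ExecutionTrace] = field(default_factory=list)

def populate_registry(megamodel: Megamodel, discovered_tools: List[ToolingArtifact]) -> None:
    """Hypothetical population step, mirroring what scripts/run_agent_versions.py triggers at initialization."""
    for tool in discovered_tools:
        megamodel.tooling[tool.tool_name] = tool
```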
Domain-specific seed examples provided by experts serve as templates that capture the linguistic patterns and technical requirements of real-world tool usage. Seeds are organized into single-tool seeds (single-tool operations) and multi-tool seeds (tool composition patterns), and are further categorized by operation pattern (transformation application, information retrieval). Generation follows a two-track approach to expand the seeds while maintaining quality.
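For illustration, a single-tool seed and a multi-tool seed could look like the entries below. The field names, tool names, and file paths are hypothetical examples, not actual entries from the seed files.

```python
# Hypothetical seed entries; the real seeds live under "dataset generation/seeds/".
single_tool_seed = {
    "instruction": "Apply the Class2Relational transformation to the UML model at models/library.uml.",
    "api_calls": [
        {"api_name": "apply_transformation",
         "parameters": {"transformation": "Class2Relational", "input_model": "models/library.uml"}},
    ],
    "pattern": "application",            # transformation application vs. information retrieval
}

multi_tool_seed = {
    "instruction": "List the available transformations, then apply Families2Persons to data/families.xmi.",
    "api_calls": [
        {"api_name": "list_transformations", "parameters": {}},
        {"api_name": "apply_transformation",
         "parameters": {"transformation": "Families2Persons", "input_model": "data/families.xmi"}},
    ],
    "pattern": "info -> application",    # one of the four two-step workflow categories
}
```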
The process begins by querying the megamodel repository and then retrieving the available tools from the MCP servers; these tools guide the synthetic generation.
Single-Tool instructions: MCP tools are extracted as a `List<MCPTool>`. The system validates that the company tools exist and are exposed by the MCP servers. For each validated tool, the tool information (`<ToolName, description>`) is extracted and three seeds matching its operation pattern (application vs. information retrieval) are retrieved. These seeds, together with the tool name, description, and generation rules, are incorporated into an LLM prompt template. The LLM generates natural language instructions paired with corresponding API calls (`<Instruction, API call>`). The target number of examples is divided equally among the available tools for balanced representation. Generated instruction-API pairs are added to the SingleTool subdataset. Progress is saved incrementally to support resumption after interruption.
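A condensed sketch of this loop is shown below. Helper names such as `build_prompt` and the dictionary keys are placeholders, not the actual functions in `single_tool_generate.py`.

```python
import json
import random

def build_prompt(tool_name, description, seed_examples, rules):
    """Assemble an LLM prompt from tool information, seed examples, and generation rules."""
    shots = "\n".join(json.dumps(seed) for seed in seed_examples)
    return (f"Tool: {tool_name}\nDescription: {description}\nRules: {rules}\n"
            f"Examples:\n{shots}\n"
            "Generate a new natural-language instruction and its API call as JSON.")

def generate_single_tool_dataset(mcp_tools, seeds, rules, target_examples, llm, out_path):
    """Hypothetical outline of single-tool instruction generation."""
    validated = [t for t in mcp_tools if t["exposed"]]            # tools must exist and be exposed via MCP
    per_tool = target_examples // len(validated)                  # equal split for balanced representation
    dataset = []
    for tool in validated:
        pattern = tool["pattern"]                                 # "application" or "information retrieval"
        matching = [s for s in seeds if s["pattern"] == pattern]
        examples = random.sample(matching, k=min(3, len(matching)))   # three pattern-matching seeds
        prompt = build_prompt(tool["name"], tool["description"], examples, rules)
        for _ in range(per_tool):
            pair = llm(prompt)                                    # expected: {"instruction": ..., "api_calls": [...]}
            dataset.append(pair)
            with open(out_path, "w") as f:                        # incremental save supports resumption
                json.dump(dataset, f, indent=2)
    return dataset
```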
Multi-Tool instructions: Multi-tool generation operates on two-step workflows built from the available tools. The system decomposes the company workflows and validates that each constituent tool exists and is exposed by a connected MCP server. Validated tools are classified by operation pattern (application vs. information retrieval), yielding four workflow categories: application -> application, application -> info, info -> application, and info -> info (`List<Tools selected(2)>`). Workflows are distributed so that each tool appears an equal number of times. For each workflow, the tool pair information (`<ToolName, description>`) and operation patterns are extracted, and three pattern-matching seeds are sampled from the multi-tool seed repository; if fewer than three matching seeds exist, the sampler supplements them with seeds from other patterns. These seeds, the tool sequence information, and the multi-step generation rules are incorporated into an LLM prompt template. The LLM generates instructions that coherently connect the two operations (`<Instruction, API calls>`). Duplicates are filtered during generation, and generated examples are validated for proper structure before being added to the MultiTool subdataset. Progress is saved incrementally, with separate tracking for remainder generation.
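The workflow-pairing step can be sketched as follows; again, the names are placeholders rather than the actual code in `multi_tool_generate.py`.

```python
from itertools import product

def build_two_step_workflows(tools):
    """Hypothetical enumeration of two-step workflows grouped by operation-pattern category."""
    # Classify each validated tool by its operation pattern.
    apps  = [t for t in tools if t["pattern"] == "application"]
    infos = [t for t in tools if t["pattern"] == "information retrieval"]
    categories = {
        "application -> application": product(apps, apps),
        "application -> info":        product(apps, infos),
        "info -> application":        product(infos, apps),
        "info -> info":               product(infos, infos),
    }
    # Flatten into (category, first_tool, second_tool) workflows, skipping same-tool pairs.
    # Balancing so that each tool appears an equal number of times would happen downstream.
    return [(cat, a, b)
            for cat, pairs in categories.items()
            for a, b in pairs
            if a["name"] != b["name"]]
```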
Both subdatasets are combined and undergo a final validation pass to produce the complete dataset. Validation checks that each example contains a valid instruction string and a properly formed list of API calls with non-empty API names. This augmentation process expanded the dataset from 21 instruction seeds to 1000 generated instruction-API pairs.
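The final structural check can be approximated as below; the key names (`instruction`, `api_calls`, `api_name`) are assumptions about the dataset schema.

```python
def is_valid_example(example: dict) -> bool:
    """Check that an example has a non-empty instruction and a well-formed list of API calls."""
    instruction = example.get("instruction")
    if not isinstance(instruction, str) or not instruction.strip():
        return False
    api_calls = example.get("api_calls")
    if not isinstance(api_calls, list) or not api_calls:
        return False
    # Every API call must carry a non-empty API name.
    return all(isinstance(call, dict)
               and isinstance(call.get("api_name"), str)
               and call["api_name"].strip()
               for call in api_calls)

def build_final_dataset(single_tool: list, multi_tool: list) -> list:
    """Combine both subdatasets and keep only structurally valid examples."""
    return [ex for ex in single_tool + multi_tool if is_valid_example(ex)]
```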
- Location: `dataset generation/`
- Scripts:
  - `single_tool_generate.py` - Single-tool instruction generation
  - `multi_tool_generate.py` - Multi-tool instruction generation
  - `pipeline.py` - End-to-end generation pipeline
- Seed instructions: `dataset generation/seeds/`
  - `all_tools/` - ATL tool seeds
  - `uml_tools/` - UML tool seeds
  - `openrewrite/` - OpenRewrite seeds
- Generated datasets: `dataset generation/outputs/`
  - `atl_tools/` - Contains `single_500_dataset.json`, `multi_500_dataset.json`
  - `uml/` - Contains `single_uml_500_dataset.json`, `multi_uml_500_dataset.json`
  - `openRewrite/` - Contains `single_openRewrite_500_dataset.json`, `multi_openRewrite_500_dataset.json`
To generate datasets:
- Set the OpenAI API key in a `.env` file at the project root:

  ```
  OPENAI_API_KEY=your_api_key_here
  ```

- Run the generation pipeline:

  ```bash
  cd "dataset generation"
  python3 pipeline.py  # Runs the full generation pipeline for all tool categories
  ```
- LLM used: GPT-4o-mini for instruction generation
- Embedding model: text-embedding-3-small for dataset validation (diversity metrics)
Validates dataset diversity using six metrics from dataset augmentation research.
- Library: uses `openai` for embeddings, `scipy` for distance calculations, and `sklearn` for cosine similarity
- Analysis script: `dataset generation/analyze_dataset_diversity.py`
- Metrics calculated (a rough sketch of these computations follows the list):
- Distance (average pairwise Euclidean distance)
- Dispersion (1 - average cosine similarity)
- Isocontour Radius (geometric mean of per-dimension standard deviations)
- Affinity (similarity between seed and augmented dataset means)
- Vocabulary Size (unique words)
- Unique 3-grams (distinct 3-word sequences)
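The six metrics can be approximated as shown below. This sketch assumes the instructions have already been embedded (e.g. with `text-embedding-3-small`) into NumPy arrays; it is not the exact code in `analyze_dataset_diversity.py`.

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.metrics.pairwise import cosine_similarity

def diversity_metrics(aug_emb: np.ndarray, seed_emb: np.ndarray, instructions: list) -> dict:
    """Approximate the six diversity metrics for an augmented dataset."""
    # Distance: average pairwise Euclidean distance between augmented embeddings.
    distance = float(pdist(aug_emb, metric="euclidean").mean())

    # Dispersion: 1 - average pairwise cosine similarity (off-diagonal entries only).
    sim = cosine_similarity(aug_emb)
    n = sim.shape[0]
    dispersion = float(1 - (sim.sum() - n) / (n * (n - 1)))

    # Isocontour radius: geometric mean of per-dimension standard deviations.
    stds = aug_emb.std(axis=0)
    radius = float(np.exp(np.mean(np.log(stds + 1e-12))))

    # Affinity: cosine similarity between the seed and augmented dataset mean embeddings.
    affinity = float(cosine_similarity(seed_emb.mean(axis=0, keepdims=True),
                                       aug_emb.mean(axis=0, keepdims=True))[0, 0])

    # Vocabulary size: unique words across all generated instructions.
    words = [w for text in instructions for w in text.lower().split()]

    # Unique 3-grams: distinct 3-word sequences across all instructions.
    trigrams = set()
    for text in instructions:
        toks = text.lower().split()
        trigrams.update(zip(toks, toks[1:], toks[2:]))

    return {"distance": distance, "dispersion": dispersion, "isocontour_radius": radius,
            "affinity": affinity, "vocabulary_size": len(set(words)), "unique_3grams": len(trigrams)}
```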
To reproduce results:
- Generate CSV metrics for each dataset:

  ```bash
  cd "dataset generation"
  python3 analyze_dataset_diversity.py  # Generates CSV files in outputs/atl_tools/, outputs/uml/, outputs/openRewrite/
  ```

- Visualize the metrics as charts:

  ```bash
  python3 visualize_metrics.py  # Generates PNG charts
  ```
- Output charts: `dataset generation/outputs/` - Charts are generated for each tool category (ATL, UML, OpenRewrite), showing diversity metric comparisons
Evaluates seven agent versions (representing different architectural improvements and model choices) against both the augmented dataset and the original seed dataset.
- Agent versions: `evaluation/agent_versions/` (`agent1.py` through `agent7.py`)
- Execution script: `scripts/run_agent_versions.py`
- Results: `outputs/agent_version_logs/`
  - `version_1/` through `version_7/` - Execution logs per agent version
  - `report_generation.csv` - Augmented dataset results
  - `seeds_report_generation.csv` - Seed dataset results
- Evaluation: `outputs/evaluate_accuracy.py` (a sketch of one possible accuracy criterion follows this list)
- Visualization: `outputs/visualize_accuracy_comparison.py`
- Output plots: `outputs/plots/agent_accuracy_comparison.png`
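How accuracy is computed is defined in `outputs/evaluate_accuracy.py`; one plausible criterion, shown here purely as an assumption, is an exact match between the predicted and the reference API-call sequences.

```python
def api_call_names(calls: list) -> list:
    """Extract the ordered API names from a list of API-call dictionaries (hypothetical schema)."""
    return [call["api_name"] for call in calls]

def sequence_accuracy(predictions: list, references: list) -> float:
    """Fraction of examples whose predicted API-call sequence exactly matches the reference."""
    hits = sum(api_call_names(p) == api_call_names(r) for p, r in zip(predictions, references))
    return hits / len(references)
```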
Tests agent performance with reduced tool availability.
- Script: `scripts/run_agent_reduced_tools.py`
- Analysis: `outputs/ablation_test/instruction_analysis.py`
- Results: `outputs/ablation_test/`
- Coverage charts: `outputs/plots/coverage_chart_seeds.png`
Servers expose tools via the Model Context Protocol (MCP) for agent execution.
- ATL server: `mcp_servers/atl_server/` - Model transformations (includes UML transformations)
- OpenRewrite server: `mcp_servers/openRewrite_servers/` - Java code refactoring and migration recipes
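For orientation, a minimal MCP server exposing tools might look like the sketch below, assuming the `FastMCP` helper from the official Python MCP SDK; the tool names and behavior are illustrative and do not correspond to the actual ATL or OpenRewrite servers.

```python
from mcp.server.fastmcp import FastMCP

# Illustrative server only; the real servers live under mcp_servers/.
server = FastMCP("example-transformation-server")

@server.tool()
def list_transformations() -> list:
    """Return the names of the transformations this server can apply (dummy data)."""
    return ["Class2Relational", "Families2Persons"]

@server.tool()
def apply_transformation(name: str, input_model: str) -> str:
    """Pretend to apply a named transformation and return the path of the output model."""
    return f"outputs/{name}_{input_model.rsplit('/', 1)[-1]}"

if __name__ == "__main__":
    server.run()  # defaults to the stdio transport
```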
