Overview
Introduce an artifact loading and discovery system for Detektor.
This component is responsible for identifying and loading the files and artifacts that will be analyzed by the scanner.
The artifact loader provides the bridge between the scan target specified by the CLI and the rule engine that evaluates artifacts for security risks.
The system should support scanning directories and individual files, discovering relevant artifacts such as:
- agent configuration files
- workflow definitions
- prompt templates
- tool configuration files
- other AI agent-related artifacts
The artifact loading system will produce normalized in-memory representations of these artifacts for further analysis by the rule engine.
Motivation
Detektor needs a consistent way to locate and load artifacts that may contain AI agent security risks.
AI agent systems often distribute their configuration and prompts across multiple files and formats, such as:
- YAML workflow definitions
- JSON agent configuration files
- prompt templates embedded in configuration
- tool configuration blocks
Without a structured artifact loading system, the scanner would not be able to reliably locate or analyze these inputs.
A dedicated artifact loader improves:
- scanner reliability
- rule engine simplicity
- extensibility for future artifact types
- support for scanning repositories rather than single files
This component is a foundational piece of the scanning pipeline.
Proposed Approach
Implement an artifact discovery and loading component responsible for:
- Accepting the scan target path provided by the CLI.
- Traversing the target recursively when it is a directory (for example, a repository checkout).
- Identifying candidate files that may contain agent configurations or workflows.
- Parsing supported file formats.
- Producing normalized artifact objects for the rule engine.
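The responsibilities above can be sketched roughly as follows. This is a minimal illustration, not Detektor's actual implementation: the names `SUPPORTED_FORMATS`, `discover`, `parse`, and `load_artifacts` are hypothetical, artifacts are shown as plain dictionaries, and YAML parsing assumes the third-party PyYAML package is installed.

```python
import json
from pathlib import Path

# Hypothetical mapping from file extension to format name (v0.1: YAML and JSON).
SUPPORTED_FORMATS = {".yaml": "yaml", ".yml": "yaml", ".json": "json"}


def discover(target):
    """Return candidate files for a scan target (a single file or a directory)."""
    root = Path(target)
    if root.is_file():
        candidates = [root]
    else:
        # rglob walks the directory tree recursively.
        candidates = [p for p in root.rglob("*") if p.is_file()]
    return sorted(p for p in candidates if p.suffix.lower() in SUPPORTED_FORMATS)


def parse(path):
    """Parse one candidate file and return (format, parsed content)."""
    fmt = SUPPORTED_FORMATS[path.suffix.lower()]
    text = path.read_text(encoding="utf-8")
    if fmt == "json":
        return fmt, json.loads(text)
    import yaml  # PyYAML, an assumed dependency; imported only when needed

    try:
        return fmt, yaml.safe_load(text)
    except yaml.YAMLError as exc:
        raise ValueError(f"invalid YAML: {exc}") from exc


def load_artifacts(target):
    """Discover and parse artifacts; unparseable files are reported, not dropped silently."""
    artifacts, errors = [], []
    for path in discover(target):
        try:
            fmt, content = parse(path)
        except ValueError as exc:  # JSON, YAML, and text-decoding errors
            errors.append((str(path), str(exc)))
            continue
        artifacts.append({"path": str(path), "format": fmt, "content": content})
    return artifacts, errors
```

Returning parse errors alongside the artifacts is one possible design choice: for a security scanner, a file that could not be parsed is arguably worth surfacing rather than skipping quietly.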
Initial format support for v0.1:
- YAML
- JSON
Example scan target:
detektor scan ./repo
Example repository structure:
repo/
  agent.yaml
  workflow.yaml
  prompts/
    review_prompt.txt
Possible internal artifact model:
Artifact
- path
- type
- content
- format
The artifact loader will provide a collection of artifacts that the rule engine can analyze for potential security issues.
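One way to express the artifact model is as a small immutable record. The sketch below is only an assumption about how it might look: the field names come from the list above, while the example values for `type` (such as `"agent_config"`) and the choice to freeze the dataclass are illustrative, not part of the proposal.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Artifact:
    path: str     # location of the source file
    type: str     # hypothetical category, e.g. "agent_config", "workflow", "prompt"
    content: Any  # parsed in-memory representation of the file
    format: str   # serialization format, e.g. "yaml" or "json"


# Illustrative instance; the content shown here is made up for the example.
example = Artifact(
    path="repo/agent.yaml",
    type="agent_config",
    content={"name": "reviewer", "tools": ["shell"]},
    format="yaml",
)
```

Freezing the record prevents rules from reassigning an artifact's fields while it is being analyzed. It does not make nested content such as dictionaries immutable.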
Alternatives Considered
1. Scan only specific file types
Detektor could initially scan only a small set of predefined file types (for example *.yaml).
However, this approach may miss relevant artifacts that use other formats or naming conventions.
2. Require explicit configuration of artifacts
Another option would be requiring users to specify artifact files manually.
This would reduce automatic discovery complexity but would significantly reduce usability for CI workflows.
Risks and Trade-offs
False discovery of irrelevant files
Repository scanning may load files that are not relevant to AI agents.
This can be mitigated with simple heuristics or filtering rules.
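As a rough illustration of such heuristics, the sketch below skips common dependency and tooling directories and keeps only supported files whose path contains an agent-related hint. All names and values here (`SKIPPED_DIRS`, `NAME_HINTS`, `EXTENSIONS`, `is_candidate`) are hypothetical choices for the example, not a proposed rule set.

```python
from pathlib import PurePosixPath

# Hypothetical heuristics; the real lists would need tuning against actual repositories.
SKIPPED_DIRS = {".git", "node_modules", ".venv", "__pycache__"}
NAME_HINTS = ("agent", "workflow", "prompt", "tool")
EXTENSIONS = {".yaml", ".yml", ".json"}


def is_candidate(relative_path):
    """Decide whether a path (relative to the scan target, '/'-separated) is worth loading."""
    path = PurePosixPath(relative_path)
    # Skip anything located under a directory that rarely holds agent artifacts.
    if any(part in SKIPPED_DIRS for part in path.parts[:-1]):
        return False
    if path.suffix.lower() not in EXTENSIONS:
        return False
    # Keep the file if its name or any parent directory name contains a hint.
    lowered = relative_path.lower()
    return any(hint in lowered for hint in NAME_HINTS)
```

Name-based hints reduce false discoveries at the cost of possibly missing artifacts with unconventional names, which is the same trade-off raised under Alternatives Considered.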
Parser complexity
Supporting multiple file formats introduces parsing complexity.
For v0.1, limiting support to YAML and JSON keeps implementation manageable.
Repository size impact
Scanning very large repositories could noticeably increase scan time and memory use.
This can be optimized in later iterations if needed.
Open Questions
- Should artifact discovery rely only on file extensions or also inspect file content?
- Should Detektor support prompt files stored as plain text in v0.1?
- Should artifact loading support ignore rules (for example .detektorignore) in future versions?
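To make the ignore-rule question more concrete, here is a minimal sketch of what a `.detektorignore` could look like if it were added. The file format is an assumption (one glob pattern per line, `#` for comments), and the matching uses Python's `fnmatch` module, which is simpler than gitignore semantics: `*` also matches across `/`.

```python
from fnmatch import fnmatchcase


def parse_ignore_rules(text):
    """Extract glob patterns from ignore-file text, skipping blank lines and comments."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def is_ignored(relative_path, patterns):
    """Return True if a '/'-separated relative path matches any ignore pattern."""
    return any(fnmatchcase(relative_path, pattern) for pattern in patterns)


# Hypothetical ignore-file content.
rules = parse_ignore_rules("""
# third-party code
vendor/*
tests/fixtures/*
""")
```

With these rules, `is_ignored("vendor/lib/agent.yaml", rules)` is true while `is_ignored("agent.yaml", rules)` is false.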
Examples
Example CLI usage:
detektor scan .
Expected behavior:
- Detektor discovers relevant configuration files.
- Files are parsed into artifact objects.
- Artifacts are passed to the rule engine for analysis.
Next Steps
If this proposal is accepted:
- Implement artifact discovery for directories and files.
- Implement basic YAML and JSON parsers.
- Create a normalized artifact representation.
- Integrate artifact loading with the CLI scan command.
- Pass loaded artifacts to the rule engine in future issues.