Skip to content

Artifact loading and discovery system for scan targets #2

@meisterware-admin

Description

@meisterware-admin

Overview

Introduce an artifact loading and discovery system for Detektor.

This component is responsible for identifying and loading the files and artifacts that will be analyzed by the scanner.

The artifact loader provides the bridge between the scan target specified by the CLI and the rule engine that evaluates artifacts for security risks.

The system should support scanning directories and individual files, discovering relevant artifacts such as:

  • agent configuration files
  • workflow definitions
  • prompt templates
  • tool configuration files
  • other AI agent-related artifacts

The artifact loading system will produce normalized in-memory representations of these artifacts for further analysis by the rule engine.


Motivation

Detektor needs a consistent way to locate and load artifacts that may contain AI agent security risks.

AI agent systems often distribute their configuration and prompts across multiple files and formats, such as:

  • YAML workflow definitions
  • JSON agent configuration files
  • prompt templates embedded in configuration
  • tool configuration blocks

Without a structured artifact loading system, the scanner would not be able to reliably locate or analyze these inputs.

A dedicated artifact loader improves:

  • scanner reliability
  • rule engine simplicity
  • extensibility for future artifact types
  • support for scanning repositories rather than single files

This component is a foundational piece of the scanning pipeline.


Proposed Approach

Implement an artifact discovery and loading component responsible for:

  1. Accepting the scan target path provided by the CLI.
  2. Traversing directories recursively when the target is a repository.
  3. Identifying candidate files that may contain agent configurations or workflows.
  4. Parsing supported file formats.
  5. Producing normalized artifact objects for the rule engine.

Initial format support for v0.1:

  • YAML
  • JSON

Example scan target:

detektor scan ./repo

Example repository structure:

repo/
  agent.yaml
  workflow.yaml
  prompts/
    review_prompt.txt

Possible internal artifact model:

Artifact
- path
- type
- content
- format

The artifact loader will provide a collection of artifacts that the rule engine can analyze for potential security issues.


Alternatives Considered

1. Scan only specific file types

Detektor could initially scan only a small set of predefined file types (for example *.yaml).

However, this approach may miss relevant artifacts that appear in other formats or naming conventions.

2. Require explicit configuration of artifacts

Another option would be requiring users to specify artifact files manually.

This would reduce automatic discovery complexity but would significantly reduce usability for CI workflows.


Risks and Trade-offs

False discovery of irrelevant files

Repository scanning may load files that are not relevant to AI agents.

This can be mitigated with simple heuristics or filtering rules.

Parser complexity

Supporting multiple file formats introduces parsing complexity.

For v0.1, limiting support to YAML and JSON keeps implementation manageable.

Repository size impact

Scanning very large repositories could introduce performance considerations.

This can be optimized in later iterations if needed.


Open Questions

  • Should artifact discovery rely only on file extensions or also inspect file content?
  • Should Detektor support prompt files stored as plain text in v0.1?
  • Should artifact loading support ignore rules (for example .detektorignore) in future versions?

Examples

Example CLI usage:

detektor scan .

Expected behavior:

  • Detektor discovers relevant configuration files.
  • Files are parsed into artifact objects.
  • Artifacts are passed to the rule engine for analysis.

Next Steps

If this proposal is accepted:

  1. Implement artifact discovery for directories and files.
  2. Implement basic YAML and JSON parsers.
  3. Create a normalized artifact representation.
  4. Integrate artifact loading with the CLI scan command.
  5. Pass loaded artifacts to the rule engine in future issues.

Metadata

Metadata

Assignees

No one assigned

    Labels

    area:scannerCore repository scanning engine.featureNew functionality or capabilitypriority:highImportant issue affecting core functionality.

    Projects

    Status

    Backlog

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions