Artifact loading and discovery system for scan targets

## Overview

Introduce an artifact loading and discovery system for Detektor.

This component is responsible for identifying and loading the files and artifacts that will be analyzed by the scanner.

The artifact loader provides the bridge between the **scan target specified by the CLI** and the **rule engine that evaluates artifacts for security risks**.

The system should support scanning directories and individual files, discovering relevant artifacts such as:

* agent configuration files
* workflow definitions
* prompt templates
* tool configuration files
* other AI agent-related artifacts

The artifact loading system will produce normalized in-memory representations of these artifacts for further analysis by the rule engine.

---

## Motivation

Detektor needs a consistent way to locate and load artifacts that may contain AI agent security risks.

AI agent systems often distribute their configuration and prompts across multiple files and formats, such as:

* YAML workflow definitions
* JSON agent configuration files
* prompt templates embedded in configuration
* tool configuration blocks

Without a structured artifact loading system, the scanner would not be able to reliably locate or analyze these inputs.

A dedicated artifact loader improves:

* scanner reliability
* rule engine simplicity
* extensibility for future artifact types
* support for scanning repositories rather than single files

This component is a foundational piece of the scanning pipeline.

---

## Proposed Approach

Implement an artifact discovery and loading component responsible for:

1. Accepting the scan target path provided by the CLI.
2. Traversing directories recursively when the target is a repository.
3. Identifying candidate files that may contain agent configurations or workflows.
4. Parsing supported file formats.
5. Producing normalized artifact objects for the rule engine.
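Steps 1 to 3 could look roughly like the following Python sketch. It assumes candidates are recognized purely by file extension; the names `discover` and `CANDIDATE_SUFFIXES` are hypothetical and not part of any existing Detektor code.

```python
from pathlib import Path
from typing import Iterator

# Assumption: v0.1 identifies candidates by extension only.
CANDIDATE_SUFFIXES = {".yaml", ".yml", ".json"}


def discover(target: Path) -> Iterator[Path]:
    """Yield candidate files for a scan target (a single file or a directory)."""
    if target.is_file():
        # A single-file target is yielded only if it has a supported extension.
        if target.suffix.lower() in CANDIDATE_SUFFIXES:
            yield target
        return
    # rglob("*") walks the tree recursively; sorting keeps the order deterministic.
    for path in sorted(target.rglob("*")):
        if path.is_file() and path.suffix.lower() in CANDIDATE_SUFFIXES:
            yield path
```

Returning an iterator rather than a list lets later stages start parsing before traversal finishes, which may matter for larger repositories.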

Initial format support for v0.1:

* YAML
* JSON
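A minimal sketch of format dispatch for these two formats is shown below. It assumes the third-party PyYAML package is used for YAML (this proposal does not name a parser), and `parse_file` is a hypothetical helper name.

```python
import json
from pathlib import Path
from typing import Any


def parse_file(path: Path) -> Any:
    """Parse a YAML or JSON file into plain Python data."""
    text = path.read_text(encoding="utf-8")
    suffix = path.suffix.lower()
    if suffix == ".json":
        return json.loads(text)
    if suffix in {".yaml", ".yml"}:
        # Assumption: PyYAML is the YAML parser. It is imported lazily so that
        # JSON-only use does not require the extra dependency.
        import yaml
        # safe_load builds only plain data types, never arbitrary Python objects.
        return yaml.safe_load(text)
    raise ValueError(f"unsupported format: {path.name}")
```

Since scanned repositories are untrusted input, a safe loading mode such as `yaml.safe_load` seems preferable to a loader that can construct arbitrary objects.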

Example scan target:

```
detektor scan ./repo
```

Example repository structure:

```
repo/
  agent.yaml
  workflow.yaml
  prompts/
    review_prompt.txt
```

Possible internal artifact model:

```
Artifact
- path
- type
- content
- format
```
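One way to express this model in code is a small dataclass, sketched here in Python. The field names follow the list above; the field types and the example values for `type` and `format` are assumptions.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any


@dataclass
class Artifact:
    path: Path     # location of the source file
    type: str      # assumed values, e.g. "agent_config" or "workflow"
    content: Any   # parsed data (typically a dict or list)
    format: str    # assumed values: "yaml" or "json"


# Example: an artifact built from a parsed agent configuration.
artifact = Artifact(
    path=Path("repo/agent.yaml"),
    type="agent_config",
    content={"name": "reviewer"},
    format="yaml",
)
```

Keeping `content` as plain parsed data means rules would not need to know which file format an artifact came from.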

The artifact loader will provide a collection of artifacts that the rule engine can analyze for potential security issues.

---

## Alternatives Considered

**1. Scan only specific file types**

Detektor could initially scan only a small set of predefined file types (for example `*.yaml`).

However, this approach may miss relevant artifacts that appear in other formats or naming conventions.

**2. Require explicit configuration of artifacts**

Another option would be requiring users to specify artifact files manually.

This would avoid the complexity of automatic discovery but would make Detektor significantly harder to use in CI workflows.

---

## Risks and Trade-offs

**False discovery of irrelevant files**

Repository scanning may load files that are not relevant to AI agents.

This can be mitigated with simple heuristics or filtering rules, such as skipping dependency and version-control directories.
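As an illustration of one such heuristic, the sketch below skips files that sit inside commonly irrelevant directories. The directory list and the name `is_relevant` are hypothetical choices, not something this proposal specifies.

```python
from pathlib import Path

# Hypothetical defaults: directories that rarely hold agent artifacts.
SKIP_DIRS = {".git", "node_modules", ".venv", "__pycache__"}


def is_relevant(path: Path, root: Path) -> bool:
    """Return False if any directory between `root` and the file is in SKIP_DIRS."""
    relative = path.relative_to(root)
    # parts[:-1] are the directory components; the last part is the file name.
    return not any(part in SKIP_DIRS for part in relative.parts[:-1])
```

For example, `is_relevant(Path("repo/node_modules/pkg/package.json"), Path("repo"))` is `False`, while `is_relevant(Path("repo/agent.yaml"), Path("repo"))` is `True`.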

**Parser complexity**

Supporting multiple file formats introduces parsing complexity.

For v0.1, limiting support to **YAML and JSON** keeps implementation manageable.

**Repository size impact**

Scanning very large repositories could be slow or memory-intensive, since every candidate file must be read and parsed.

Traversal and parsing can be optimized in later iterations if this becomes a problem.

---

## Open Questions

* Should artifact discovery rely only on file extensions or also inspect file content?
* Should Detektor support prompt files stored as plain text in v0.1?
* Should artifact loading support ignore rules (for example `.detektorignore`) in future versions?
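To make the first question more concrete, content inspection could be as simple as checking parsed data for top-level keys that suggest an agent artifact. The Python sketch below is only an illustration; the key set and the name `looks_like_agent_artifact` are invented for this example and would need to be derived from real agent frameworks.

```python
from typing import Any

# Hypothetical hint keys; a real list would come from the formats Detektor targets.
AGENT_HINT_KEYS = {"agent", "tools", "prompt", "workflow", "steps"}


def looks_like_agent_artifact(data: Any) -> bool:
    """Return True if parsed data is a mapping with at least one hint key."""
    if not isinstance(data, dict):
        return False
    top_level_keys = {str(key).lower() for key in data}
    return bool(AGENT_HINT_KEYS & top_level_keys)
```

A check like this runs after parsing, so it would complement extension-based discovery rather than replace it.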

---

## Examples

Example CLI usage:

```
detektor scan .
```

Expected behavior:

* Detektor discovers relevant configuration files.
* Files are parsed into artifact objects.
* Artifacts are passed to the rule engine for analysis.

---

## Next Steps

If this proposal is accepted:

1. Implement artifact discovery for directories and files.
2. Implement basic YAML and JSON parsers.
3. Create a normalized artifact representation.
4. Integrate artifact loading with the CLI scan command.
5. Pass loaded artifacts to the rule engine (to be tracked in future issues).

