Add data_source field to IOC model for multi-source tracking #994

@manik3160

Description

While reviewing the database schema in greedybear/models.py, I noticed that the IOC model currently lacks a generic way to track the provenance of the intelligence it stores.

Right now, an IOC's origins are mapped using the sensors and general_honeypot ManyToMany fields:

# greedybear/models.py
class IOC(models.Model):
    # ...
    # FEEDS - list of honeypots from general list, from which the IOC was detected
    general_honeypot = models.ManyToManyField(GeneralHoneypot, blank=True)

    # SENSORS - list of T-Pot sensors that detected this IOC
    sensors = models.ManyToManyField(Sensor, blank=True)

These fields record which honeypot or sensor observed an IOC. However, because GreedyBear currently pulls almost exclusively from T-Pot's Elasticsearch, there is no field that records the high-level data source (e.g., T-Pot, an external API, or a manual submission).

As we expand GreedyBear to accept data from external applications via the planned Event Collector API, we will need a reliable way to differentiate these sources at the database level.

Impact

Without a generic data source tracking field:

  • We cannot easily filter or serve feeds based on specific upstream providers.
  • It becomes difficult to audit data quality, reliability, or noise levels per source over time.
  • The Event Collector API will struggle to label incoming externally-injected events accurately without polluting existing T-Pot-specific fields like sensors.

Proposed Solution

Add a data_source field to the IOC model to track where the data originated.

This could be implemented either as a simple CharField with choices, or ideally as a ForeignKey to a new DataSource model to allow dynamic addition of new integrations:

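# greedybear/models.py
from django.db import models


class DataSource(models.Model):
    # Unique, human-readable name of the upstream source
    name = models.CharField(max_length=64, unique=True)
    description = models.TextField(blank=True)
    active = models.BooleanField(default=True)

    def __str__(self):
        return self.name


class IOC(models.Model):
    # ... existing fields
    # Nullable so that existing IOCs remain valid without a source;
    # SET_NULL keeps the IOC rows if a DataSource is later deleted.
    data_source = models.ForeignKey(
        DataSource,
        on_delete=models.SET_NULL,
        null=True,
        blank=True,
    )
    # ...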

This would immediately lay the groundwork for the multi-source ingestion required by the Event Collector API.

Happy to work on a PR for this if you think the approach makes sense!

Metadata

Labels: enhancement, python
