Description
While reviewing the database schema in greedybear/models.py, I noticed that the IOC model currently lacks a generic way to track the provenance of the intelligence it stores.
Right now, an IOC's origins are mapped using the sensors and general_honeypot ManyToMany fields:
# greedybear/models.py
class IOC(models.Model):
    # ...
    # FEEDS - list of honeypots from general list, from which the IOC was detected
    general_honeypot = models.ManyToManyField(GeneralHoneypot, blank=True)
    # SENSORS - list of T-Pot sensors that detected this IOC
    sensors = models.ManyToManyField(Sensor, blank=True)
However, because GreedyBear currently pulls almost exclusively from T-Pot Elasticsearch, there is no field to specify the high-level data source (e.g., T-Pot, an external API, a manual submission, etc.).
As we expand GreedyBear to accept data from external applications via the planned Event Collector API, we will need a reliable way to differentiate these sources at the database level.
Impact
Without a generic data source tracking field:
- We cannot easily filter or serve feeds based on specific upstream providers.
- It becomes difficult to audit data quality, reliability, or noise levels per source over time.
- The Event Collector API will struggle to label externally injected events accurately without polluting existing T-Pot-specific fields like sensors.
Proposed Solution
Add a data_source field to the IOC model to track where the data originated.
This could be implemented either as a simple CharField with choices, or ideally as a ForeignKey to a new DataSource model to allow dynamic addition of new integrations:
# greedybear/models.py
from django.db import models


class DataSource(models.Model):
    name = models.CharField(max_length=64, unique=True)
    description = models.TextField(blank=True)
    active = models.BooleanField(default=True)

    def __str__(self):
        return self.name


class IOC(models.Model):
    # ... existing fields
    # Nullable so existing rows need no default; SET_NULL keeps the IOC if its source is deleted
    data_source = models.ForeignKey(
        DataSource,
        on_delete=models.SET_NULL,
        null=True,
        blank=True,
    )
    # ...
This would immediately lay the groundwork for the multi-source ingestion required by the Event Collector API.
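To make the filtering and auditing points concrete, here is a rough sketch of the kind of per-source queries a source reference on each IOC would allow. It uses the standard-library sqlite3 module rather than the Django ORM so it runs on its own. The table names, column names, source names ("tpot", "event_collector") and sample IPs are all made up for illustration and are not taken from GreedyBear.

```python
import sqlite3

# Hypothetical, simplified stand-ins for the DataSource and IOC tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE data_source (
        id   INTEGER PRIMARY KEY,
        name TEXT UNIQUE NOT NULL
    );
    CREATE TABLE ioc (
        id             INTEGER PRIMARY KEY,
        name           TEXT NOT NULL,
        data_source_id INTEGER REFERENCES data_source(id) ON DELETE SET NULL
    );
    """
)

# Illustrative sample rows (documentation IP ranges, invented source names).
conn.executemany(
    "INSERT INTO data_source (id, name) VALUES (?, ?)",
    [(1, "tpot"), (2, "event_collector")],
)
conn.executemany(
    "INSERT INTO ioc (name, data_source_id) VALUES (?, ?)",
    [
        ("192.0.2.10", 1),
        ("192.0.2.11", 1),
        ("198.51.100.7", 2),
        ("203.0.113.5", None),  # legacy row with no recorded source
    ],
)


def iocs_from(source_name):
    """Return the IOC names attributed to one upstream source."""
    rows = conn.execute(
        """
        SELECT ioc.name
        FROM ioc
        JOIN data_source ON ioc.data_source_id = data_source.id
        WHERE data_source.name = ?
        ORDER BY ioc.name
        """,
        (source_name,),
    )
    return [row[0] for row in rows]


def ioc_counts_by_source():
    """Count IOCs per source; rows without a source are reported as 'unknown'."""
    rows = conn.execute(
        """
        SELECT COALESCE(data_source.name, 'unknown'), COUNT(*)
        FROM ioc
        LEFT JOIN data_source ON ioc.data_source_id = data_source.id
        GROUP BY data_source.name
        """
    )
    return dict(rows)


print(iocs_from("tpot"))
print(ioc_counts_by_source())
```

With the ForeignKey variant, the Django equivalents would presumably be a filter on data_source__name and an annotate/Count grouped by source, but the exact queryset code depends on how the feeds are built today.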
Happy to work on a PR for this if you think the approach makes sense!