IOC identity: dedupe migration + uniqueness constraint. Refs #1240#1257
IOC identity: dedupe migration + uniqueness constraint. Refs #1240#1257tanmayjoddar wants to merge 1 commit intoGreedyBear-Project:developfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR hardens the data layer for IOC identity by deduplicating existing IOC rows on (name, type), then enforcing a DB-level uniqueness constraint to prevent future duplicates, with migration tests covering both steps.
Changes:
- Added a data migration that merges duplicate IOC rows per
(name, type)and remaps dependent relations/records. - Added a DB uniqueness constraint for IOC identity
(name, type)as a separate migration step. - Added migration tests for dedupe behavior and for uniqueness enforcement.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
greedybear/models.py |
Declares IOC identity uniqueness via a model-level UniqueConstraint(name, type). |
greedybear/migrations/0051_ioc_identity_uniqueness_and_dedupe.py |
Implements the merge/dedupe data migration for pre-existing duplicate IOC identities. |
greedybear/migrations/0052_ioc_unique_identity_constraint.py |
Adds the DB constraint enforcing uniqueness on (name, type). |
tests/test_migrations.py |
Adds migration tests validating dedupe semantics and constraint enforcement. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| self.assertEqual(sorted(merged.related_urls), ["http://a.example", "http://b.example"]) | ||
|
|
||
| self.assertEqual(merged.honeypots.count(), 2) | ||
| self.assertEqual(merged.sensors.count(), 2) |
There was a problem hiding this comment.
This test validates honeypot/sensor and FK remapping, but the migration also claims to preserve the credentials and related_ioc relationships. Adding assertions that those relationships are correctly transferred to the canonical IOC would better lock in the intended behavior and catch regressions in the merge logic.
| self.assertEqual(merged.sensors.count(), 2) | |
| self.assertEqual(merged.sensors.count(), 2) | |
| self.assertEqual(merged.credentials.count(), 1) | |
| self.assertEqual(merged.related_ioc.count(), 1) | |
| self.assertTrue(merged.related_ioc.filter(pk=ioc_other_type.pk).exists()) |
| class Meta: | ||
| constraints = [ | ||
| models.UniqueConstraint(fields=["name", "type"], name="unique_ioc_identity"), | ||
| ] |
There was a problem hiding this comment.
The model’s identity is now explicitly (name, type), but several code paths still look up IOCs by name only (e.g., IocRepository.get_ioc_by_name() uses IOC.objects.get(name=name)). Since rows with the same name but different type are intentionally allowed (and even exercised in the new migration test), those lookups can raise MultipleObjectsReturned or pick the wrong row. Consider updating the repository/API to accept (name, type) (or otherwise enforce/normalize name uniqueness) so callers align with this constraint.
| related_to_duplicate = duplicate.related_ioc.exclude(pk=canonical.pk) | ||
| if related_to_duplicate.exists(): | ||
| canonical.related_ioc.add(*related_to_duplicate) |
There was a problem hiding this comment.
The related_to_duplicate.exists() check introduces an extra query per duplicate row. Since .add() is a no-op when given no objects, you can avoid the extra DB round-trip by materializing once (or iterating IDs) and calling .add() directly without a separate .exists() call.
| related_to_duplicate = duplicate.related_ioc.exclude(pk=canonical.pk) | |
| if related_to_duplicate.exists(): | |
| canonical.related_ioc.add(*related_to_duplicate) | |
| related_to_duplicate = list(duplicate.related_ioc.exclude(pk=canonical.pk)) | |
| canonical.related_ioc.add(*related_to_duplicate) |
regulartim
left a comment
There was a problem hiding this comment.
Hey @tanmayjoddar ! Thanks for your work! :) Migration 51 is much too complex. If anything goes wrong during a data migration, if leaves the application in a inconsistent and maybe non-working state. That's a risk we have to minimize.
Your approach of merging IOCs cleanly the "correct" way of handling this but I would prefer something simpler and low-risk. For example: if there are multiple IOCs with the same name, drop all but the one with the highest attack count.
What do you think?
|
Thanks @regulartim — that makes sense, I agree on minimizing migration risk. I’ll simplify migration 51 accordingly:
I’ll also simplify the migration tests to reflect this behavior. If you’d prefer an even more minimal approach, I’m happy to reduce the scope further. |
This PR is PR 1/2 from Discussion #1236 and implements the data-layer part of Issue #1240.
It implements the data-layer hardening first:
(name, type)The write-path hardening (concurrency-safe upsert/transaction handling) is intentionally left for PR 2.
What changed
UniqueConstraint(fields=["name", "type"], name="unique_ioc_identity")(name, type)honeypots,sensors,credentials,related_ioc)Tag,CowrieSession)0050 -> 0051)0051 -> 0052)Related issues
Type of change
Checklist
Formalities
<feature name>. Refs #1240(PR1 references Pre-hardening IOC identity writes before Event Injection (follow-up to Discussion #1236) #1240; PR2 is expected to close it)develop.develop.Docs and tests
Ruff) gave 0 errors.tests.test_migrationsin Docker).GUI changes
Ignore this section if you did not make any changes to the GUI.
Notes for reviewers