IOC identity: dedupe migration + uniqueness constraint. Refs #1240 #1257

Open
tanmayjoddar wants to merge 1 commit into GreedyBear-Project:develop from tanmayjoddar:fix/ioc-identity-pr1

@tanmayjoddar
Contributor

@tanmayjoddar tanmayjoddar commented Apr 18, 2026

This PR is PR 1/2 from Discussion #1236 and implements the data-layer part of Issue #1240.

It implements the data-layer hardening first:

  1. deduplicate existing IOC rows by identity (name, type)
  2. enforce DB uniqueness for IOC identity
  3. add migration tests for both steps

The write-path hardening (concurrency-safe upsert/transaction handling) is intentionally left for PR 2.

What changed

  • Added IOC identity unique constraint in model metadata:
    • UniqueConstraint(fields=["name", "type"], name="unique_ioc_identity")
  • Added data migration to merge duplicate IOC rows:
    • keeps canonical row per (name, type)
    • merges counters/timestamps/array fields
    • preserves relationships (honeypots, sensors, credentials, related_ioc)
    • remaps dependent records (Tag, CowrieSession)
  • Split constraint into a separate migration step (to avoid Postgres pending-trigger edge cases during tests)
  • Added migration tests:
    • dedupe behavior test (0050 -> 0051)
    • uniqueness enforcement test (0051 -> 0052)
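
The reason the constraint can only land after the dedupe step can be illustrated with a small stand-alone sketch. It uses the standard-library `sqlite3` module rather than the project's Django migrations and PostgreSQL. The table layout, the sample rows, the choice of the lowest `id` as canonical row, and summing `attack_count` are all assumptions made for illustration, not the code in this PR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ioc (id INTEGER PRIMARY KEY, name TEXT, type TEXT, attack_count INTEGER)"
)
conn.executemany(
    "INSERT INTO ioc (name, type, attack_count) VALUES (?, ?, ?)",
    [
        ("a.example", "ip", 3),
        ("a.example", "ip", 7),      # duplicate identity (name, type)
        ("a.example", "domain", 1),  # same name, different type: allowed
    ],
)

UNIQUE_INDEX_SQL = "CREATE UNIQUE INDEX unique_ioc_identity ON ioc (name, type)"

# Adding the uniqueness rule first fails because duplicates still exist.
try:
    conn.execute(UNIQUE_INDEX_SQL)
    created_before_dedupe = True
except sqlite3.DatabaseError:
    created_before_dedupe = False

# Dedupe step (assumed rule): fold counters into the lowest id per identity,
# then delete the other rows of that identity.
conn.execute(
    """
    UPDATE ioc SET attack_count = (
        SELECT SUM(d.attack_count) FROM ioc AS d
        WHERE d.name = ioc.name AND d.type = ioc.type
    )
    WHERE id IN (SELECT MIN(id) FROM ioc GROUP BY name, type)
    """
)
conn.execute(
    "DELETE FROM ioc WHERE id NOT IN (SELECT MIN(id) FROM ioc GROUP BY name, type)"
)

# Constraint step: now succeeds, and later duplicates are rejected.
conn.execute(UNIQUE_INDEX_SQL)
try:
    conn.execute("INSERT INTO ioc (name, type, attack_count) VALUES ('a.example', 'ip', 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

rows = conn.execute("SELECT id, name, type, attack_count FROM ioc ORDER BY id").fetchall()
```

In this sketch the row with type `domain` survives untouched, since only rows sharing both `name` and `type` count as duplicates.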

Related issues

Type of change

  • Bug fix (non-breaking change which fixes an issue).
  • New feature (non-breaking change which adds functionality).
  • Breaking change (fix or feature that would cause existing functionality to not work as expected).
  • Chore (refactoring, dependency updates, CI/CD changes, code cleanup, docs-only changes).

Checklist

Formalities

Docs and tests

  • I documented my code changes with docstrings and/or comments.
  • I have checked if my changes affect user-facing behavior that is described in the docs. If so, I also included an update to the wiki in the description of this PR.
  • Linter (Ruff) gave 0 errors.
  • I have added tests for the feature/bug I solved.
  • All migration tests gave 0 errors (tests.test_migrations in Docker).

GUI changes

Ignore this section if you did not make any changes to the GUI.

  • I have provided a screenshot of the result in the PR.
  • I have created new frontend tests for the new component or updated existing ones.

Notes for reviewers

@tanmayjoddar tanmayjoddar marked this pull request as ready for review April 18, 2026 06:20
Copilot AI review requested due to automatic review settings April 18, 2026 06:20
Contributor

Copilot AI left a comment

Pull request overview

This PR hardens the data layer for IOC identity by deduplicating existing IOC rows on (name, type), then enforcing a DB-level uniqueness constraint to prevent future duplicates, with migration tests covering both steps.

Changes:

  • Added a data migration that merges duplicate IOC rows per (name, type) and remaps dependent relations/records.
  • Added a DB uniqueness constraint for IOC identity (name, type) as a separate migration step.
  • Added migration tests for dedupe behavior and for uniqueness enforcement.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| greedybear/models.py | Declares IOC identity uniqueness via a model-level UniqueConstraint(name, type). |
| greedybear/migrations/0051_ioc_identity_uniqueness_and_dedupe.py | Implements the merge/dedupe data migration for pre-existing duplicate IOC identities. |
| greedybear/migrations/0052_ioc_unique_identity_constraint.py | Adds the DB constraint enforcing uniqueness on (name, type). |
| tests/test_migrations.py | Adds migration tests validating dedupe semantics and constraint enforcement. |

Comment thread tests/test_migrations.py
        self.assertEqual(sorted(merged.related_urls), ["http://a.example", "http://b.example"])

        self.assertEqual(merged.honeypots.count(), 2)
        self.assertEqual(merged.sensors.count(), 2)

Copilot AI Apr 18, 2026

This test validates honeypot/sensor and FK remapping, but the migration also claims to preserve the credentials and related_ioc relationships. Adding assertions that those relationships are correctly transferred to the canonical IOC would better lock in the intended behavior and catch regressions in the merge logic.

Suggested change
-        self.assertEqual(merged.sensors.count(), 2)
+        self.assertEqual(merged.sensors.count(), 2)
+        self.assertEqual(merged.credentials.count(), 1)
+        self.assertEqual(merged.related_ioc.count(), 1)
+        self.assertTrue(merged.related_ioc.filter(pk=ioc_other_type.pk).exists())

Comment thread greedybear/models.py
Comment on lines 115 to +118
    class Meta:
        constraints = [
            models.UniqueConstraint(fields=["name", "type"], name="unique_ioc_identity"),
        ]

Copilot AI Apr 18, 2026

The model’s identity is now explicitly (name, type), but several code paths still look up IOCs by name only (e.g., IocRepository.get_ioc_by_name() uses IOC.objects.get(name=name)). Since rows with the same name but different type are intentionally allowed (and even exercised in the new migration test), those lookups can raise MultipleObjectsReturned or pick the wrong row. Consider updating the repository/API to accept (name, type) (or otherwise enforce/normalize name uniqueness) so callers align with this constraint.
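
The ambiguity described above can be reproduced with a small, framework-free sketch. `InMemoryIocRepository`, `AmbiguousLookup`, and `get_ioc_by_identity` are hypothetical names invented for this illustration. Only `get_ioc_by_name` mirrors a name from the comment, and its behavior here is an assumed stand-in for Django's `get()` raising `MultipleObjectsReturned`, not the project's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ioc:
    name: str
    type: str


class AmbiguousLookup(Exception):
    """Hypothetical stand-in for Django's MultipleObjectsReturned."""


class InMemoryIocRepository:
    """Hypothetical in-memory repository used only for this sketch."""

    def __init__(self, rows):
        self._rows = list(rows)

    def get_ioc_by_name(self, name):
        # Name-only lookup: ambiguous once two types share a name.
        matches = [row for row in self._rows if row.name == name]
        if len(matches) > 1:
            raise AmbiguousLookup(f"{len(matches)} IOCs share the name {name!r}")
        return matches[0] if matches else None

    def get_ioc_by_identity(self, name, type_):
        # Lookup by the full identity (name, type): at most one match
        # once the uniqueness constraint holds.
        matches = [row for row in self._rows if (row.name, row.type) == (name, type_)]
        return matches[0] if matches else None


repo = InMemoryIocRepository([Ioc("a.example", "ip"), Ioc("a.example", "domain")])

try:
    repo.get_ioc_by_name("a.example")
    name_lookup_ambiguous = False
except AmbiguousLookup:
    name_lookup_ambiguous = True

found = repo.get_ioc_by_identity("a.example", "domain")
```

With both rows present, the name-only lookup fails while the identity lookup returns the `domain` row, which is the alignment the comment asks callers to adopt.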

Comment on lines +67 to +69
        related_to_duplicate = duplicate.related_ioc.exclude(pk=canonical.pk)
        if related_to_duplicate.exists():
            canonical.related_ioc.add(*related_to_duplicate)

Copilot AI Apr 18, 2026

The related_to_duplicate.exists() check introduces an extra query per duplicate row. Since .add() is a no-op when given no objects, you can avoid the extra DB round-trip by materializing once (or iterating IDs) and calling .add() directly without a separate .exists() call.

Suggested change
-        related_to_duplicate = duplicate.related_ioc.exclude(pk=canonical.pk)
-        if related_to_duplicate.exists():
-            canonical.related_ioc.add(*related_to_duplicate)
+        related_to_duplicate = list(duplicate.related_ioc.exclude(pk=canonical.pk))
+        canonical.related_ioc.add(*related_to_duplicate)

Member

@regulartim regulartim left a comment

Hey @tanmayjoddar ! Thanks for your work! :) Migration 51 is much too complex. If anything goes wrong during a data migration, it leaves the application in an inconsistent and maybe non-working state. That's a risk we have to minimize.

Your approach of merging IOCs cleanly is the "correct" way of handling this, but I would prefer something simpler and lower-risk. For example: if there are multiple IOCs with the same name, drop all but the one with the highest attack count.

What do you think?

@tanmayjoddar
Contributor Author

Thanks @regulartim — that makes sense, I agree on minimizing migration risk.

I’ll simplify migration 51 accordingly:

  • group duplicates by IOC identity (name, type)
  • keep a single canonical row per identity (highest attack_count, tie-break by latest id)
  • delete the remaining duplicates without attempting full merges
  • apply the uniqueness constraint in a follow-up migration

I’ll also simplify the migration tests to reflect this behavior.

If you’d prefer an even more minimal approach, I’m happy to reduce the scope further.
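
The selection rule from the list above (highest attack_count, tie-break by latest id) can be sketched in plain Python. `IocRow` and `select_duplicates_to_delete` are hypothetical names and the sample rows are made up. A real migration would run the equivalent logic against the historical IOC model, which is not shown here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class IocRow:
    id: int
    name: str
    type: str
    attack_count: int


def select_duplicates_to_delete(rows):
    """Return the ids of every row that is not canonical for its (name, type)."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row.name, row.type)].append(row)

    to_delete = []
    for members in groups.values():
        if len(members) < 2:
            continue  # identity is already unique
        # Highest attack_count wins; on a tie, the latest (largest) id wins.
        canonical = max(members, key=lambda r: (r.attack_count, r.id))
        to_delete.extend(r.id for r in members if r.id != canonical.id)
    return sorted(to_delete)


rows = [
    IocRow(1, "a.example", "ip", 5),
    IocRow(2, "a.example", "ip", 9),
    IocRow(3, "a.example", "ip", 9),      # ties with id 2; latest id is kept
    IocRow(4, "a.example", "domain", 1),  # different type, not a duplicate
    IocRow(5, "b.example", "domain", 2),
]
ids_to_delete = select_duplicates_to_delete(rows)
```

Here rows 1 and 2 are selected for deletion and row 3 stays as the canonical `("a.example", "ip")` entry. Returning ids instead of deleting in place keeps the selection easy to test separately from the database write.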
