
Enhancement: bulk prefetch IOCs before processing to eliminate per-IOC SELECT queries. Closes #1241 #1297

Open
rahulgunwanistudy-2005 wants to merge 2 commits into GreedyBear-Project:develop from rahulgunwanistudy-2005:feature/bulk-ioc-prefetch-1241

Conversation

@rahulgunwanistudy-2005
Contributor

Description

The extraction pipeline was calling ioc_repo.get_ioc_by_name() once per IOC inside add_ioc(), so a batch of U unique IPs meant U SELECT queries. On top of that, _add_fks() in BaseExtractionStrategy was re-fetching IOCs that add_ioc() had just processed. Tanner had a manual local ioc_cache in _classify_attacks() as a partial workaround for the same problem.

This PR fixes all three by introducing a bulk prefetch mechanism in IocProcessor:

  • Added _ioc_cache and prefetch_iocs() to IocProcessor. Each _get_scanners() call now bulk-loads all IOCs for the batch in one query before the loop, reducing U SELECTs to 1.
  • add_ioc() checks the cache first, falls back to the repo only on a miss, and updates the cache after every save.
  • _add_fks() now builds an in-memory map from self.ioc_records instead of making extra DB calls for IOCs already processed.
  • Tanner's manual local ioc_cache in _classify_attacks() is removed and replaced by the processor-level cache.
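
The cache-first flow described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: `FakeIocRepo`, its bulk method `get_iocs_by_names()`, `save()`, the `IOC` dataclass, and the `attack_count` field are all hypothetical stand-ins; only `IocProcessor`, `_ioc_cache`, `prefetch_iocs()`, `add_ioc()`, and `get_ioc_by_name()` are names taken from the description.

```python
from dataclasses import dataclass


@dataclass
class IOC:
    # Hypothetical record type; the real model has many more fields.
    name: str
    attack_count: int = 0


class FakeIocRepo:
    """In-memory stand-in for the repository that counts SELECT-like calls."""

    def __init__(self, rows):
        self._rows = {row.name: row for row in rows}
        self.select_count = 0

    def get_ioc_by_name(self, name):
        # One lookup per call, mirroring the per-IOC SELECT.
        self.select_count += 1
        return self._rows.get(name)

    def get_iocs_by_names(self, names):
        # Hypothetical bulk lookup: one "query" for the whole batch.
        self.select_count += 1
        return [self._rows[n] for n in names if n in self._rows]

    def save(self, ioc):
        self._rows[ioc.name] = ioc
        return ioc


class IocProcessor:
    def __init__(self, repo):
        self.repo = repo
        self._ioc_cache = {}

    def prefetch_iocs(self, names):
        """Load every already-known IOC for the batch in a single lookup."""
        for ioc in self.repo.get_iocs_by_names(set(names)):
            self._ioc_cache[ioc.name] = ioc

    def add_ioc(self, name):
        """Check the cache first, fall back to the repo on a miss,
        and refresh the cache after saving."""
        record = self._ioc_cache.get(name)
        if record is None:
            record = self.repo.get_ioc_by_name(name)
        if record is None:
            record = IOC(name=name)
        record.attack_count += 1
        self._ioc_cache[name] = self.repo.save(record)
        return record
```

In this sketch, calling `prefetch_iocs()` once and then `add_ioc()` for each name in a batch of already-known IPs leaves `select_count` at 1. A name that was not in the store still costs one fallback lookup on its first `add_ioc()` call, after which it is served from the cache.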

Related issues

Closes #1241

Type of change

  • Bug fix (non-breaking change which fixes an issue).
  • New feature (non-breaking change which adds functionality).
  • Breaking change (fix or feature that would cause existing functionality to not work as expected).
  • Chore (refactoring, dependency updates, CI/CD changes, code cleanup, docs-only changes).

Checklist

Formalities

  • I have read and understood the rules about how to Contribute to this project.
  • I chose an appropriate title for the pull request in the form: Enhancement: bulk prefetch IOCs before processing to eliminate per-IOC SELECT queries. Closes #1241
  • My branch is based on develop.
  • The pull request is for the branch develop.
  • I have reviewed and verified any LLM-generated code included in this PR.

Docs and tests

  • I documented my code changes with docstrings and/or comments.
  • I have checked if my changes affect user-facing behavior that is described in the docs. No user-facing behavior is affected.
  • Linter (Ruff) gave 0 errors.
  • I have added tests for the feature/bug I solved.
  • All the tests gave 0 errors.

GUI changes

No GUI changes in this PR.


2 participants