Enhancement: bulk prefetch IOCs before processing to eliminate per-IOC SELECT queries. Closes #1241 #1296

Closed
rahulgunwanistudy-2005 wants to merge 17 commits into GreedyBear-Project:main from rahulgunwanistudy-2005:feature/bulk-ioc-prefetch-1241

Conversation

@rahulgunwanistudy-2005
Contributor

Description

The extraction pipeline called ioc_repo.get_ioc_by_name() once per IOC inside add_ioc(), so a batch of U unique IPs issued U SELECT queries. In addition, _add_fks() in BaseExtractionStrategy re-fetched IOCs that add_ioc() had just processed, and Tanner kept a manual local ioc_cache in _classify_attacks() as a partial workaround for the same problem.
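To make the query count concrete, here is a minimal sketch of the per-IOC lookup pattern described above. `CountingRepo` and the free-standing `add_ioc()` are hypothetical stand-ins written for this illustration, not the project's actual classes; the "SELECT" is simulated with a counter.

```python
class CountingRepo:
    """Hypothetical stand-in for the IOC repository; counts simulated SELECTs."""

    def __init__(self, known):
        self._known = dict(known)
        self.select_count = 0

    def get_ioc_by_name(self, name):
        # Each call stands for one SELECT against the database.
        self.select_count += 1
        return self._known.get(name)


def add_ioc(repo, name):
    # Old behaviour as described: one lookup per IOC, no shared cache.
    return repo.get_ioc_by_name(name) or {"name": name}


batch = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
repo = CountingRepo({"10.0.0.1": {"name": "10.0.0.1"}})

for name in batch:
    add_ioc(repo, name)

per_ioc_selects = repo.select_count  # one simulated SELECT per unique IP
```

With U unique IPs in the batch, the counter ends at U, which is the cost the prefetch is meant to remove.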

This PR fixes all three by introducing a bulk prefetch mechanism in IocProcessor:

  • Added _ioc_cache and prefetch_iocs() to IocProcessor. Each _get_scanners() now bulk-loads all IOCs for the batch in one query before the loop, reducing U SELECTs to 1.
  • add_ioc() checks the cache first, falls back to the repo only on a miss, and updates the cache after every save.
  • _add_fks() now builds an in-memory map from self.ioc_records instead of making extra DB calls for IOCs already processed.
  • Tanner's manual local ioc_cache in _classify_attacks() is removed and replaced by the processor-level cache.
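The prefetch and cache-first lookup from the list above could look roughly like the sketch below. Only the names `_ioc_cache`, `prefetch_iocs()`, `add_ioc()` and `get_ioc_by_name()` come from this description; the `Ioc` dataclass, `FakeIocRepo`, its bulk method `get_iocs_by_names()` and `save()` are assumptions made so the example runs on its own, and the real signatures may differ.

```python
from dataclasses import dataclass


@dataclass
class Ioc:
    # Hypothetical, heavily reduced IOC record.
    name: str
    attack_count: int = 0


class FakeIocRepo:
    """Hypothetical in-memory repository that counts simulated SELECTs."""

    def __init__(self, rows):
        self._rows = {ioc.name: ioc for ioc in rows}
        self.select_count = 0

    def get_ioc_by_name(self, name):
        self.select_count += 1  # one SELECT per call
        return self._rows.get(name)

    def get_iocs_by_names(self, names):
        # Assumed bulk lookup: one SELECT for the whole batch.
        self.select_count += 1
        return [self._rows[n] for n in names if n in self._rows]

    def save(self, ioc):
        self._rows[ioc.name] = ioc
        return ioc


class IocProcessor:
    def __init__(self, repo):
        self.repo = repo
        self._ioc_cache = {}

    def prefetch_iocs(self, names):
        # Bulk-load every already-known IOC of the batch in one query.
        found = self.repo.get_iocs_by_names(set(names))
        self._ioc_cache = {ioc.name: ioc for ioc in found}

    def add_ioc(self, name):
        # Cache first; fall back to the repository only on a miss.
        ioc = self._ioc_cache.get(name)
        if ioc is None:
            ioc = self.repo.get_ioc_by_name(name) or Ioc(name=name)
        ioc.attack_count += 1
        saved = self.repo.save(ioc)
        self._ioc_cache[name] = saved  # keep the cache current after every save
        return saved


repo = FakeIocRepo([Ioc("1.1.1.1", 5), Ioc("2.2.2.2", 1)])
processor = IocProcessor(repo)

batch = ["1.1.1.1", "2.2.2.2", "3.3.3.3", "3.3.3.3"]
processor.prefetch_iocs(batch)
for name in batch:
    processor.add_ioc(name)
```

In this sketch, a name that is not yet in the database still costs one fallback lookup on first sight. After the save it is served from the cache, so the repeated "3.3.3.3" issues no further query.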

Related issues

Closes #1241

Type of change

  • Bug fix (non-breaking change which fixes an issue).
  • New feature (non-breaking change which adds functionality).
  • Breaking change (fix or feature that would cause existing functionality to not work as expected).
  • Chore (refactoring, dependency updates, CI/CD changes, code cleanup, docs-only changes).

Checklist

Formalities

  • I have read and understood the rules about how to Contribute to this project.
  • I chose an appropriate title for the pull request in the form: Enhancement: bulk prefetch IOCs before processing to eliminate per-IOC SELECT queries. Closes #1241
  • My branch is based on develop.
  • The pull request is for the branch develop.
  • I have reviewed and verified any LLM-generated code included in this PR.

Docs and tests

  • I documented my code changes with docstrings and/or comments.
  • I have checked if my changes affect user-facing behavior that is described in the docs. No user-facing behavior is affected.
  • Linter (Ruff) gave 0 errors.
  • I have added tests for the feature/bug I solved.
  • All the tests gave 0 errors.

GUI changes

No GUI changes in this PR.

regulartim and others added 17 commits April 23, 2026 07:42
…Bear-Project#1284)

* Install gb-ui library

* Replace imports

* Update frontend README
…t#1280 (GreedyBear-Project#1292)

* Add rule and exception for G004

* Reorder ignores

* Fix violations
…1260 - Reduced Tim… (GreedyBear-Project#1289)

* Enhancement: optimization in Cowrie _get_sessions GreedyBear-Project#1260 - Reduced Time Complexity from O(N*M) to O(M)

* chore: simplify src_ip extraction in _get_scanners to fix PR review

* chore: remove redundant src_ip check per reviewer feedback
…r-Project#1223 (GreedyBear-Project#1250)

* Fix statistics source handling for proxy and IPv6

* Add merge migration for conflicting greedybear 0050 leaves

* Make statistics source migration sequential after 0050

* Set 0051 statistics migration dependency to 0050_attackeractivitybucket

* Address moderator feedback: raise exception instead of returning empty string

- Add UnableToExtractSourceIPError custom exception
- Modify get_request_source_ip() to raise exception with logging when no valid IP found
- Update all callers (utils.py, enrichment.py, command_sequence.py, cowrie_session.py) to handle exception
- Update test to expect exception behavior instead of empty string
- Statistics recording is now skipped when source IP cannot be extracted

* Fix formatting: add blank line after docstring

* Fix linter errors: remove unnecessary pass and sort imports

* Fix import order in command_sequence.py to match original
…ct#1278 (GreedyBear-Project#1279)

* add database index for attacker_country_code

* generate migration for attacker_country_code index

* normalize attacker_country_code to uppercase on write

* use exact lookup with upper() for country_code query

* fix migration chain after merging develop

Development

Successfully merging this pull request may close these issues.

Optimization: Bulk Enrichment of IOCs

8 participants