Bugfix/incorrect evaluation of deferred regular expressions #30

Closed
suryamajhi wants to merge 7 commits into logpoint:main from suryamajhi:bugfix/incorrect-evaluation-of-deferred-regular-expressions

@suryamajhi
Contributor

Summary

This PR fixes the incorrect evaluation of deferred regular expressions.

Problem Statement

When a rule contains regex modifiers, the generated query always combines the regex matches with AND.
Example Sigma rule:

detection:
    sel:
        - fieldA|re: foo.*bar
        - fieldB|re: foo[1-9]bar
        - fieldC: xyz
    condition: sel

Current output:

fieldC="xyz" | process regex("foo.*bar", fieldA, "filter=true") | process regex("foo[1-9]bar", fieldB, "filter=true")

This query behaves as an AND, because each regex stage filters on a successful match. The rule, however, requires an OR. The current output also takes no account of NOT.
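
To make the AND-versus-OR difference concrete, here is a minimal Python sketch. It is not Logpoint code: `regex_filter` is a hypothetical stand-in for a filtering regex stage, the sample events are invented, and the use of `re.search` semantics is an assumption.

```python
import re

# Invented sample events; field names follow the Sigma example above.
events = [
    {"fieldA": "foo123bar", "fieldB": "nope", "fieldC": "abc"},   # only the fieldA regex matches
    {"fieldA": "nope", "fieldB": "foo5bar", "fieldC": "abc"},     # only the fieldB regex matches
    {"fieldA": "nope", "fieldB": "nope", "fieldC": "xyz"},        # only fieldC matches
    {"fieldA": "fooXbar", "fieldB": "foo7bar", "fieldC": "xyz"},  # everything matches
]


def regex_filter(rows, pattern, field):
    """Hypothetical stand-in for a filtering regex stage: keep rows where the pattern matches."""
    return [row for row in rows if re.search(pattern, row.get(field, ""))]


def chained(rows):
    """Mimics the current output: every stage narrows the previous result (AND)."""
    rows = [row for row in rows if row["fieldC"] == "xyz"]
    rows = regex_filter(rows, r"foo.*bar", "fieldA")
    rows = regex_filter(rows, r"foo[1-9]bar", "fieldB")
    return rows


def intended(rows):
    """What the Sigma list under `sel` asks for: any one of the three items (OR)."""
    return [
        row for row in rows
        if re.search(r"foo.*bar", row["fieldA"])
        or re.search(r"foo[1-9]bar", row["fieldB"])
        or row["fieldC"] == "xyz"
    ]


chained_hits = chained(events)
intended_hits = intended(events)
```

Under these assumptions the chained version keeps only the event in which all three items match, while the intended OR keeps all four events.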

Solution

Instead of adding filter=true to the regex process command, let the command run with a capturing group. If the regex matches, the captured group is added to the log as a new field. That new field is then substituted into the Sigma expression.

Sigma Example 1:

detection:
    sel:
        - fieldA|re: foo.*bar
        - fieldB|re: foo[1-9]bar
        - fieldC: xyz
    condition: sel

Output:

(fieldC="xyz") OR fieldA=* OR fieldB=*
| process regex("(?P<re1>foo.*bar)", fieldA) 
| process regex("(?P<re2>foo[1-9]bar)", fieldB)
| search re1=* OR re2=* OR fieldC="xyz"

This output correctly expresses the Sigma rule, because re1 and re2 are populated only when their regexes match.

Sigma Example 2 (Negation on regex modifier):

detection:
      sel:
          fieldA|re: 127\.0\.0\.1:[1-9]\d{3}
          fieldB: foo
      filter:
          fieldC|re: foo.*bar
      condition: sel and not filter

Output:

(fieldB="foo") OR fieldA=* OR fieldC=* 
| process regex("(?P<re1>127\.0\.0\.1:[1-9]\d{3})", fieldA) 
| process regex("(?P<re2>foo.*bar)", fieldC) 
| search re1="*" fieldB="foo" -re2="*"
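
The capture-then-search idea in the two examples above can be sketched in plain Python. This is only an illustration under assumptions: `capture` is a hypothetical stand-in for a non-filtering regex stage, and `example_1` / `example_2` mimic the final search stage of each output.

```python
import re


def capture(row, pattern, source, target):
    """Hypothetical non-filtering regex stage: add `target` only when the pattern matches."""
    out = dict(row)
    match = re.search(pattern, str(row.get(source, "")))
    if match:
        out[target] = match.group(0)
    return out


def example_1(row):
    """Mimics: search re1=* OR re2=* OR fieldC="xyz"."""
    row = capture(row, r"foo.*bar", "fieldA", "re1")
    row = capture(row, r"foo[1-9]bar", "fieldB", "re2")
    return "re1" in row or "re2" in row or row.get("fieldC") == "xyz"


def example_2(row):
    """Mimics: search re1="*" fieldB="foo" -re2="*"."""
    row = capture(row, r"127\.0\.0\.1:[1-9]\d{3}", "fieldA", "re1")
    row = capture(row, r"foo.*bar", "fieldC", "re2")
    return "re1" in row and row.get("fieldB") == "foo" and "re2" not in row
```

Because no stage drops events, the boolean structure of the rule is decided entirely in the last step, which is what allows OR and NOT to be expressed.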

…. Need to see if automatic conversion to snake_case makes sense for azure and m365. If the need arises in the future, we shall add them. Also, this section is for an unusual hack for json_values.
@suryamajhi suryamajhi self-assigned this Feb 21, 2026
@suryamajhi suryamajhi added the bug Something isn't working label Feb 21, 2026

Copilot AI left a comment

Pull request overview

This PR fixes the incorrect evaluation of deferred regular expressions in the Logpoint backend, ensuring proper handling of OR/AND/NOT operations when multiple regex modifiers are present in Sigma rules.

Changes:

  • Modified regex processing to use group capturing instead of direct filtering, generating intermediate fields (re1, re2, etc.) that are subsequently used in search expressions
  • Updated the query generation logic to correctly handle OR, AND, and NOT operations with regex patterns
  • Reverted field mapping functions in M365 and Azure pipelines to simpler implementations

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 2 comments.

File | Description
sigma/backends/logpoint/logpoint.py | Core logic changes to implement group-capturing regex with proper substitution and query finalization
tests/test_backend_logpoint.py | Added comprehensive test cases for single, OR, AND, and NOT regex operations
sigma/pipelines/logpoint/m365.py | Simplified field mapping by removing snake_case conversion logic
sigma/pipelines/logpoint/azure.py | Simplified field mapping by removing snake_case conversion logic
pyproject.toml | Updated pysigma dependency version requirement

for dt_item in detection_item.detection_items:
self._recursively_substitute_sigma_items(dt_item, orig_field, new_field)
else:
assert "Should not reach here"

Copilot AI Feb 23, 2026

The assert statement with a string message is not a proper error handling mechanism. Replace with raise SigmaError('Unexpected detection item type') to provide a meaningful error when an unexpected type is encountered.

Suggested change
assert "Should not reach here"
raise SigmaError("Unexpected detection item type")
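
A short sketch of why the original line never fails: a non-empty string is truthy, so `assert "..."` always passes. The exception class and both functions below are hypothetical stand-ins, not the backend's code; the review suggests `SigmaError` for the real fix.

```python
class UnexpectedItemError(Exception):
    """Hypothetical stand-in for the error type suggested in the review."""


def with_assert(item):
    if isinstance(item, str):
        return "handled"
    else:
        # A non-empty string is truthy, so this assert can never fail.
        assert "Should not reach here"
    # Execution falls through and the function silently returns None.


def with_raise(item):
    if isinstance(item, str):
        return "handled"
    # An explicit raise reports the unexpected type, and is not removed under `python -O`.
    raise UnexpectedItemError(f"Unexpected detection item type: {type(item).__name__}")
```

Calling `with_assert(42)` returns `None` without any error, whereas `with_raise(42)` raises.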

detection_item.field = new_field
# Overriding the regex modifier
detection_item.modifiers = []
# On successful regex match, a new field is add with any value. So re1=* would suffice and filter the results.

Copilot AI Feb 23, 2026

Corrected grammar from 'a new field is add' to 'a new field is added'.

Suggested change
# On successful regex match, a new field is add with any value. So re1=* would suffice and filter the results.
# On successful regex match, a new field is added with any value. So re1=* would suffice and filter the results.

Collaborator

@swachchhanda000 swachchhanda000 left a comment

BTW @suryamajhi,
what happens when the '|re' modifier has a list as its value?

fieldA|re:
    - 'regex1'
    - 'regex2'

@suryamajhi
Contributor Author

@swachchhanda000 ,
I didn't have that case in mind while implementing this.

Currently it does the following:

fieldA=* OR fieldA=* 
| process regex("(?P<re1> regex1)", fieldA) 
| process regex("(?P<re2> regex2)", fieldA) 
| search re1="*"

This is wrong, as re2 is ignored here.
Thank you for bringing this up. I will see what I can do.

@suryamajhi
Contributor Author

Hi @swachchhanda000 ,
The implementation has been updated. The following is the new way of evaluating regex Sigma rules.

Major changes:

  1. Pre-filtering removed (because the expression must be evaluated in every query)
  2. Use of eval to check whether the regex match succeeded
  3. Search preserves all fields and values.

Sigma:

title: Test
status: test
logsource:
    category: test_category
    product: test_product
detection:
    sel:
        fieldA|re: 
            - foo.*bar
            - abc.*xyz
    condition: sel

LP Query:

| process regex("(?P<fieldA_match>foo.*bar)", fieldA)
| process eval("fieldA_condition=case(isnotnull(fieldA_match) -> 'true', 'false')") 
| process regex("(?P<fieldA_match2>abc.*xyz)", fieldA)
| process eval("fieldA_condition2=case(isnotnull(fieldA_match2) -> 'true', 'false')")
| search fieldA_condition="true" OR fieldA_condition2="true"

Other regex-related test cases in the test file are updated as well. See them for further examples if needed.
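
For illustration only, the naming pattern in the LP query above can be reproduced with a small string builder. `build_regex_query` is a hypothetical helper, not the backend's implementation; it assumes a single field with a list of patterns and copies the `_match` / `_condition` suffix scheme from the example.

```python
def build_regex_query(field, patterns):
    """Hypothetical sketch: one regex stage plus one eval stage per pattern, then an OR search."""
    stages = []
    conditions = []
    for index, pattern in enumerate(patterns, start=1):
        # The first pattern gets no numeric suffix, later ones get 2, 3, ...
        suffix = "" if index == 1 else str(index)
        match_field = f"{field}_match{suffix}"
        condition_field = f"{field}_condition{suffix}"
        stages.append(f'| process regex("(?P<{match_field}>{pattern})", {field})')
        eval_body = f"{condition_field}=case(isnotnull({match_field}) -> 'true', 'false')"
        stages.append('| process eval("' + eval_body + '")')
        conditions.append(f'{condition_field}="true"')
    stages.append("| search " + " OR ".join(conditions))
    return "\n".join(stages)


query = build_regex_query("fieldA", ["foo.*bar", "abc.*xyz"])
```

Each pattern gets its own condition field, so a list value can no longer lose its second regex the way re2 was ignored in the earlier output.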

@suryamajhi
Contributor Author

I will close this PR and create a new one with the updated implementation.

@suryamajhi suryamajhi closed this Feb 24, 2026