Commit 27f3cdd

chore: reduce FPs in whitespace PR by considering ; statement (#1186)
Reduce false positives in the whitespace semgrep rule by considering the ; statement.

Signed-off-by: Carl Flottmann <carl.flottmann@oracle.com>
1 parent 9efe742 commit 27f3cdd

5 files changed: +69 -26 lines changed

src/macaron/config/defaults.ini

Lines changed: 6 additions & 5 deletions
@@ -629,12 +629,13 @@ check_deliverability = True
 # custom rulesets: this is a collection of user-provided rulesets, living inside the path provided to 'custom_semgrep_rules_path'.
 
 # disable default semgrep rulesets here (i.e. all rule IDs in a Semgrep .yaml file) using ruleset names, the name
-# without the .yaml prefix. Currently, we disable the exfiltration rulesets by default due to a high false positive rate.
-# This list may not contain duplicated elements. Macaron's default ruleset names are all unique.
+# without the .yaml prefix (e.g. "obfuscation" for "obfuscation.yaml"). Currently, we disable the exfiltration rulesets
+# by default due to a high false positive rate. This list may not contain duplicated elements. Macaron's default ruleset
+# names are all unique.
 disabled_default_rulesets = exfiltration
-# disable individual rules here (i.e. individual rule IDs inside a Semgrep .yaml file) using rule IDs. You may also
-# provide the IDs of your custom semgrep rules here too, as all Semgrep rule IDs must be unique. This list may not contain
-# duplicated elements.
+# disable individual rules here (i.e. individual rule IDs inside a Semgrep .yaml file, specified under the "rules" header in the
+# .yaml file, with each rule ID under "- id") using rule IDs. You may also provide the IDs of your custom semgrep rules here too,
+# as all Semgrep rule IDs must be unique. This list may not contain duplicated elements.
 disabled_rules =
 # absolute path to a directory where a custom set of semgrep rules for source code analysis are stored. These will be included
 # with Macaron's default rules. The path will be normalised to the OS path type.
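
The comments in this hunk describe two list-valued options that may not contain duplicated elements. As a minimal sketch of how such values could be read and validated, assuming they are comma-separated: the section name `example` and the helpers `parse_unique_list` and `load_disabled` are hypothetical names invented for this illustration, and this is not Macaron's actual configuration-loading code.

```python
import configparser

# Hypothetical section name: the real section in defaults.ini is not shown in this diff.
SAMPLE_INI = """
[example]
disabled_default_rulesets = exfiltration
disabled_rules = obfuscation_excessive-spacing
"""


def parse_unique_list(raw: str) -> list[str]:
    """Split a comma-separated value, strip whitespace, and reject duplicates."""
    items = [item.strip() for item in raw.split(",") if item.strip()]
    duplicates = sorted({item for item in items if items.count(item) > 1})
    if duplicates:
        raise ValueError(f"duplicated elements are not allowed: {duplicates}")
    return items


def load_disabled(ini_text: str) -> tuple[list[str], list[str]]:
    """Return (disabled ruleset names, disabled rule IDs) from the sample section."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    section = parser["example"]
    rulesets = parse_unique_list(section.get("disabled_default_rulesets", ""))
    rules = parse_unique_list(section.get("disabled_rules", ""))
    return rulesets, rules


if __name__ == "__main__":
    print(load_disabled(SAMPLE_INI))
```

An empty value, such as the default `disabled_rules =`, yields an empty list in this sketch, so nothing is disabled.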

src/macaron/malware_analyzer/README.md

Lines changed: 9 additions & 0 deletions
@@ -101,6 +101,15 @@ This feature is currently a work in progress, and supports detection of code obf
 - `custom_semgrep_rules`: supply to this an absolute path to a directory containing custom Semgrep `.yaml` files to be run alongside the default ones.
 - `disabled_custom_rulesets`: supply to this a comma separated list of the names of custom Semgrep rule files (excluding the `.yaml` extension) to disable all rule IDs in that file.
 
+Here, a "semgrep ruleset" refers to the name of a Semgrep `.yaml` file without the extension. For example, the name of one of the default rulesets is `obfuscation`, as the file name is `obfuscation.yaml`. To disable all rules in that `.yaml` file would look like this:
+```
+disabled_default_rulesets = obfuscation
+```
+A "semgrep rule", or "rule ID", refers to an `- id` entry under the `rules:` heading in a Semgrep `.yaml` file. For example, the name of a rule in `obfuscation.yaml` would be `obfuscation_excessive-spacing`, which is the name specified under the `- id` entry for that rule. Disabling it would look like this:
+```
+disabled_rules = obfuscation_excessive-spacing
+```
+
 ### Contributing
 
 When contributing an analyzer, it must meet the following requirements:

src/macaron/resources/pypi_malware_rules/obfuscation.yaml

Lines changed: 5 additions & 3 deletions
@@ -319,6 +319,8 @@ rules:
 languages:
 - python
 severity: ERROR
-patterns:
-- pattern-regex: '[\s]{50,}(\S)+' # The 50 here is the threshold for excessive spacing , more than that is considered obfuscation
-- pattern-not-regex: '"""[\s\S]*"""'
+pattern-either: # The 50 here is the threshold for excessive spacing , more than that is considered obfuscation
+# there is excessive spacing after a ";", marking the end of a statement, then additional code.
+- pattern-regex: ;[\s]{50,}(\S)+
+# there is excessive spacing before a ";", and any amount of whitespace before additional code.
+- pattern-regex: '[\s]{50,};[\s]*(\S)+'
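
To see how the two new regexes differ from the removed one, the following Python sketch runs them over a few single-line samples. It is an approximation only: Semgrep evaluates `pattern-regex` with its own regex engine, not Python's `re`. The removed `pattern-not-regex` docstring exclusion is not modelled here. The names `matching_patterns` and `PAD`, and the sample strings, are assumptions made for this illustration.

```python
import re

# The two regexes from the updated rule, copied verbatim from the diff.
AFTER_SEMICOLON = re.compile(r";[\s]{50,}(\S)+")
BEFORE_SEMICOLON = re.compile(r"[\s]{50,};[\s]*(\S)+")
# The first regex from the removed rule, kept only for comparison.
OLD_PATTERN = re.compile(r"[\s]{50,}(\S)+")

PAD = " " * 60  # any run of 50 or more whitespace characters would do


def matching_patterns(line: str) -> list[str]:
    """Return the names of the new patterns that match the given line."""
    names = []
    if AFTER_SEMICOLON.search(line):
        names.append("after")
    if BEFORE_SEMICOLON.search(line):
        names.append("before")
    return names


if __name__ == "__main__":
    samples = {
        "padding after ';'": 'print("hello");' + PAD + "__import__('os')",
        "padding on both sides": 'print("hi")' + PAD + ";" + PAD + "__import__('base64')",
        "padding before ';'": 'print("things")' + PAD + ";__import__('zlib')",
        "padding, no ';'": "x = 1" + PAD + "# trailing comment",
    }
    for label, line in samples.items():
        old = bool(OLD_PATTERN.search(line))
        print(f"{label}: new={matching_patterns(line)} old={old}")
```

The last sample is the kind of line the change aims to stop flagging: it contains a long whitespace run but no `;`, so only the old regex matches it.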

tests/malware_analyzer/pypi/resources/sourcecode_samples/obfuscation/excessive_spacing.py

Lines changed: 4 additions & 3 deletions
@@ -20,6 +20,7 @@ def test_function():
 """
 sys.exit()
 
-# excessive spacing obfuscation
-def excessive_spacing_flow():
-print("Hello world!")
+# excessive spacing obfuscation. The second line here will trigger two detections, which is expected since it matches both patterns.
+print("hello"); __import__('os')
+print("hi") ; __import__('base64')
+print("things") ;__import__('zlib')

tests/malware_analyzer/pypi/resources/sourcecode_samples/obfuscation/expected_results.json

Lines changed: 45 additions & 15 deletions
@@ -53,6 +53,21 @@
 "start": 44,
 "end": 44
 },
+{
+"file": "obfuscation/excessive_spacing.py",
+"start": 24,
+"end": 24
+},
+{
+"file": "obfuscation/excessive_spacing.py",
+"start": 25,
+"end": 25
+},
+{
+"file": "obfuscation/excessive_spacing.py",
+"start": 26,
+"end": 26
+},
 {
 "file": "obfuscation/inline_imports.py",
 "start": 23,
@@ -105,6 +120,36 @@
 }
 ]
 },
+"src.macaron.resources.pypi_malware_rules.obfuscation_excessive-spacing": {
+"message": "Hidden code after excessive spacing",
+"detections": [
+{
+"file": "obfuscation/excessive_spacing.py",
+"start": 24,
+"end": 24
+},
+{
+"file": "obfuscation/excessive_spacing.py",
+"start": 25,
+"end": 25
+},
+{
+"file": "obfuscation/excessive_spacing.py",
+"start": 25,
+"end": 25
+},
+{
+"file": "obfuscation/excessive_spacing.py",
+"start": 26,
+"end": 26
+},
+{
+"file": "obfuscation/inline_imports.py",
+"start": 27,
+"end": 27
+}
+]
+},
 "src.macaron.resources.pypi_malware_rules.obfuscation_obfuscation-tools": {
 "message": "Found an indicator of the use of a python code obfuscation tool",
 "detections": [
@@ -229,21 +274,6 @@
 "end": 68
 }
 ]
-},
-"src.macaron.resources.pypi_malware_rules.obfuscation_excessive-spacing": {
-"message": "Hidden code after excessive spacing",
-"detections": [
-{
-"file": "obfuscation/excessive_spacing.py",
-"start": 24,
-"end": 25
-},
-{
-"file": "obfuscation/inline_imports.py",
-"start": 27,
-"end": 27
-}
-]
 }
 },
 "disabled_sourcecode_rule_findings": {}
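
The updated expectations list line 25 of `excessive_spacing.py` twice under the `obfuscation_excessive-spacing` rule, in line with the test file's comment that its second line matches both patterns. As a small sketch of how that could be checked, the following Python snippet counts detections per file and start line. The detections are copied from the added entry in this diff; the helper `detections_per_line` is a hypothetical name and is not part of Macaron's test suite.

```python
import json
from collections import Counter

# The added "obfuscation_excessive-spacing" entry, copied from the diff above.
RULE_RESULT = json.loads("""
{
  "message": "Hidden code after excessive spacing",
  "detections": [
    {"file": "obfuscation/excessive_spacing.py", "start": 24, "end": 24},
    {"file": "obfuscation/excessive_spacing.py", "start": 25, "end": 25},
    {"file": "obfuscation/excessive_spacing.py", "start": 25, "end": 25},
    {"file": "obfuscation/excessive_spacing.py", "start": 26, "end": 26},
    {"file": "obfuscation/inline_imports.py", "start": 27, "end": 27}
  ]
}
""")


def detections_per_line(rule_result: dict) -> Counter:
    """Count detections keyed by (file, start line)."""
    return Counter((d["file"], d["start"]) for d in rule_result["detections"])


if __name__ == "__main__":
    for (file_name, line_no), count in sorted(detections_per_line(RULE_RESULT).items()):
        print(f"{file_name}:{line_no} -> {count} detection(s)")
```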
