Potential ReDoS vulnerability in docstring.py #13730

@ShangzhiXu

Describe the bug

Hi team, thanks for your great work! I think I found a small vulnerability that could lead to a denial of service (DoS).
At line 27 of docstring.py, the regex
_google_typed_arg_regex = re.compile(r'(.+?)\(\s*(.*[^\s]+)\s*\)')
is vulnerable to ReDoS when it is used in
match = _google_typed_arg_regex.match(before)

How to Reproduce

The following test file can be run directly:

from sphinx.ext.napoleon import Config
from sphinx.ext.napoleon.docstring import GoogleDocstring
import time

for i in range(500, 2500, 500):
    # ' (' followed by 2*i more '(' and no closing ')', so the regex can never match
    attack_string = ' (' + '(' * 2 * i
    google_style_docstring = f"""
brief summary
Args:
    {attack_string}
"""
    start_time = time.time()
    napoleon_config = Config(napoleon_use_param=True, napoleon_use_rtype=True)
    # Parsing happens when the GoogleDocstring object is constructed
    parsed_docstring = GoogleDocstring(google_style_docstring, config=napoleon_config)
    end_time = time.time()
    print(f"i: {i}, Time taken: {end_time - start_time} seconds")

The output looks like this:

i: 500, Time taken: 4.026143550872803 seconds
i: 1000, Time taken: 32.2109649181366 seconds
i: 1500, Time taken: 109.16966938972473 seconds
i: 2000, Time taken: 258.61019134521484 seconds

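As we can see, a string of around 4k characters can cause the system to hang for more than 4 minutes. The time grows roughly cubically with the input length: doubling i from 500 to 1000 multiplies the time by about 8, and i = 2000 takes about 64 times as long as i = 500. This might lead to denial of service or high CPU consumption in real-world scenarios where Sphinx is used to parse user-supplied code.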

Expected behavior
I think we could add a limit, for example by replacing the unbounded quantifiers .+? and .* with bounded ones such as .{0,200}? and .{0,200}. That might help to curb the backtracking problem.

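The core of the problem lies within (.+?)\(\s*(.*[^\s]+). The unbounded quantifiers .+?, .* and [^\s]+ can split the same run of characters between them in many different ways. When the input is malicious and no overall match exists, the engine tries every possible split before giving up, which leads to massive backtracking.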
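The backtracking can also be observed without Sphinx by timing the regex on its own. The following is a minimal sketch, assuming the pattern quoted in this issue is copied verbatim; the names TYPED_ARG, attack_string and time_match are hypothetical helpers for illustration, and the input sizes are kept small so that the loop finishes quickly.

```python
import re
import time

# Pattern copied from the issue text; it is not imported from Sphinx here.
TYPED_ARG = re.compile(r'(.+?)\(\s*(.*[^\s]+)\s*\)')


def attack_string(n: int) -> str:
    """Build ' (' followed by n more '(' characters and no closing ')'."""
    return ' (' + '(' * n


def time_match(n: int) -> float:
    """Return the seconds taken by one failed match on an input with n extra parens."""
    text = attack_string(n)
    start = time.perf_counter()
    result = TYPED_ARG.match(text)
    elapsed = time.perf_counter() - start
    # There is no ')' in the input, so the match has to fail.
    assert result is None
    return elapsed


if __name__ == "__main__":
    # Small sizes only; larger values grow quickly in run time.
    for n in (50, 100, 200):
        print(f"n: {n}, seconds: {time_match(n):.4f}")
```

Absolute timings depend on the machine and the Python version, so no expected output is given here; the point is only that the time of a failed match rises much faster than the input length.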

I tested a modification of the regex, and the performance improved significantly:

# After changing the regex to r'(.{0,200}?)\(\s*(.{0,200}[^\s]+)\s*\)'
i: 1000, Time taken: 0.4651494026184082 seconds
i: 2000, Time taken: 0.9634008407592773 seconds
i: 3000, Time taken: 1.4529836177825928 seconds
i: 4000, Time taken: 1.9484753608703613 seconds
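Bounding the quantifiers also changes what the regex accepts, which may matter when judging this suggestion. The sketch below compares the two patterns quoted in this issue on a few inputs; it is an illustration under stated assumptions and not a proposed patch. ORIGINAL, BOUNDED, groups and the sample strings are hypothetical names chosen for this example.

```python
import re

# Both patterns are copied from the issue text.
ORIGINAL = re.compile(r'(.+?)\(\s*(.*[^\s]+)\s*\)')
BOUNDED = re.compile(r'(.{0,200}?)\(\s*(.{0,200}[^\s]+)\s*\)')


def groups(pattern, text):
    """Return (name, type) with the name stripped, or None if there is no match."""
    m = pattern.match(text)
    if m is None:
        return None
    return (m.group(1).strip(), m.group(2))


# Hypothetical sample inputs for illustration.
samples = {
    "typical": "count (int)",
    "no name": "(int)",
    "long name": "a" * 201 + " (int)",
}

if __name__ == "__main__":
    for label, text in samples.items():
        original = groups(ORIGINAL, text)
        bounded = groups(BOUNDED, text)
        print(f"{label}: same result = {original == bounded}")
```

For the typical input both patterns return the same groups. The other two inputs differ: because .{0,200}? may match zero characters, the bounded pattern accepts "(int)" with an empty name, which the original rejects, and it rejects a name longer than 200 characters, which the original accepts.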

Environment Information

Latest version of Sphinx, Linux system, 128 cores, 32 GB memory

Sphinx extensions

Additional context

No response
