base: winternship-debugger
Debugger integration #414
Conversation
* Improve coverage tracking precision. Previously, coverage tracking included all regions; this change makes it more precise. Macros that expand to code are represented as a single line in the function, and any executed line inside the macro marks that line as covered. Beyond that, only CodeRegions count, which prevents, for example, code guarded by #if that isn't in the binary from being considered reachable. The goal is to give seed-gen more accurate information when selecting functions to target. In my testing, the new implementation generally reaches more lines and covers more functions.
* Fix macro coverage leaking across files due to a missing filename in the key. The expansion_coverage map used (line, col) as its key without the filename, so coverage from one file could incorrectly appear in another when macros sat at the same coordinates.
* Recursively expand macros and add named types for coverage data:
  - Process ExpansionRegion target_regions recursively instead of counting only the call-site line
  - Add a coordinate index for O(1) expansion lookups
  - Cache computed expansion lines to avoid recomputation across functions
  - Use bulk set operations for better performance
  - Prevent infinite loops with a visited set for circular macro references
  - Add named types for coverage data structures (RegionCoords, ExpansionKey, CachedExpansionLines) and type aliases (ExpansionMap, CoordToFilenames, ExpansionLinesCache)
  - Add comprehensive tests for nested macros and edge cases
…ofbits#408) Co-authored-by: kevin-valerio <kevin-valerio@users.noreply.github.com>
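For context, a minimal sketch of the keyed-by-filename lookup and recursive expansion with a visited set that the commit message describes. The names follow the commit message, but the surrounding data structures are assumptions, not the PR's actual implementation.

```python
from typing import NamedTuple

# Hypothetical shapes following the commit message; the PR's actual coverage
# structures may differ.
class RegionCoords(NamedTuple):
    line: int
    col: int

class ExpansionKey(NamedTuple):
    filename: str          # keying on filename prevents cross-file collisions
    coords: RegionCoords

def expansion_lines(
    key: ExpansionKey,
    expansion_map: dict[ExpansionKey, list[ExpansionKey]],  # macro -> nested expansions
    code_lines: dict[ExpansionKey, set[int]],               # macro -> directly covered lines
    visited: set[ExpansionKey] | None = None,
) -> set[int]:
    """Recursively collect executable lines behind a macro expansion,
    guarding against circular macro references with a visited set."""
    visited = visited if visited is not None else set()
    if key in visited:
        return set()
    visited.add(key)
    lines = set(code_lines.get(key, ()))
    for nested in expansion_map.get(key, ()):
        lines |= expansion_lines(nested, expansion_map, code_lines, visited)
    return lines
```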
```python
# Pull Git LFS files if the repository uses LFS
logger.info("Pulling Git LFS files (if any)")
result = subprocess.run(
    ["git", "lfs", "pull"],
    cwd=sub_path,
    capture_output=True,
    text=True,
    check=False,  # Don't fail if LFS is not used or not installed
)
if result.returncode == 0:
    logger.info(f"Git LFS pull output: {result.stdout}")
else:
    logger.debug(f"Git LFS pull failed or not needed: {result.stderr}")
```
No longer need to add lfs pull logic in PR since this was merged: #413
```python
# Read probability overrides from environment variables
self.TASK_SEED_INIT_PROB_FULL = float(os.getenv("BUTTERCUP_SEED_INIT_PROB_FULL", self.TASK_SEED_INIT_PROB_FULL))
self.TASK_VULN_DISCOVERY_PROB_FULL = float(
    os.getenv("BUTTERCUP_VULN_DISCOVERY_PROB_FULL", self.TASK_VULN_DISCOVERY_PROB_FULL)
```
Could you make these new options configurable by the pydantic settings in config.py, please? When you add them to the pydantic settings you'll get built-in support for env variables, too.
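Something along these lines, assuming pydantic-settings is what config.py already uses; the class name, field names, and defaults below are placeholders, not the project's actual settings.

```python
from pydantic_settings import BaseSettings, SettingsConfigDict

class SeedGenSettings(BaseSettings):
    """Hypothetical settings class; field names and defaults are placeholders."""
    model_config = SettingsConfigDict(env_prefix="BUTTERCUP_")

    # Automatically overridable via BUTTERCUP_SEED_INIT_PROB_FULL and
    # BUTTERCUP_VULN_DISCOVERY_PROB_FULL, with no manual os.getenv calls.
    seed_init_prob_full: float = 0.2
    vuln_discovery_prob_full: float = 0.2
```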
hbrodin left a comment
Well done! I've done an initial pass on it and left a bunch of comments. Impressive work overall.
```diff
   cpu: 2000m
   memory: 8Gi
 requests:
-  cpu: 200m
-  memory: 256Mi
+  cpu: 500m
+  memory: 1Gi
```
Are these changes necessary to run with the debugger integrated? Bumping the requests will require more resources for the system to run and might prevent some systems from running Buttercup.
This might be an artifact of debugging the OOM, before the coverage bot was identified as the source of the OOM? Edit: responded under wrong comment
```yaml
# Resource limits for LiteLLM container to prevent OOMKilled errors
resources:
  limits:
    cpu: 2000m
    memory: 4Gi
  requests:
    cpu: 500m
    memory: 1Gi
```
Are the OOMs due to LiteLLM? Or is LiteLLM killed due to high memory pressure situation?
This might be an artifact of debugging the OOM, before the coverage bot was identified as the source of the OOM?
```diff
 RUN DEBIAN_FRONTEND=noninteractive apt-get update && \
-    apt-get install -y git curl && \
+    apt-get install -y git git-lfs curl && \
```
Should rebase/merge once #413 is merged into main.
I don't think these changes are needed as you can just specify this on the command line. Or, am I missing something?
```python
remaining_time = end_time - time.time()
if remaining_time <= 0:
    logger.warning("Command timeout after %.1fs: %s", timeout, command)
    lines.append(f"\n***timout waiting for end of output after {timeout} seconds***")
```
```diff
- lines.append(f"\n***timout waiting for end of output after {timeout} seconds***")
+ lines.append(f"\n***timeout waiting for end of output after {timeout} seconds***")
```
```python
# Log all available builds before testing
logger.info(f"Testing PoV '{pov.name}' against {len(self.build_outputs)} builds for harness '{harness_name}'")
for i, build in enumerate(self.build_outputs):
    logger.info(
        f""" Build {i}: sanitizer={build.sanitizer}, engine={build.engine},
        type={BuildType.Name(build.build_type)}, task_id={build.task_id}"""
    )
```
Why? You are logging them again in the loop below I think.
```python
    return debug_binary_path is not None and debug_binary_path.exists()


def resolve_actual_binary(
```
Consider breaking into smaller functions
```python
        return "\n\n".join(str(attempt) for attempt in self.debug_attempts)


class DebugSubagentUnified:
```
This is a very large class with some functions being very long. Is there maybe a way to break it down into smaller classes? What is the gain of having it like this over splitting it? Ideally you'd be able to compose the hybrid from the batch + interactive ones if splitting like that.
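A rough sketch of the composition idea, assuming the batch-script and interactive pieces can be factored out; all class and method names here are illustrative, not the PR's actual API.

```python
class BatchDebugSubagent:
    """Runs prepared debug scripts non-interactively (illustrative only)."""
    def run_script(self, script: str) -> str:
        return f"batch report for a script of {len(script)} characters"

class InteractiveDebugSubagent:
    """Drives an interactive debugger session (illustrative only)."""
    def run_session(self, commands: list[str]) -> str:
        return f"interactive report for {len(commands)} commands"

class HybridDebugSubagent:
    """Composes the two pieces instead of one large unified class."""
    def __init__(self, batch: BatchDebugSubagent, interactive: InteractiveDebugSubagent):
        self.batch = batch
        self.interactive = interactive

    def debug(self, script: str, follow_up: list[str]) -> str:
        report = self.batch.run_script(script)
        return report + "\n" + self.interactive.run_session(follow_up)
```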
```python
    return

build_dir = Path(builds[BuildType.FUZZER][0].task_dir)
logger.info(f"Build directory: {build_dir}")
```
debug?
```python
function_name = call.arguments["function_name"]
result = Task._get_function_definition(function_name, state, tool_call_id)
results.append(result)
if isinstance(function_name, str):
```
if it is not a string, what happens then?
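One option would be to fail loudly on the non-string case; a hedged sketch, with the function and its return value standing in for the PR's actual Task._get_function_definition handling:

```python
def lookup_function_definition(function_name: object) -> str:
    """Illustrative only; the real lookup is Task._get_function_definition.
    Surfaces an explicit error instead of silently skipping a non-string name."""
    if not isinstance(function_name, str):
        return f"error: expected a function name string, got {type(function_name).__name__}"
    return f"definition of {function_name}"  # placeholder for the actual lookup
```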
```python
def interrupt(self) -> list[str]:
    """Interrupt the docker process. Overwrite this with program logic,
    return any lines you need to explain what the interuption did."""
```
```diff
- return any lines you need to explain what the interuption did."""
+ return any lines you need to explain what the interruption did."""
```
```python
    return Command(update={"debug_script": debug_script})
except Exception as e:
    logger.error(f"Failed to extract debug script from LLM response: {e}")
    if "llm_response" in locals():
```
instead of checking locals, initialize llm_response to None and check if not None
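I.e. something like the following, with llm_response bound before the try block; call_llm and parse_script are stand-ins for the PR's actual helpers.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)

def extract_debug_script(
    prompt: str,
    call_llm: Callable[[str], str],        # stand-in for the PR's LLM call
    parse_script: Callable[[str], str],    # stand-in for the extraction step
) -> Optional[str]:
    llm_response: Optional[str] = None     # bound up front; no locals() check needed
    try:
        llm_response = call_llm(prompt)
        return parse_script(llm_response)
    except Exception as e:
        logger.error(f"Failed to extract debug script from LLM response: {e}")
        if llm_response is not None:
            logger.debug(f"Raw LLM response: {llm_response}")
        return None
```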
```python
        return [BuildType.FUZZER]
    return [BuildType.FUZZER, BuildType.FUZZER_DEBUG]


def _is_harness_whitelisted(self, harness_name: str) -> bool:
```
```diff
- def _is_harness_whitelisted(self, harness_name: str) -> bool:
+ def _is_harness_allowlisted(self, harness_name: str) -> bool:
```
Use allowlist instead
```diff
 coverage-bot:
-  enabled: true
+  enabled: false
```
We shouldn't disable the coverage bot by default
```yaml
seedExploreProbFull: .2
seedExploreProbDelta: .2
minSeedInitRuns: 0
minVulnDiscoveryRuns: 0
```
the defaults in K8s should preserve the defaults in main
```yaml
seedExploreProbDelta: .2
minSeedInitRuns: 0
minVulnDiscoveryRuns: 0
# Harness whitelist - comma-separated list of harness names/substrings to allow
```
Use allowlist instead
| f""" Result: did_run={result.did_run()}, did_crash={result.did_crash()}, | ||
| returncode={result.command_result.returncode if result.command_result else "N/A"}""" |
| f""" Result: did_run={result.did_run()}, did_crash={result.did_crash()}, | |
| returncode={result.command_result.returncode if result.command_result else "N/A"}""" | |
| f"""Result: did_run={result.did_run()}, did_crash={result.did_crash()}, """ | |
| f"""returncode={result.command_result.returncode if result.command_result else "N/A"}""" |
Make log one line instead of multiline with indentation
| f""" Build {i}: sanitizer={build.sanitizer}, engine={build.engine}, | ||
| type={BuildType.Name(build.build_type)}, task_id={build.task_id}""" |
| f""" Build {i}: sanitizer={build.sanitizer}, engine={build.engine}, | |
| type={BuildType.Name(build.build_type)}, task_id={build.task_id}""" | |
| f"""Build {i}: sanitizer={build.sanitizer}, engine={build.engine}, """ | |
| f"""type={BuildType.Name(build.build_type)}, task_id={build.task_id}""" |
Make log one line without indentation
```python
for src, dst in mount_dirs.items():
    docker_cmd += ["-v", f"{src.resolve().as_posix()}:{dst.as_posix()}"]
    logger.debug("Mounting %s -> %s", src, dst)
```
Could we make these volumes read-only? (At least the ones we can, I think scratchpad needs to be read/write).
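A minimal sketch of the read-only variant, assuming mount_dirs stays as in the diff; writable_dsts (e.g. the scratchpad destination) is a hypothetical parameter, not part of the PR.

```python
from pathlib import Path

def add_volume_flags(
    docker_cmd: list[str],
    mount_dirs: dict[Path, Path],
    writable_dsts: set[Path],   # hypothetical: destinations that must stay read/write
) -> None:
    """Mount everything read-only except destinations listed in writable_dsts."""
    for src, dst in mount_dirs.items():
        mode = "rw" if dst in writable_dsts else "ro"
        docker_cmd += ["-v", f"{src.resolve().as_posix()}:{dst.as_posix()}:{mode}"]
```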
| "Initializing DockerInteractive session: container=%s, timeout=%.1fs", container_image, global_timeout | ||
| ) | ||
|
|
||
| docker_cmd = ["docker", "run", "--privileged", "--shm-size=2g", "-i", "--name", self.container_name] |
Should it pass --rm to remove the container after it stops?
| "Initializing DockerInteractive session: container=%s, timeout=%.1fs", container_image, global_timeout | ||
| ) | ||
|
|
||
| docker_cmd = ["docker", "run", "--privileged", "--shm-size=2g", "-i", "--name", self.container_name] |
Could add --network none to isolate from network?
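Both this and the --rm suggestion above amount to extra flags on the same command; a sketch assuming the docker_cmd construction from the quoted hunk, with the container name passed in for illustration.

```python
def build_docker_cmd(container_name: str) -> list[str]:
    """Sketch of the flag additions only; the rest of the command is as in the PR."""
    return [
        "docker", "run",
        "--rm",                # clean up the container once it stops
        "--network", "none",   # isolate the debug session from the network
        "--privileged", "--shm-size=2g", "-i",
        "--name", container_name,
    ]
```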
```python
    for line in next_command.split("\n")
    if line.strip() and not line.strip().startswith("#")
]
logger.info(
```
It might be worth adding a TODO to sanitize higher risk gdb commands (e.g. shell, python). The main focus is isolating the container since the LLM can run gdb but gdb command sanitization might be useful as defense in depth.
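A defense-in-depth filter could be as simple as a prefix blocklist applied before forwarding commands to gdb; the blocked prefixes below are illustrative, not exhaustive.

```python
# Illustrative blocklist only; gdb has other escape hatches, so this is
# defense in depth on top of container isolation, not a replacement for it.
BLOCKED_GDB_PREFIXES = ("shell", "!", "python", "pi")

def filter_gdb_commands(commands: list[str]) -> list[str]:
    safe = []
    for cmd in commands:
        if cmd.strip().lower().startswith(BLOCKED_GDB_PREFIXES):
            continue  # drop higher-risk commands instead of forwarding them
        safe.append(cmd)
    return safe
```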
```
**Language Agnostic**: This method works for any language. For C++ projects,
it sets CFLAGS/CXXFLAGS to include full debug symbols. For Java or other languages,
setting CFLAGS/CXXFLAGS has no effect (which is fine).
```
This comment is a bit unclear because it says language agnostic, but really the method is just for C/C++.
I modified the comment to make it clear that the build still runs for Java projects, and that we don't have to know the language when we do this process, but that debug symbols are only actually added for C/C++ projects.
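For reference, a sketch of the idea described in this thread: the build always runs, and the debug-symbol flags only take effect for C/C++. The build_cmd parameter and the specific flag values are assumptions, not the PR's actual build invocation.

```python
import os
import subprocess

def build_with_debug_symbols(build_cmd: list[str], cwd: str) -> None:
    """Always runs the build; CFLAGS/CXXFLAGS take effect for C/C++ projects
    and are simply ignored by Java (or other non-C) builds."""
    env = os.environ.copy()
    env["CFLAGS"] = (env.get("CFLAGS", "") + " -g -O0").strip()
    env["CXXFLAGS"] = (env.get("CXXFLAGS", "") + " -g -O0").strip()
    subprocess.run(build_cmd, cwd=cwd, env=env, check=True)
```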
```python
# Verify all source files exist before mounting
logger.debug("Verifying source files before Docker mount:")
logger.debug(" Debug script:")
logger.debug(f" Path: {debug_script_path}")
logger.debug(f" Exists: {debug_script_path.exists()}")
logger.debug(f" Is file: {debug_script_path.is_file()}")
logger.debug(f" Is directory: {debug_script_path.is_dir()}")
if debug_script_path.exists():
    logger.debug(f" Size: {debug_script_path.stat().st_size} bytes")
    logger.debug(f" Absolute: {debug_script_path.resolve()}")

logger.info(" PoV input:")
logger.debug(f" Path: {pov_input_path}")
logger.debug(f" Exists: {pov_input_path.exists()}")
logger.debug(f" Is file: {pov_input_path.is_file()}")
logger.debug(f" Is directory: {pov_input_path.is_dir()}")
if pov_input_path.exists():
    logger.debug(f" Size: {pov_input_path.stat().st_size} bytes")
    logger.debug(f" Absolute: {pov_input_path.resolve()}")

logger.debug(" Build dir:")
logger.debug(f" Path: {build_dir}")
logger.debug(f" Exists: {build_dir.exists()}")
logger.debug(f" Is directory: {build_dir.is_dir()}")
logger.debug(f" Absolute: {build_dir.resolve()}")

logger.info(" Out dir (parent of build_dir):")
logger.debug(f" Path: {out_dir}")
logger.debug(f" Exists: {out_dir.exists()}")
logger.debug(f" Is directory: {out_dir.is_dir()}")
logger.debug(f" Absolute: {out_dir.resolve()}")
```
This logging might be too verbose
reytchison left a comment
I'm confirming I did my first pass of the PR, too.
This pull request integrates a debugger into the seed-gen component of Buttercup. It builds on seed-gen's existing vuln-discovery component, which uses an LLM to analyze the codebase and generate PoV inputs. When those PoV inputs fail, the program spawns a sub-task that can use debug scripts and an interactive debug session to determine why the inputs failed to trigger the hypothesized vulnerability. The findings are then fed back to the calling task to inform the next round of PoV inputs.
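At a high level, the loop described above looks roughly like this; the function names are descriptive placeholders passed in as callables, not the PR's actual API.

```python
from typing import Callable, Optional

def vuln_discovery_round(
    generate_pov: Callable[[Optional[str]], bytes],  # LLM proposes a PoV, optionally using debugger feedback
    pov_crashes: Callable[[bytes], bool],            # True if the PoV triggers the target crash
    debug_failed_pov: Callable[[bytes], str],        # debugger sub-task explains why the PoV missed
    max_rounds: int = 3,
) -> Optional[bytes]:
    """Sketch of the PoV -> debug -> retry feedback loop."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        pov = generate_pov(feedback)
        if pov_crashes(pov):
            return pov                       # hypothesized vulnerability confirmed
        feedback = debug_failed_pov(pov)     # feed findings back into the next round
    return None
```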