
feat: add AI module for LLM interaction and a heuristic for checking code–docstring consistency #1121


Open
wants to merge 3 commits into main

Conversation

AmineRaouane
Member

Summary

This PR introduces an AI client for interacting with large language models (LLMs) and adds a new heuristic analyzer to detect inconsistencies between code and its docstrings.

Description of changes

  • Added an AIClient class to enable seamless integration with LLMs.
  • Implemented a new heuristic analyzer, MatchingDocstringsAnalyzer, that detects inconsistencies between code and its docstrings (a rough sketch of how the pieces fit together follows this list).
  • Integrated the new heuristic into the heuristics.py registry.
  • Updated detect_malicious_metadata_check.py to include and run this new heuristic.
  • Added unit tests to verify correct detection of missing or mismatched docstrings.
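
A minimal sketch of how these pieces could fit together, for orientation only. The method signatures, prompt text, and response field name below are assumptions, not the actual code in this PR:

```python
class AIClient:
    """Thin wrapper around an LLM endpoint (sketch; the real constructor args may differ)."""

    def __init__(self, system_prompt: str, enabled: bool = True) -> None:
        self.system_prompt = system_prompt
        self.enabled = enabled

    def invoke(self, user_prompt: str) -> dict:
        """Send the system and user prompts to the model and return its parsed JSON reply."""
        raise NotImplementedError("wire this to the configured LLM provider")


class MatchingDocstringsAnalyzer:
    """Heuristic: ask the model whether docstrings match the code they describe."""

    SYSTEM_PROMPT = "You check whether docstrings are consistent with the code they describe."

    def __init__(self, client: AIClient) -> None:
        self.client = client

    def analyze(self, source_code: str) -> bool:
        """Return True (pass) when no inconsistency is reported, False otherwise."""
        if not self.client.enabled:
            return True  # AI client disabled: skip rather than fail the check
        verdict = self.client.invoke(source_code)
        return verdict.get("inconsistent") is None  # hypothetical response field
```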

Related issues

None

Checklist

  • I have reviewed the contribution guide.
  • My PR title and commits follow the Conventional Commits convention.
  • My commits include the "Signed-off-by" line.
  • I have signed my commits following the instructions provided by GitHub. Note that we run GitHub's commit verification tool to check the commit signatures. A green verified label should appear next to all of your commits on GitHub.
  • I have updated the relevant documentation, if applicable.
  • I have tested my changes and verified they work as expected.

…code–docstring consistency

Signed-off-by: Amine <amine.raouane@enim.ac.ma>
@oracle-contributor-agreement bot added the "OCA Verified" label (All contributors have signed the Oracle Contributor Agreement) on Jul 11, 2025
Signed-off-by: Amine <amine.raouane@enim.ac.ma>
Signed-off-by: Amine <amine.raouane@enim.ac.ma>
@behnazh-w requested a review from art1f1c3R on August 15, 2025 01:24

SYSTEM_PROMPT = """
You are a security expert analyzing a PyPI package. Determine if the package description is secure.
You must score between 0 and 100 based on the following criteria:
Member

@art1f1c3R Aug 15, 2025


Can you tell me more about the types of responses models give to this prompt? My main questions are how the model assigns a score from 0 to 100. It is asked to look at several aspects (High-level description summary, benefit, how to install, how to use) and their consistency with each other. What happens if one of those sections is missing? Is that "less consistent"? Do any of the section comparisons get weighted more than the other (e.g. "how to use" vs "high-level description summary" has more weight than "how to use" vs "benefit")? If a description uses clear headings, is it going to be scored higher than one that does not, when both include equivalent information? If a package has a pretty bare-bones description with essentially no information on any of these headings, is it labelled inconsistent?

Member Author


Well, as you can see, I didn’t specify more detail in the system prompt; the model decides the overall score on its own. If needed, I can add examples of best and worst packages in the prompt to guide it better.

Right now, the model isn’t explicitly told to weigh one section over another; it just evaluates the consistency across all provided aspects (high-level description, benefit, how to install, how to use) and gives a score from 0 to 100. If one of those sections is missing, the model will likely consider it “less consistent” overall, which should lower the score.

The returned format looks like this:
{ "score": ... , "reason": ... }

pytest.skip("AI client disabled - skipping test")


def test_analyze_consistent_description_pass(
Member


So it looks like we patch and hardcode the return values for analyzer.client.invoke. Isn't pass/fail mostly determined by that result though?

Member Author


Yeah, that’s exactly why I set it up this way: I wanted something fixed across all models, so patching and hardcoding the return values for analyzer.client.invoke ensures consistency. Pass/fail is indeed mostly determined by that result, and this approach keeps it stable so the outcome isn’t affected by model variance.
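
As a self-contained illustration of that pattern (not the actual test body in this PR; analyze_with is a hypothetical stand-in for the analyzer under test):

```python
from unittest.mock import MagicMock


def analyze_with(client, code: str) -> bool:
    """Stand-in for the analyzer: pass iff the model reports no inconsistency."""
    return client.invoke(code).get("inconsistent") is None


def test_pass_when_model_reports_consistent() -> None:
    client = MagicMock()
    client.invoke.return_value = {"inconsistent": None}  # pinned model verdict
    assert analyze_with(client, "def f():\n    '''Return 1.'''\n    return 1")
    client.invoke.assert_called_once()
```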

mock_invoke.assert_called_once()


def test_analyze_inconsistent_description_fail(
Member


Same here.

"The inconsistent code, or null."
}

/no_think
Member


Could you please explain what /no_think means in this context and why it's used in this prompt?

Member Author


/no_think here is used to prevent “thinking” or reasoning steps from the model. This way, it skips generating internal reasoning chains and directly produces the output, which reduces response time.
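
A small illustration of where the directive sits in the prompt (the helper is hypothetical; /no_think is a soft switch recognized by some model families, e.g. Qwen, and behaviour on other models may vary):

```python
def build_user_prompt(task: str, fast: bool = True) -> str:
    """Append the /no_think directive when a quick, non-reasoning reply is preferred."""
    return f"{task}\n/no_think" if fast else task
```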

"""Check whether the docstrings and the code components are consistent."""

SYSTEM_PROMPT = """
You are a code master who can detect the inconsistency of the code with the docstrings that describe its components.
Member


Do you have any results from running this on existing PyPI packages, and how the LLM performed?

Labels
OCA Verified: All contributors have signed the Oracle Contributor Agreement.
3 participants