[Discovery] Improve execution time of discovery #71

@felix-20

Problem statement

Especially for large applications, discovery can take a very long time when multiple patterns are to be discovered.
This is most likely caused by loading the CPG (code property graph): at the moment, the CPG is loaded separately for every pattern.

How can the execution time of discovery be reduced?

There are a few ideas on how to improve the execution time of discovery:

Single-file or subdirectory discovery

Potentially the easiest solution, as proposed by @pr0me, is to create multiple CPGs for one project: discovery can then be run individually for each directory, or even each file, in the project. This might decrease the total execution time, because each CPG is smaller and discovery for the subdirectories/files could run in parallel.
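A minimal sketch of the per-file variant in Python, assuming that discovery for a single unit can be wrapped in a callable (`discover` below is a hypothetical stand-in for building a CPG for that unit and running the selected patterns on it). `collect_units` and `discover_in_parallel` are illustrative names, not part of any existing tool.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Any, Callable, Dict, List, Sequence

# Hypothetical signature: takes one file or directory, builds a CPG for it,
# runs discovery on it, and returns the findings for that unit.
DiscoverFn = Callable[[Path], List[Any]]


def collect_units(project_root: Path, suffix: str) -> List[Path]:
    """Return every source file below project_root with the given suffix.

    Each file becomes its own discovery unit (and therefore its own CPG).
    """
    return sorted(p for p in project_root.rglob(f"*{suffix}") if p.is_file())


def discover_in_parallel(
    units: Sequence[Path], discover: DiscoverFn, max_workers: int = 4
) -> Dict[Path, List[Any]]:
    """Run `discover` on every unit concurrently and collect results per unit.

    Threads are assumed to be sufficient because the heavy lifting (parsing
    and querying) would happen in separate joern processes.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so zip() pairs each unit with its result.
        results = list(pool.map(discover, units))
    return dict(zip(units, results))
```

Usage might look like `discover_in_parallel(collect_units(Path("my-project"), ".java"), run_discovery)`, where `run_discovery` is hypothetical. One possible trade-off: findings that span several files may be missed once those files end up in different CPGs.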

Keep CPG in memory

If loading the CPG takes this much time, it might be worthwhile to keep the CPG in memory and execute multiple queries on it.
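One way to express this is a small session object that loads the CPG lazily on first use and hands the same in-memory graph to every query. This is a sketch under the assumption that loading and querying can be wrapped as Python callables; `CpgSession`, `loader` and the query shape are hypothetical.

```python
from typing import Any, Callable, Dict

# A query takes the loaded CPG and returns that pattern's findings (hypothetical).
Query = Callable[[Any], Any]


class CpgSession:
    """Load a CPG at most once and reuse it for every discovery query."""

    def __init__(self, cpg_path: str, loader: Callable[[str], Any]) -> None:
        self._cpg_path = cpg_path
        self._loader = loader  # hypothetical stand-in for the expensive CPG load
        self._cpg: Any = None
        self._loaded = False

    @property
    def cpg(self) -> Any:
        # Load on first access only; later accesses reuse the in-memory graph.
        if not self._loaded:
            self._cpg = self._loader(self._cpg_path)
            self._loaded = True
        return self._cpg

    def run_all(self, queries: Dict[str, Query]) -> Dict[str, Any]:
        """Run every query against the same loaded CPG, keyed by pattern name."""
        cpg = self.cpg
        return {name: query(cpg) for name, query in queries.items()}
```

Calling `run_all` repeatedly, or with many patterns at once, triggers only a single call to `loader`.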

Interactive joern shell

Using the interactive shell provided by joern, it should be possible to load the CPG once and execute multiple rules on it. This requires a script that can interact with the joern shell.
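A minimal sketch of such a script in Python, assuming the shell reads commands from stdin and writes results to stdout when driven through pipes (this has not been verified for joern here). After each command it sends a second command that prints a sentinel line, so it knows where that command's output ends. `ReplSession` and the sentinel string are hypothetical.

```python
import subprocess
from typing import List, Sequence


class ReplSession:
    """Drive an interactive shell over pipes, one command at a time.

    After each command a second command prints a unique sentinel line;
    everything printed before the sentinel is the command's output.
    """

    def __init__(self, argv: Sequence[str], sentinel_cmd: str,
                 sentinel: str = "__CMD_DONE__") -> None:
        self._sentinel = sentinel
        # sentinel_cmd is a template such as 'println("{sentinel}")'.
        self._sentinel_cmd = sentinel_cmd.format(sentinel=sentinel)
        self._proc = subprocess.Popen(
            list(argv),
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,  # prompts/banners on stderr are discarded
            text=True,
            bufsize=1,  # line-buffered writes to the shell's stdin
        )

    def run(self, command: str) -> List[str]:
        """Send one command and return the stdout lines it produced."""
        assert self._proc.stdin is not None and self._proc.stdout is not None
        self._proc.stdin.write(command + "\n")
        self._proc.stdin.write(self._sentinel_cmd + "\n")
        self._proc.stdin.flush()
        lines: List[str] = []
        for line in self._proc.stdout:
            line = line.rstrip("\n")
            # endswith() tolerates a prompt printed on the same line as the sentinel.
            if line.endswith(self._sentinel):
                return lines
            lines.append(line)
        raise RuntimeError("shell exited before printing the sentinel")

    def close(self) -> None:
        """Close stdin (EOF ends most shells) and wait for the process to exit."""
        if self._proc.stdin is not None:
            self._proc.stdin.close()
        self._proc.wait(timeout=60)
```

For joern this might be used as `session = ReplSession(["joern"], 'println("{sentinel}")')`, then `session.run('importCpg("app.cpg.bin")')` once (the path is a placeholder), followed by one `session.run(rule)` per discovery rule. If the shell colours its output with ANSI escape codes, those would still need to be stripped before the sentinel check.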

One large rule

Another possibility is to merge all discovery rules that the user wants to execute into one large discovery rule and execute only that rule. The CPG would then be loaded only once, and all queries could run on the loaded CPG.
Possible problems with this solution:

  • if one discovery rule is broken, it might break the whole merged rule
  • how do you separate the different results and assign a result to each pattern and each instance?
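Both concerns could be addressed when generating the merged rule. A sketch in Python, assuming each rule is a single Scala expression whose value should be printed: every rule is wrapped in `try`/`catch` and framed by marker lines, and the combined output is split back into one result per rule. `build_merged_script`, `split_results` and the marker format are hypothetical.

```python
from typing import Dict, List, Optional

# Marker lines that frame each rule's output (hypothetical format).
BEGIN, END, ERROR = "<<<BEGIN {}>>>", "<<<END {}>>>", "<<<ERROR {}>>>"


def build_merged_script(cpg_path: str, rules: Dict[str, str]) -> str:
    """Combine several discovery rules into one joern (Scala) script.

    The CPG is imported once; each rule is printed between BEGIN/END markers
    and wrapped in try/catch so a runtime error only affects that rule.
    """
    parts = [f'importCpg("{cpg_path}")']
    for name, expr in rules.items():
        parts.append(f'println("{BEGIN.format(name)}")')
        parts.append(
            "try {\n"
            f"  println({expr})\n"
            "} catch {\n"
            f'  case e: Throwable => println("{ERROR.format(name)} " + e.getMessage)\n'
            "}"
        )
        parts.append(f'println("{END.format(name)}")')
    return "\n".join(parts) + "\n"


def split_results(output: str, names: List[str]) -> Dict[str, dict]:
    """Split the merged script's stdout back into one result per rule name."""
    begins = {BEGIN.format(name): name for name in names}
    results: Dict[str, dict] = {}
    current: Optional[str] = None
    for line in output.splitlines():
        if current is None:
            # Outside any rule: only a BEGIN marker is meaningful here.
            if line in begins:
                current = begins[line]
                results[current] = {"lines": [], "error": None}
        elif line == END.format(current):
            current = None
        elif line.startswith(ERROR.format(current)):
            results[current]["error"] = line[len(ERROR.format(current)):].strip()
        else:
            results[current]["lines"].append(line)
    return results
```

The `try`/`catch` only isolates runtime failures: a rule that does not compile would still break the whole script. A rule whose BEGIN marker never appears in the output indicates that the script stopped before reaching it. How the generated script is handed to joern (for example via `joern --script`) is left open here.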

Metadata

Labels: enhancement (New feature or request)
