Commit d1f807d

chore: add tutorial for python gen-build-spec and support more build tools (#1236)
This PR adjusts the gen-build-spec tutorial for Python, extends support for additional Python build tools, and refactors registries to make them build-tool agnostic. It also improves the logic for the default build requirements and backends.

Signed-off-by: behnazh-w <behnaz.hassanshahi@oracle.com>
1 parent 76ec530 commit d1f807d

56 files changed: +2014 -315 lines changed

docs/source/pages/cli_usage/command_gen_build_spec.rst

Lines changed: 1 addition & 2 deletions
@@ -39,5 +39,4 @@ Options
3939

4040
.. option:: --output-format OUTPUT_FORMAT
4141

42-
The desired output format for the build specification. The default format is `rc-buildspec`, which is the Reproducible-Central build specification.
43-
Other formats may be available depending on your configuration.
42+
The output format. Can be `default-buildspec` (default) or `rc-buildspec` (Reproducible-central build spec)

docs/source/pages/output_files.rst

Lines changed: 22 additions & 2 deletions
@@ -21,6 +21,7 @@ Top level structure
2121
2222
output/
2323
├── build_log/
24+
├── buildspec/
2425
├── git_repos/
2526
├── reports/
2627
├── debug.log
@@ -43,8 +44,8 @@ The report files of Macaron (from using the :ref:`analyze command <analyze-comma
4344
Unique result path
4445
''''''''''''''''''
4546

46-
For each target software component, Macaron creates a directory under ``reports`` to store the report files. This directory
47-
path is formed from the PURL string of that component. The final path is created using the following template:
47+
For each target software component, Macaron creates a directory under ``reports`` to store the report. These directory
48+
paths are formed from the PURL string of that component. The final path is created using the following template:
4849

4950
.. code-block::
5051
@@ -131,6 +132,25 @@ to the directory:
131132
132133
.. note:: Please see :ref:`pages/using:analyzing a repository on the local file system` to know how to set the directory for analyzing local repositories.
133134

135+
.. _output_files_macaron_build_spec-Gen:
136+
137+
--------------------------------------
138+
Output files of macaron gen-build-spec
139+
--------------------------------------
140+
141+
As part of the ``gen-build-spec`` command, Macaron generates build spec files to help rebuilding artifacts from source. For each target software component, Macaron creates a dedicated directory under ``buildspec`` to store the generated build specification file. These directory paths are derived from the component's PURL (Package URL) string. The resulting path structure follows this template:
142+
143+
.. code-block::
144+
145+
<path_to_output>/buildspec/<purl_type>/<purl_namespace>/<purl_name>
146+
147+
Depending on the chosen output format, the following files may be generated in each directory:
148+
- ``macaron.buildspec`` (default format)
149+
- ``reproducible_central.buildspec`` (when run with the ``rc-buildspec`` output format for Maven artifacts)
150+
151+
Each file contains the build specification for the corresponding software component.
152+
153+
134154
.. _output_files_macaron_verify_policy:
135155

136156
-------------------------------------
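
Editor's note on the ``buildspec`` path template added in this file: a minimal sketch of how the template could be filled in. The helper name ``buildspec_dir`` is hypothetical and is not part of Macaron. Replacing dots in the namespace with underscores is an assumption inferred from the single example path shown in the tutorial diff of this commit (``org.apache.hugegraph`` becoming ``org_apache_hugegraph``); how components without a namespace are handled is not shown in the diff and is not covered here.

```python
from pathlib import PurePosixPath


def buildspec_dir(output_dir: str, purl_type: str, purl_namespace: str, purl_name: str) -> PurePosixPath:
    """Fill in <path_to_output>/buildspec/<purl_type>/<purl_namespace>/<purl_name>.

    Hypothetical helper for illustration only.
    """
    # Assumption: dots in the namespace are replaced with underscores,
    # as the tutorial's example path suggests.
    safe_namespace = purl_namespace.replace(".", "_")
    return PurePosixPath(output_dir) / "buildspec" / purl_type / safe_namespace / purl_name


spec_path = buildspec_dir("output", "maven", "org.apache.hugegraph", "computer-k8s") / "reproducible_central.buildspec"
print(spec_path)
```

With the tutorial's PURL ``pkg:maven/org.apache.hugegraph/computer-k8s@1.0.0``, this reproduces the path quoted in the tutorial for the ``rc-buildspec`` format.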

docs/source/pages/supported_technologies/index.rst

Lines changed: 2 additions & 1 deletion
@@ -29,7 +29,8 @@ such as GitHub Actions workflows.
2929
Build Specification Generation
3030
------------------------------
3131

32-
* Maven and Gradle builds for Java artifacts
32+
* Maven and Gradle builds for Java packages
33+
* The built-in ``build`` module and various build tools, like Poetry for Python packages
3334

3435
.. _supported_git_services:
3536

docs/source/pages/tutorials/rebuild_third_party_artifacts.rst

Lines changed: 48 additions & 6 deletions
@@ -16,6 +16,7 @@ These buildspecs help document and automate the build process for packages, enab
1616

1717
* - Currently Supported packages
1818
* - Maven packages built with Gradle or Maven
19+
* - Python packages built with the built-in ``build`` module and various build tools, like Poetry
1920

2021
.. contents:: :local:
2122

@@ -31,9 +32,9 @@ Addressing this lack of transparency is critical for improving supply chain secu
3132
Background
3233
**********
3334

34-
A build specification is a file that describes all necessary information to rebuild a package from source. This includes metadata such as the build tool, the specific build command to run, the language version, e.g., JDK for Java, and artifact coordinates. Macaron can now generate this file automatically for supported ecosystems, greatly simplifying build from source.
35+
A build specification is a file that describes all necessary information to rebuild a package from source. This includes metadata such as the build tool, the specific build command to run, the language version, e.g., Python or JDK for Java, and artifact coordinates. Macaron can now generate this file automatically for supported ecosystems, greatly simplifying build from source.
3536

36-
The generated buildspec will be stored in an ecosystem- and PURL-specific path under the ``output/`` directory (see more under :ref:`Output Files Guide <output_files_guide>`).
37+
The generated buildspec will be stored in an ecosystem- and PURL-specific path under the ``output/`` directory (see more under :ref:`Output Files Guide <output_files_macaron_build_spec-Gen>`).
3738

3839
******************************
3940
Installation and Prerequisites
@@ -101,7 +102,48 @@ In the example above, the buildspec is located at:
101102
Step 3: Review and Use the Buildspec File
102103
*****************************************
103104

104-
The generated buildspec uses the `Reproducible Central buildspec <https://github.com/jvm-repo-rebuild/reproducible-central/blob/master/doc/BUILDSPEC.md>`_ format, for example:
105+
By default we generate the buildspec in JSON format as follows:
106+
107+
.. code-block:: ini
108+
109+
{
110+
"macaron_version": "0.18.0",
111+
"group_id": "org.apache.hugegraph",
112+
"artifact_id": "computer-k8s",
113+
"version": "1.0.0",
114+
"git_repo": "https://github.com/apache/hugegraph-computer",
115+
"git_tag": "d2b95262091d6572cc12dcda57d89f9cd44ac88b",
116+
"newline": "lf",
117+
"language_version": [
118+
"11"
119+
],
120+
"ecosystem": "maven",
121+
"purl": "pkg:maven/org.apache.hugegraph/computer-k8s@1.0.0",
122+
"language": "java",
123+
"build_tools": [
124+
"maven"
125+
],
126+
"build_commands": [
127+
[
128+
"mvn",
129+
"-DskipTests=true",
130+
"-Dmaven.test.skip=true",
131+
"-Dmaven.site.skip=true",
132+
"-Drat.skip=true",
133+
"-Dmaven.javadoc.skip=true",
134+
"clean",
135+
"package"
136+
]
137+
]
138+
}
139+
140+
If you use the ``rc-buildspec`` output format, the generated buildspec follows the `Reproducible Central buildspec <https://github.com/jvm-repo-rebuild/reproducible-central/blob/master/doc/BUILDSPEC.md>`_ format. For example, you can generate it with:
141+
142+
.. code-block:: shell
143+
144+
./run_macaron.sh gen-build-spec -purl pkg:maven/org.apache.hugegraph/computer-k8s@1.0.0 --database output/macaron.db --output-format rc-buildspec
145+
146+
The resulting file will be saved as ``output/buildspec/maven/org_apache_hugegraph/computer-k8s/reproducible_central.buildspec``, and will look like this:
105147

106148
.. code-block:: ini
107149
@@ -136,18 +178,18 @@ The ``gen-build-spec`` works as follows:
136178

137179
- Extracts metadata and build information from Macaron’s local SQLite database.
138180
- Parses and modifies build commands from CI/CD configurations to ensure compatibility with rebuild systems.
139-
- Identifies the JDK version by parsing CI/CD configurations or extracting it from the ``META-INF/MANIFEST.MF`` file in Maven Central artifacts.
181+
- Identifies the language version, e.g., JDK version by parsing CI/CD configurations or extracting it from the ``META-INF/MANIFEST.MF`` file in Maven Central artifacts.
140182
- Ensures that only the major JDK version is included, as required by the build specification format.
141183

142184

143-
This feature is described in more detail in our accepted ASE 2025 Industry ShowCase paper: `Unlocking Reproducibility: Automating the Re-Build Process for Open-Source Software <https://arxiv.org/pdf/2509.08204>`_.
185+
The Java support for this feature is described in more detail in our accepted ASE 2025 Industry ShowCase paper: `Unlocking Reproducibility: Automating the Re-Build Process for Open-Source Software <https://arxiv.org/pdf/2509.08204>`_.
144186

145187
***********************************
146188
Frequently Asked Questions (FAQs)
147189
***********************************
148190

149191
*Q: What formats are supported for buildspec output?*
150-
A: Currently, only ``rc-buildspec`` is supported.
192+
A: Currently, a default JSON spec and optional ``rc-buildspec`` are supported.
151193

152194
*Q: Do I need to analyze the package every time before generating a buildspec?*
153195
A: No, you only need to analyze the package once unless you want to update the database with newer information.
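
Editor's note on the default JSON buildspec introduced in this tutorial: a minimal sketch of how a consumer might read the file and turn ``build_commands`` into shell lines. The JSON below is a trimmed copy of the tutorial's example. The function ``shell_lines`` is hypothetical and is not Macaron's own ``compose_shell_commands``, whose behavior is not shown in this diff.

```python
import json
import shlex

# Trimmed copy of the default-format example from the tutorial diff.
SPEC_TEXT = """
{
  "purl": "pkg:maven/org.apache.hugegraph/computer-k8s@1.0.0",
  "language": "java",
  "build_tools": ["maven"],
  "build_commands": [
    ["mvn", "-DskipTests=true", "-Dmaven.test.skip=true", "-Dmaven.site.skip=true",
     "-Drat.skip=true", "-Dmaven.javadoc.skip=true", "clean", "package"]
  ]
}
"""


def shell_lines(spec: dict) -> list[str]:
    """Render each argv-style build command as one shell-quoted line (hypothetical helper)."""
    return [shlex.join(argv) for argv in spec["build_commands"]]


spec = json.loads(SPEC_TEXT)
for line in shell_lines(spec):
    print(line)
```

Because ``build_commands`` is a list of argument lists, quoting each argument with ``shlex.join`` avoids re-splitting problems if a command ever contains spaces.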

src/macaron/build_spec_generator/common_spec/base_spec.py

Lines changed: 14 additions & 11 deletions
@@ -25,8 +25,8 @@ class BaseBuildSpecDict(TypedDict, total=False):
2525
#: The programming language, e.g., 'java', 'python', 'javascript'.
2626
language: Required[str]
2727

28-
#: The build tool or package manager, e.g., 'maven', 'gradle', 'pip', 'poetry', 'npm', 'yarn'.
29-
build_tool: Required[str]
28+
#: The build tools or package managers, e.g., 'maven', 'gradle', 'pip', 'poetry', 'npm', 'yarn'.
29+
build_tools: Required[list[str]]
3030

3131
#: The version of Macaron used for generating the spec.
3232
macaron_version: Required[str]
@@ -73,10 +73,13 @@ class BaseBuildSpecDict(TypedDict, total=False):
7373
#: Entry point script, class, or binary for running the project.
7474
entry_point: NotRequired[str | None]
7575

76+
#: The build_requires is the required packages that need to be available in the build environment.
77+
build_requires: NotRequired[dict[str, str]]
78+
7679
#: A "back end" is tool that a "front end" (such as pip/build) would call to
7780
#: package the source distribution into the wheel format. build_backends would
7881
#: be a list of these that were used in building the wheel alongside their version.
79-
build_backends: NotRequired[dict[str, str]]
82+
build_backends: NotRequired[list[str]]
8083

8184

8285
class BaseBuildSpec(ABC):
@@ -94,21 +97,21 @@ def resolve_fields(self, purl: PackageURL) -> None:
9497
"""
9598

9699
@abstractmethod
97-
def get_default_build_command(
100+
def get_default_build_commands(
98101
self,
99-
build_tool_name: str,
100-
) -> list[str]:
101-
"""Return a default build command for the build tool.
102+
build_tool_names: list[str],
103+
) -> list[list[str]]:
104+
"""Return the default build commands for the build tools.
102105
103106
Parameters
104107
----------
105-
build_tool_name: str
106-
The build tool to get the default build command.
108+
build_tool_names: list[str]
109+
The build tools to get the default build command.
107110
108111
Returns
109112
-------
110-
list[str]
111-
The build command as a list[str].
113+
list[list[str]]
114+
The build command as a list[list[str]].
112115
113116
Raises
114117
------

src/macaron/build_spec_generator/common_spec/core.py

Lines changed: 28 additions & 28 deletions
@@ -57,6 +57,9 @@ class MacaronBuildToolName(str, Enum):
5757
GRADLE = "gradle"
5858
PIP = "pip"
5959
POETRY = "poetry"
60+
FLIT = "flit"
61+
HATCH = "hatch"
62+
CONDA = "conda"
6063

6164

6265
def format_build_command_info(build_command_info: list[GenericBuildCommandInfo]) -> str:
@@ -117,18 +120,14 @@ def compose_shell_commands(cmds_sequence: list[list[str]]) -> str:
117120
return result
118121

119122

120-
def get_macaron_build_tool_name(
123+
def get_macaron_build_tool_names(
121124
build_tool_facts: Sequence[BuildToolFacts], target_language: str
122-
) -> MacaronBuildToolName | None:
125+
) -> list[MacaronBuildToolName] | None:
123126
"""
124-
Retrieve the Macaron build tool name for supported projects from the database facts.
127+
Retrieve the Macaron build tool names for supported projects from the database facts.
125128
126-
Iterates over the provided build tool facts and returns the first valid `MacaronBuildToolName`
127-
for a supported language. If no valid build tool name is found, returns None.
128-
129-
.. note::
130-
If multiple build tools are present in the database, only the first valid one encountered
131-
in the sequence is returned.
129+
Iterates over the provided build tool facts and returns the list of valid `MacaronBuildToolName`
130+
for a supported language.
132131
133132
Parameters
134133
----------
@@ -139,31 +138,27 @@ def get_macaron_build_tool_name(
139138
140139
Returns
141140
-------
142-
MacaronBuildToolName or None
143-
The corresponding Macaron build tool name if found, otherwise None.
141+
list[MacaronBuildToolName] None
142+
The corresponding Macaron build tool names, or None otherwise.
144143
"""
144+
build_tool_names = []
145145
for fact in build_tool_facts:
146146
if fact.language.lower() == target_language:
147147
try:
148-
macaron_build_tool_name = MacaronBuildToolName(fact.build_tool_name)
148+
build_tool_names.append(MacaronBuildToolName(fact.build_tool_name))
149149
except ValueError:
150150
continue
151151

152-
# TODO: What happen if we report multiple build tools in the database?
153-
return macaron_build_tool_name
154-
155-
return None
152+
return build_tool_names or None
156153

157154

158-
def get_build_tool_name(
155+
def get_build_tool_names(
159156
component_id: int, session: sqlalchemy.orm.Session, target_language: str
160-
) -> MacaronBuildToolName | None:
161-
"""
162-
Retrieve the Macaron build tool name for a given component.
157+
) -> list[MacaronBuildToolName] | None:
158+
"""Retrieve the Macaron build tool names for a given component.
163159
164-
Queries the database for build tool facts associated with the specified component ID
165-
and returns the corresponding `MacaronBuildToolName` if found. If no valid build tool
166-
information is available or an error occurs during the query, returns None.
160+
Queries the database for build tool facts associated with the specified component ID.
161+
It returns the corresponding list of `MacaronBuildToolName` if found.
167162
168163
Parameters
169164
----------
@@ -176,7 +171,7 @@ def get_build_tool_name(
176171
177172
Returns
178173
-------
179-
MacaronBuildToolName or None
174+
list[MacaronBuildToolName] | None
180175
The corresponding build tool name for the component if available, otherwise None.
181176
"""
182177
try:
@@ -203,7 +198,7 @@ def get_build_tool_name(
203198
[(fact.build_tool_name, fact.language) for fact in build_tool_facts],
204199
)
205200

206-
return get_macaron_build_tool_name(build_tool_facts, target_language)
201+
return get_macaron_build_tool_names(build_tool_facts, target_language)
207202

208203

209204
def get_build_command_info(
@@ -345,12 +340,17 @@ def gen_generic_build_spec(
345340
latest_component_repository.commit_sha,
346341
)
347342

348-
build_tool_name = get_build_tool_name(
343+
build_tool_names = []
344+
build_tools = get_build_tool_names(
349345
component_id=latest_component.id, session=session, target_language=target_language
350346
)
351-
if not build_tool_name:
347+
if not build_tools:
352348
raise GenerateBuildSpecError(f"Failed to determine build tool for {purl}.")
353349

350+
# This check is for Pylint, which is not able to iterate over build_tools, even though it cannot be None.
351+
if build_tools is not None:
352+
build_tool_names = [build_tool.value for build_tool in build_tools]
353+
354354
build_command_info = get_build_command_info(
355355
component_id=latest_component.id,
356356
session=session,
@@ -377,7 +377,7 @@ def gen_generic_build_spec(
377377
"ecosystem": purl.type,
378378
"purl": str(purl),
379379
"language": target_language,
380-
"build_tool": build_tool_name.value,
380+
"build_tools": build_tool_names,
381381
"build_commands": [selected_build_command],
382382
}
383383
)
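
Editor's note on the behavior change in ``get_macaron_build_tool_names``: the function now collects every recognized build tool for the target language instead of returning the first one. The self-contained sketch below mirrors the logic in the hunk above. ``FakeBuildToolFact`` is a hypothetical stand-in for the ``BuildToolFacts`` database row, and the ``MAVEN`` enum member is assumed, since it falls outside the lines shown in the diff.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence


class MacaronBuildToolName(str, Enum):
    MAVEN = "maven"  # assumed: not visible in the hunk shown above
    GRADLE = "gradle"
    PIP = "pip"
    POETRY = "poetry"
    FLIT = "flit"
    HATCH = "hatch"
    CONDA = "conda"


@dataclass
class FakeBuildToolFact:
    """Hypothetical stand-in for a BuildToolFacts row."""

    language: str
    build_tool_name: str


def get_macaron_build_tool_names(
    build_tool_facts: Sequence[FakeBuildToolFact], target_language: str
) -> Optional[list[MacaronBuildToolName]]:
    """Return all recognized build tools for the target language, or None if there are none."""
    build_tool_names = []
    for fact in build_tool_facts:
        if fact.language.lower() == target_language:
            try:
                build_tool_names.append(MacaronBuildToolName(fact.build_tool_name))
            except ValueError:
                # Skip build tools that the enum does not know about.
                continue
    return build_tool_names or None


facts = [
    FakeBuildToolFact("Python", "pip"),
    FakeBuildToolFact("python", "poetry"),
    FakeBuildToolFact("python", "not-a-known-tool"),
    FakeBuildToolFact("java", "maven"),
]
print(get_macaron_build_tool_names(facts, "python"))
```

Note that an empty result is reported as ``None`` rather than ``[]``, which is why the caller in ``gen_generic_build_spec`` checks ``if not build_tools`` before building the list of names.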
