Skip to content

GitHub Actions build refactor#162

Closed
mtsokol wants to merge 7 commits intogoogle:mainfrom
mtsokol:build-refactor
Closed

GitHub Actions build refactor#162
mtsokol wants to merge 7 commits intogoogle:mainfrom
mtsokol:build-refactor

Conversation

@mtsokol
Copy link
Contributor

@mtsokol mtsokol commented Jul 8, 2025

Hi @iindyk,

This PR cleans up the wheel build process for array_record and migrates changes present in https://github.com/iindyk/array_record/tree/main 20 commits ahead of main.

Here's a job that I ran for it: https://github.com/mtsokol/array_record/actions/runs/16147213833

@mtsokol mtsokol mentioned this pull request Jul 8, 2025
Co-authored-by: Ihor Indyk <iindyk@google.com>
oss/build_whl.sh Outdated
# Install ArrayRecord from the wheel and run smoke tests.
# TF is not available on Python 3.13 and above.
if (( "${PYTHON_MINOR_VERSION}" < 13 )); then
$PYTHON_BIN -m pip install --find-links=/tmp/grain/all_dist --pre array-record
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/tmp/grain/all_dist -> /tmp/array_record/all_dist

I think in the workflow run you shared it still installed AR from pypi: https://github.com/mtsokol/array_record/actions/runs/16147213833/job/45569132362#step:7:7181

Copy link
Contributor Author

@mtsokol mtsokol Jul 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, it took a bit more work but all linux wheels are build now:
https://github.com/mtsokol/array_record/actions/runs/16167375237

for MacOS arm64 it fails on import with:

+ /Users/runner/hostedtoolcache/Python/3.10.11/arm64/bin/python oss/test_with_grain.py
libc++abi: terminating due to uncaught exception of type std::__1::system_error: mutex lock failed: Invalid argument
/tmp/array_record/oss/build_whl.sh: line 15: 12619 Abort trap: 6           $PYTHON_BIN oss/test_with_grain.py

Is it the error that you meant?

Copy link
Contributor Author

@mtsokol mtsokol Jul 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is something similar to: apache/arrow#40088, apache/arrow#40088 (comment)

Copy link
Contributor Author

@mtsokol mtsokol Jul 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some research it looks like the reason for this mutex issue are different protobuf versions that ArrayRecord uses and Grain requires.

Grain requires protobuf 24.4 and ArrayRecord similarly, but the ArrayRecord was missing:

 write_to_bazelrc "common --check_direct_dependencies=error"

and the actual protobuf installed by ArrayRecord in the current main is 29.0.rc2 (!). I tried reducing versions in ArrayRecord but unfortunately eigen is holding on protobuf here, and tampering with its version leads to compilation errors.

I'm not sure what is the best option right now. The intuition tells me that ArrayRecord now requires 29.0 and the next release should uphold it and Grain should now also upgrade (But I remember that you said it's a painful process for Grain).
On the other hand dragging an ArrayRecord fork which is 35 commits behind also sounds cumbersome, especially that for conda-forge we need an official release that is available on github, we can't rely on a fork.

How painful would be to upgrade protobuf for Grain?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I also think this is because of the protobuf version. The problem here is not so much in grain, but tensorflow -- updating protobuf in grain is easy, but if both AR and Grain have higher protobuf version than TF, then import with TF will fail the same way. This is how we found out in the first place about this -- TF is not a dependency, but is often installed in the same ecosystem.

I think protobuf version was updated in TF main: tensorflow/tensorflow#92241. So now, hopefully with the next release of TF we can update it too.

Copy link
Contributor Author

@mtsokol mtsokol Jul 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we try solve this by using TensorFlow nightly in our test? Some in https://pypi.org/project/tf-nightly/#history should already use newer protobuf, right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mtsokol mtsokol force-pushed the build-refactor branch 3 times, most recently from 33913a7 to c9d4a11 Compare July 9, 2025 08:04
@mtsokol mtsokol force-pushed the build-refactor branch 2 times, most recently from 928465e to 36c467e Compare July 9, 2025 09:41
version = "0.7.3",
repo_name = "com_google_array_record",
)

Copy link
Contributor Author

@mtsokol mtsokol Jul 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use actual bazel_dep(name = "protobuf", version = ...) here and enforce errors if different is picked:

write_to_bazelrc "common --check_direct_dependencies=error"

copybara-service bot pushed a commit that referenced this pull request Jul 9, 2025
This PR cleans up the wheel build process for array_record and migrates changes present in https://github.com/iindyk/array_record/tree/main 20 commits ahead of main.

Here's a job that I ran for it: https://github.com/mtsokol/array_record/actions/runs/16147213833

Ported from #162

PiperOrigin-RevId: 781211931
copybara-service bot pushed a commit that referenced this pull request Jul 9, 2025
This PR cleans up the wheel build process for array_record and migrates changes present in https://github.com/iindyk/array_record/tree/main 20 commits ahead of main.

Here's a job that I ran for it: https://github.com/mtsokol/array_record/actions/runs/16147213833

Ported from #162

PiperOrigin-RevId: 781211931
copybara-service bot pushed a commit that referenced this pull request Jul 9, 2025
This PR cleans up the wheel build process for array_record and migrates changes present in https://github.com/iindyk/array_record/tree/main 20 commits ahead of main.

Here's a job that I ran for it: https://github.com/mtsokol/array_record/actions/runs/16147213833

Ported from #162

PiperOrigin-RevId: 781211931
Copy link
Contributor

@dryman dryman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a small document on how to run the new workflows. Thanks.

MODULE.bazel Outdated
ignore_root_user_error = True, # Required for our containerized CI environments.
python_version = PYTHON_VERSION,
http_archive(
name = "pybind11",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why adding an extra pybind11 rule here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH I mostly followed Grain setup as that's the repo we're mainly working with: https://github.com/google/grain/blob/3bc1d2582dffdddf00b0d436a24b6313b69c617d/MODULE.bazel#L31
I can check it without it that one - and remove it if it also builds successfully.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AR is using pybind11 through pybind_bazel already. So it should work without http_archive pulled pybind11.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! Removed pybind11 rule from here.

@mtsokol
Copy link
Contributor Author

mtsokol commented Jul 11, 2025

Please add a small document on how to run the new workflows. Thanks.

Done! I also enforced direct dependencies versions and fixed protobuf version to the one used by TF nightly: #162 (comment)

Here's a workflow with passing jobs: https://github.com/mtsokol/array_record/actions/runs/16217754006

@mtsokol mtsokol requested a review from dryman July 11, 2025 10:18
copybara-service bot pushed a commit that referenced this pull request Jul 14, 2025
This PR cleans up the wheel build process for array_record and migrates changes present in https://github.com/iindyk/array_record/tree/main 20 commits ahead of main.

Here's a job that I ran for it: https://github.com/mtsokol/array_record/actions/runs/16147213833

Ported from #162

PiperOrigin-RevId: 781211931
copybara-service bot pushed a commit that referenced this pull request Jul 14, 2025
This PR cleans up the wheel build process for array_record and migrates changes present in https://github.com/iindyk/array_record/tree/main 20 commits ahead of main.

Here's a job that I ran for it: https://github.com/mtsokol/array_record/actions/runs/16147213833

Ported from #162

PiperOrigin-RevId: 781211931
copybara-service bot pushed a commit that referenced this pull request Jul 14, 2025
This PR cleans up the wheel build process for array_record and migrates changes present in https://github.com/iindyk/array_record/tree/main 20 commits ahead of main.

Here's a job that I ran for it: https://github.com/mtsokol/array_record/actions/runs/16147213833

Ported from #162

PiperOrigin-RevId: 781211931
copybara-service bot pushed a commit that referenced this pull request Jul 14, 2025
This PR cleans up the wheel build process for array_record and migrates changes present in https://github.com/iindyk/array_record/tree/main 20 commits ahead of main.

Here's a job that I ran for it: https://github.com/mtsokol/array_record/actions/runs/16147213833

Ported from #162

PiperOrigin-RevId: 783082302
@iindyk
Copy link
Collaborator

iindyk commented Jul 14, 2025

submitted in #163

@iindyk iindyk closed this Jul 14, 2025
@mtsokol mtsokol deleted the build-refactor branch July 15, 2025 08:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants