Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-44607: [C++][Dev] Update bundled Thrift, update mirrors to use CDN #44685

Merged
merged 10 commits into from
Nov 11, 2024

Conversation

amoeba
Copy link
Member

@amoeba amoeba commented Nov 9, 2024

Rationale for this change

Builds with bundled Thrift could fail because of permanently offline download mirror URLs. ASF now provides a CDN.

What changes are included in this PR?

  1. Updates the pinned version for bundled Thrift from 0.16.0 to 0.20.0. We couldn't just update the URLs because the ASF CDN only hosts the last two versions of Thrift (0.20, 0.21.0). We didn't upgrade to Thrift 0.21.0 in this PR because it requires Boost 1.86.0 and we found incompatibilities with Boost 1.86.0 and cpp/build-support/trim-boost.sh which we use to publish a trimmed-down distribution of Boost for our build system to use.
  2. Removes the set of URLs we were using to download Thrift in favor of just using the main CDN URL. We'll see how this works in practice.

Are these changes tested?

Yes. I configured a build with ...-DThrift_SOURCE=BUNDLED... and ran the full C++ test suite. I think further testing can be done in CI.

Are there any user-facing changes?

No.

Closes #44607

@github-actions github-actions bot added the awaiting review Awaiting review label Nov 9, 2024
@amoeba
Copy link
Member Author

amoeba commented Nov 9, 2024

@github-actions crossbow submit test-r-linux-as-cran

Copy link

github-actions bot commented Nov 9, 2024

Revision: 7173f31

Submitted crossbow builds: ursacomputing/crossbow @ actions-c90acec2ee

Task Status
test-r-linux-as-cran GitHub Actions

@amoeba
Copy link
Member Author

amoeba commented Nov 9, 2024

@github-actions crossbow submit -g cpp

Copy link

github-actions bot commented Nov 9, 2024

Revision: 77d7c31

Submitted crossbow builds: ursacomputing/crossbow @ actions-da415e2d2d

Task Status
example-cpp-minimal-build-static GitHub Actions
example-cpp-minimal-build-static-system-dependency GitHub Actions
example-cpp-tutorial GitHub Actions
test-alpine-linux-cpp GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-cuda-cpp-ubuntu-20.04-cuda-11.2.2 GitHub Actions
test-cuda-cpp-ubuntu-22.04-cuda-11.7.1 GitHub Actions
test-debian-12-cpp-amd64 GitHub Actions
test-debian-12-cpp-i386 GitHub Actions
test-fedora-39-cpp GitHub Actions
test-ubuntu-20.04-cpp GitHub Actions
test-ubuntu-20.04-cpp-bundled GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-gcc-13-bundled GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions
test-ubuntu-24.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-24.04-cpp-thread-sanitizer GitHub Actions

@amoeba
Copy link
Member Author

amoeba commented Nov 9, 2024

@github-actions crossbow submit -g cpp

Copy link

github-actions bot commented Nov 9, 2024

Revision: 2bbe2aa

Submitted crossbow builds: ursacomputing/crossbow @ actions-4c1b909295

Task Status
example-cpp-minimal-build-static GitHub Actions
example-cpp-minimal-build-static-system-dependency GitHub Actions
example-cpp-tutorial GitHub Actions
test-alpine-linux-cpp GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-cuda-cpp-ubuntu-20.04-cuda-11.2.2 GitHub Actions
test-cuda-cpp-ubuntu-22.04-cuda-11.7.1 GitHub Actions
test-debian-12-cpp-amd64 GitHub Actions
test-debian-12-cpp-i386 GitHub Actions
test-fedora-39-cpp GitHub Actions
test-ubuntu-20.04-cpp GitHub Actions
test-ubuntu-20.04-cpp-bundled GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-gcc-13-bundled GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions
test-ubuntu-24.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-24.04-cpp-thread-sanitizer GitHub Actions

@amoeba
Copy link
Member Author

amoeba commented Nov 9, 2024

Windows MSVC is failing with :

D:\a\arrow\arrow\build\cpp\thrift_ep-prefix\src\thrift_ep\lib\cpp\src\thrift\TUuid.cpp(22): fatal error C1083: Cannot open include file: 'boost/uuid/string_generator.hpp': No such file or directory

rhub/ubuntu-gcc-12 is failing with:

/tmp/Rtmpa7sH8K/file73956317ada/thrift_ep-prefix/src/thrift_ep/lib/cpp/src/thrift/TUuid.cpp:22:10: fatal error: boost/uuid/string_generator.hpp: No such file or directory
22 | #include <boost/uuid/string_generator.hpp>
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@amoeba
Copy link
Member Author

amoeba commented Nov 9, 2024

The two failing builds were both bundling Boost and Thrift and I was able to reproduce the failure locally. From looking around, it looks like you have to upgrade to Boost 1.86.0 if you use Thrift 0.21.0. To test that hypothesis, I lowered Thrift down to 0.20.0 in 4886ad7 to test. I think we can get away with bumping Boost to 1.86.0 and Thrift to 0.21.0 in this PR so I'll try that once CI finishes.

@amoeba
Copy link
Member Author

amoeba commented Nov 9, 2024

With Boost 1.86.0 and Thrift 0.21.0, the MSVC build fails with:

2024-11-09T17:43:07.4309013Z [550/608] Linking CXX shared library debug\parquet.dll
2024-11-09T17:43:07.4310713Z FAILED: debug/parquet.dll debug/parquet.lib 
2024-11-09T17:43:07.4323359Z C:\Windows\system32\cmd.exe /C "cd . && "C:\Program Files\CMake\bin\cmake.exe" -E vs_link_dll --intdir=src\parquet\CMakeFiles\parquet_shared.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100226~1.0\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100226~1.0\x64\mt.exe --manifests  -- C:\PROGRA~2\MICROS~2\2019\ENTERP~1\VC\Tools\MSVC\1429~1.301\bin\Hostx64\x64\link.exe /nologo src\parquet\CMakeFiles\parquet_shared.dir\Unity\unity_5_cxx.cxx.obj src\parquet\CMakeFiles\parquet_shared.dir\Unity\unity_4_cxx.cxx.obj src\parquet\CMakeFiles\parquet_shared.dir\Unity\unity_3_cxx.cxx.obj src\parquet\CMakeFiles\parquet_shared.dir\Unity\unity_2_cxx.cxx.obj src\parquet\CMakeFiles\parquet_shared.dir\Unity\unity_1_cxx.cxx.obj src\parquet\CMakeFiles\parquet_shared.dir\Unity\unity_0_cxx.cxx.obj src\parquet\CMakeFiles\parquet_shared.dir\__\generated\parquet_constants.cpp.obj src\parquet\CMakeFiles\parquet_shared.dir\__\generated\parquet_types.cpp.obj src\parquet\CMakeFiles\parquet_shared.dir\level_comparison_avx2.cc.obj src\parquet\CMakeFiles\parquet_shared.dir\level_conversion_bmi2.cc.obj  /out:debug\parquet.dll /implib:debug\parquet.lib /pdb:debug\parquet.pdb /dll /version:1900.0 /machine:x64  /NODEFAULTLIB:LIBCMT /debug /INCREMENTAL -Wl,--version-script=D:/a/arrow/arrow/cpp/src/parquet/symbols.map  debug\arrow.lib  "C:\Program Files\OpenSSL\lib\VC\libcrypto64MDd.lib"  "C:\Program Files\OpenSSL\lib\VC\libssl64MDd.lib"  thrift_ep-install\bin\thriftmd.lib  "C:\Program Files\OpenSSL\lib\VC\libcrypto64MDd.lib"  kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && cd ."
2024-11-09T17:43:07.4347738Z LINK Pass 1: command "C:\PROGRA~2\MICROS~2\2019\ENTERP~1\VC\Tools\MSVC\1429~1.301\bin\Hostx64\x64\link.exe /nologo src\parquet\CMakeFiles\parquet_shared.dir\Unity\unity_5_cxx.cxx.obj src\parquet\CMakeFiles\parquet_shared.dir\Unity\unity_4_cxx.cxx.obj src\parquet\CMakeFiles\parquet_shared.dir\Unity\unity_3_cxx.cxx.obj src\parquet\CMakeFiles\parquet_shared.dir\Unity\unity_2_cxx.cxx.obj src\parquet\CMakeFiles\parquet_shared.dir\Unity\unity_1_cxx.cxx.obj src\parquet\CMakeFiles\parquet_shared.dir\Unity\unity_0_cxx.cxx.obj src\parquet\CMakeFiles\parquet_shared.dir\__\generated\parquet_constants.cpp.obj src\parquet\CMakeFiles\parquet_shared.dir\__\generated\parquet_types.cpp.obj src\parquet\CMakeFiles\parquet_shared.dir\level_comparison_avx2.cc.obj src\parquet\CMakeFiles\parquet_shared.dir\level_conversion_bmi2.cc.obj /out:debug\parquet.dll /implib:debug\parquet.lib /pdb:debug\parquet.pdb /dll /version:1900.0 /machine:x64 /NODEFAULTLIB:LIBCMT /debug /INCREMENTAL -Wl,--version-script=D:/a/arrow/arrow/cpp/src/parquet/symbols.map debug\arrow.lib C:\Program Files\OpenSSL\lib\VC\libcrypto64MDd.lib C:\Program Files\OpenSSL\lib\VC\libssl64MDd.lib thrift_ep-install\bin\thriftmd.lib C:\Program Files\OpenSSL\lib\VC\libcrypto64MDd.lib kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib /MANIFEST /MANIFESTFILE:src\parquet\CMakeFiles\parquet_shared.dir/intermediate.manifest src\parquet\CMakeFiles\parquet_shared.dir/manifest.res" failed (exit code 1104) with the following output:
2024-11-09T17:43:07.4362016Z LINK : warning LNK4044: unrecognized option '/Wl,--version-script=D:/a/arrow/arrow/cpp/src/parquet/symbols.map'; ignored
2024-11-09T17:43:07.4363542Z LINK : fatal error LNK1104: cannot open file 'thrift_ep-install\bin\thriftmd.lib'

@amoeba
Copy link
Member Author

amoeba commented Nov 9, 2024

It looks like our build may be conflicting with a change in Thrift that appeared in 0.21.0:

https://github.com/apache/thrift/blob/b65ec607db09f2efdb7397acde88fd53cfb97f6b/build/cmake/DefineInstallationPaths.cmake#L23-L29

# For MSVC builds, install shared libs to bin/, while keeping the install
# dir for static libs as lib/.
if(MSVC AND BUILD_SHARED_LIBS)
    set(LIB_INSTALL_DIR "bin${LIB_SUFFIX}" CACHE PATH "The library install dir (default: bin${LIB_SUFFIX})")
else()
    set(LIB_INSTALL_DIR "lib${LIB_SUFFIX}" CACHE PATH "The library install dir (default: lib${LIB_SUFFIX})")
endif()

@amoeba
Copy link
Member Author

amoeba commented Nov 9, 2024

@github-actions crossbow submit -g cpp

Copy link

github-actions bot commented Nov 9, 2024

Revision: 88601bc

Submitted crossbow builds: ursacomputing/crossbow @ actions-46a48b959a

Task Status
example-cpp-minimal-build-static GitHub Actions
example-cpp-minimal-build-static-system-dependency GitHub Actions
example-cpp-tutorial GitHub Actions
test-alpine-linux-cpp GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-cuda-cpp-ubuntu-20.04-cuda-11.2.2 GitHub Actions
test-cuda-cpp-ubuntu-22.04-cuda-11.7.1 GitHub Actions
test-debian-12-cpp-amd64 GitHub Actions
test-debian-12-cpp-i386 GitHub Actions
test-fedora-39-cpp GitHub Actions
test-ubuntu-20.04-cpp GitHub Actions
test-ubuntu-20.04-cpp-bundled GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-gcc-13-bundled GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions
test-ubuntu-24.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-24.04-cpp-thread-sanitizer GitHub Actions

@amoeba
Copy link
Member Author

amoeba commented Nov 9, 2024

@assignUser this is ready for a review. Three of the four failed crossbow jobs looks like test flakiness but another set of eyes would be good. The test-debian-12-cpp-amd64 fails due to a related issue but for ORC. I think I could do a follow-up PR to update those URLs after this one.

Edit: On the last point, I think maybe we can hold off on updating the ORC mirror URLs. I looked closer and it looks like the runner VM had a more fundamental networking issue at the time so maybe we can just keep an eye on this.

"ARROW_BOOST_URL boost-${ARROW_BOOST_BUILD_VERSION}.tar.gz https://apache.jfrog.io/artifactory/arrow/thirdparty/7.0.0/boost_${ARROW_BOOST_BUILD_VERSION//./_}.tar.gz"
"ARROW_BOOST_URL boost-${ARROW_BOOST_BUILD_VERSION}.tar.gz https://archives.boost.io/release/1.86.0/source/boost_${ARROW_BOOST_BUILD_VERSION//./_}.tar.gz"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use a trimmed down version of boost (10mb vs 142mb) to speed up downloads. Could you create the new archive and upload it to the artifactory (as a committer you can do that) and use that version here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. I just tried to run the script and am getting failures on the bootstrap/b2 step. I'll take a look later today.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't have any luck and it appears to be caused by Boost 1.86.0 releasing with a broken bcp. See comment and really the whole thread at boostorg/bcp#18 (comment).

I think we could probably update the trim-boost script to work around this but we could also just switch back to Thrift 0.20.0 and Boost 1.81.0 until Boost 1.87.0.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Nov 10, 2024
@amoeba
Copy link
Member Author

amoeba commented Nov 11, 2024

Due to the bcp issue with boost 1.86.0, I bumped Thrift down to 0.20.0 and Boost back to 1.81.0. We can update to Thrift 0.21.0 once Boost 1.87.0 is out (and we can confirm the included bcp works).

@amoeba
Copy link
Member Author

amoeba commented Nov 11, 2024

CI passed and PR description updated. @assignUser I think we can merge this without re-running all of the crossbow jobs and just be mindful during the 18.1.0 release process to look out for any issues.

Copy link
Member

@assignUser assignUser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed 👍 feel free to merge

@@ -1775,6 +1763,7 @@ macro(build_thrift)
set(THRIFT_LIB_SUFFIX "md")
list(APPEND THRIFT_CMAKE_ARGS "-DWITH_MT=OFF")
endif()
# NOTE(amoeba): When you bump Thrift to >=0.21.0, change bin to lib
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect!

@amoeba amoeba merged commit d7e982c into apache:main Nov 11, 2024
40 checks passed
@amoeba amoeba removed the awaiting committer review Awaiting committer review label Nov 11, 2024
@amoeba
Copy link
Member Author

amoeba commented Nov 11, 2024

Merged, thanks.

@amoeba amoeba mentioned this pull request Nov 11, 2024
2 tasks
amoeba added a commit that referenced this pull request Nov 11, 2024
…#44685)

### Rationale for this change

Builds with bundled Thrift could fail because of permanently offline download mirror URLs. ASF now provides [a CDN](https://dlcdn.apache.org/).

### What changes are included in this PR?

1. Updates the pinned version for bundled Thrift from 0.16.0 to 0.20.0. We couldn't just update the URLs because the ASF CDN only hosts the last two versions of Thrift (0.20, 0.21.0). We didn't upgrade to Thrift 0.21.0 in this PR because it requires Boost 1.86.0 and we found incompatibilities with Boost 1.86.0 and `cpp/build-support/trim-boost.sh` which we use to publish a trimmed-down distribution of Boost for our build system to use.
2. Removes the set of URLs we were using to download Thrift in favor of just using the main CDN URL. We'll see how this works in practice.

### Are these changes tested?

Yes. I configured a build with `...-DThrift_SOURCE=BUNDLED...` and ran the full C++ test suite. I think further testing can be done in CI.

### Are there any user-facing changes?

No.

Closes #44607
* GitHub Issue: #44607

Authored-by: Bryce Mecum <[email protected]>
Signed-off-by: Bryce Mecum <[email protected]>
Copy link

After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit d7e982c.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 4 possible false positives for unstable benchmarks that are known to sometimes produce them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[C++] Thrift mirrors are outdated
2 participants