davebayer
Contributor

@ericniebler implemented host standard library detection so that intrinsics can be used for simple functions such as move or forward.

I've improved the design a bit further, moved it to a separate header and added host standard library namespaces. The idea is that we could forward declare things like std::string instead of including the whole header to reduce compilation times.

Users would be required to include those headers themselves if they want to use APIs involving these types.
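
As an analogue of the forward-declaration idea, here is a sketch using `<iosfwd>`, which forward declares the stream types without pulling in `<ostream>`. It is not the header added in this PR; `point` and `to_text` are made-up names, and the "library side" / "user side" split in one file is only for illustration.

```cpp
// --- library side: forward declarations are enough for the API ---
#include <iosfwd> // declares std::ostream without defining it

struct point // hypothetical example type
{
  int x;
  int y;
};

// A reference to an incomplete type is sufficient for a declaration.
std::ostream& operator<<(std::ostream& os, const point& p);

// --- user side: include the full headers before actually streaming ---
#include <ostream>
#include <sstream>
#include <string>

std::ostream& operator<<(std::ostream& os, const point& p)
{
  return os << '(' << p.x << ", " << p.y << ')';
}

// Hypothetical helper that formats a point into a string.
std::string to_text(const point& p)
{
  std::ostringstream oss;
  oss << p;
  return oss.str();
}
```

Only translation units that actually stream a `point` pay for `<ostream>`; everything else sees just the cheap forward declaration.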

Btw., _GLIBCXX_VERSION doesn't exist, so I've changed it to __GLIBCXX__.
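
A minimal sketch of detection along these lines, assuming a standard header has already been included so that the vendor macros are visible. The `host_stdlib` enum and `detect_host_stdlib` function are illustrative names only, not what this PR adds.

```cpp
#include <cstddef> // any standard header makes the vendor macros visible

// Hypothetical names for illustration; only the macros below are real.
enum class host_stdlib
{
  libstdcxx,
  libcxx,
  msvc_stl,
  unknown
};

constexpr host_stdlib detect_host_stdlib() noexcept
{
#if defined(__GLIBCXX__) // libstdc++ (note: not _GLIBCXX_VERSION)
  return host_stdlib::libstdcxx;
#elif defined(_LIBCPP_VERSION) // libc++
  return host_stdlib::libcxx;
#elif defined(_MSVC_STL_VERSION) // MSVC STL
  return host_stdlib::msvc_stl;
#else
  return host_stdlib::unknown;
#endif
}
```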

Contributor

copy-pr-bot bot commented Oct 15, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Progress in CCCL Oct 15, 2025
@davebayer davebayer requested a review from miscco October 15, 2025 07:19
Contributor

@miscco miscco left a comment

Looks great, recapitulating the internal discussion for posterity

@davebayer
Contributor Author

/ok to test e1bcf4b

@davebayer
Contributor Author

/ok to test af71337

@davebayer
Contributor Author

/ok to test 0e45c13

@davebayer
Contributor Author

/ok to test 81588a9

@davebayer
Contributor Author

/ok to test 89b181a

@davebayer
Contributor Author

/ok to test e17232f

@davebayer davebayer force-pushed the host_std_lib_detection branch from e17232f to 081bb70 Compare October 15, 2025 10:58
@davebayer davebayer marked this pull request as ready for review October 15, 2025 10:58
@davebayer davebayer requested review from a team as code owners October 15, 2025 10:58
@cccl-authenticator-app cccl-authenticator-app bot moved this from In Progress to In Review in CCCL Oct 15, 2025
@davebayer davebayer force-pushed the host_std_lib_detection branch from 081bb70 to dd0b986 Compare October 15, 2025 11:03
Comment on lines 299 to 303
template <class _CharT, class _Traits, class _Tp = float>
::std::basic_istream<_CharT, _Traits>&
operator>>(::std::basic_istream<_CharT, _Traits>& __is, complex<__nv_bfloat16>& __x)
{
- ::std::complex<float> __temp;
+ ::std::complex<_Tp> __temp;
Contributor Author

This is necessary to prevent errors about the type being only forward declared, not defined.
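
A sketch of why the extra defaulted template parameter helps, assuming C++14 or newer. `widget` and `read_default` are hypothetical stand-ins (for a host type such as `::std::complex` and for the stream operator); this is not the code from the PR.

```cpp
// Forward declaration only; the definition is not visible yet.
template <class T>
struct widget;

// In a non-template function, `widget<float> tmp{};` would be rejected right
// here because widget<float> is still incomplete. Making the type depend on a
// defaulted template parameter postpones that check until instantiation.
template <class T = float>
constexpr T read_default()
{
  widget<T> tmp{}; // dependent type: completeness is checked on instantiation
  return tmp.value;
}

// Later (in the real case: once the user has included the host header) the
// definition becomes visible.
template <class T>
struct widget
{
  T value = T(42);
};

// Instantiation happens here, after the definition, so this compiles.
static_assert(read_default() == 42.0f, "the default argument selects float");
```

Callers never spell the extra parameter, so the interface stays the same while the completeness check moves to the point of use.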

Comment on lines 161 to 162
template <class _Up = value_type>
_CCCL_HOST operator ::std::complex<_Up>() const
Contributor Author

Again..

{
# if __cpp_lib_string_view >= 201606L
- return __os << ::std::basic_string_view<_CharT>{__str};
+ return __os << ::std::basic_string_view<_CharT, ::std::char_traits<_CharT>>{__str};
Contributor Author

Because we only forward declare the types, we don't have access to their default template arguments, so we have to pass all of the template parameters explicitly here.
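
A sketch of the default-argument limitation, using a hypothetical `hostlib` namespace as a stand-in for the host standard library (forward declaring names inside the real `std` is not something a portable example should do). The forward declaration cannot repeat the defaults because a default template argument may only be specified once, and the real header already specifies it.

```cpp
#include <type_traits>

namespace hostlib // hypothetical stand-in for the host standard library
{
// Forward declarations as a detection header might write them: no defaults.
template <class CharT>
struct char_traits;
template <class CharT, class Traits>
class basic_string_view;
} // namespace hostlib

// hostlib::basic_string_view<char> would not compile at this point (too few
// template arguments), so every argument is spelled out.
using char_view = hostlib::basic_string_view<char, hostlib::char_traits<char>>;

// --- what the full header would provide once the user includes it ---
namespace hostlib
{
template <class CharT>
struct char_traits
{};
template <class CharT, class Traits = char_traits<CharT>>
class basic_string_view
{};
} // namespace hostlib

// Both spellings name the same type once the defaults are visible.
static_assert(std::is_same<char_view, hostlib::basic_string_view<char>>::value,
              "explicit and defaulted spellings agree");
```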

Comment on lines 309 to 314
template <class _CharT, class _Traits, class _Tp = float>
::std::basic_ostream<_CharT, _Traits>&
operator<<(::std::basic_ostream<_CharT, _Traits>& __os, const complex<__nv_bfloat16>& __x)
{
- return __os << complex<float>{__x};
+ return __os << complex<_Tp>{__x};
}
Contributor

This one should not be necessary, because we are converting to a cuda::std::complex, which is defined.

operator>>(::std::basic_istream<_CharT, _Traits>& __is, complex<__nv_bfloat16>& __x)
{
- ::std::complex<float> __temp;
+ ::std::complex<_Tp> __temp;
Contributor

I believe we could get away with always casting the extended floating-point types to ::cuda::std::complex<float> and only doing the smart thing for that type.

@github-project-automation github-project-automation bot moved this from In Review to In Progress in CCCL Oct 15, 2025

@davebayer davebayer force-pushed the host_std_lib_detection branch from 9231bee to ce1d9df Compare October 16, 2025 06:14
@davebayer davebayer force-pushed the host_std_lib_detection branch from ce1d9df to 02ff513 Compare October 16, 2025 06:39
# endif
# endif // defined(_GLIBCXX_VERSION) || defined(_LIBCPP_VERSION) || defined(_MSVC_STL_VERSION)
#endif // defined(__cplusplus)
// todo: re-enable std builtins
Contributor Author

They were not working anyway, so I'd like to leave this for a different PR

Contributor

I would love to know what is not working. Generally, I would just replace the host STL detection and leave the rest in.

}
};

# if !defined(_LIBCUDACXX_HAS_NO_LOCALIZATION) && !_CCCL_COMPILER(NVRTC)
Contributor Author

Is never defined

@davebayer davebayer requested a review from miscco October 16, 2025 12:27
@github-project-automation github-project-automation bot moved this from In Progress to In Review in CCCL Oct 16, 2025
Contributor

🥳 CI Workflow Results

🟩 Finished in 6h 53m: Pass: 100%/127 | Total: 5d 18h | Max: 3h 20m | Hits: 42%/306565

See results here.

@davebayer davebayer merged commit 13886f7 into NVIDIA:main Oct 16, 2025
271 of 275 checks passed
@github-project-automation github-project-automation bot moved this from In Review to Done in CCCL Oct 16, 2025