Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
187 changes: 187 additions & 0 deletions proposals/p6177.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,187 @@
# C++ Interop: Mapping `std::string_view` to `Core.Str`

<!--
Part of the Carbon Language project, under the Apache License v2.0 with LLVM
Exceptions. See /LICENSE for license information.
SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-->

[Pull request](https://github.com/carbon-language/carbon-lang/pull/6177)

<!-- toc -->

## Table of contents

- [Abstract](#abstract)
- [Problem](#problem)
- [Background](#background)
- [Proposal](#proposal)
- [Details](#details)
- [Rationale](#rationale)
- [Alternatives considered](#alternatives-considered)
- [1. Provide a Wrapper Type](#1-provide-a-wrapper-type)
- [2. Do Nothing](#2-do-nothing)

<!-- tocstop -->

## Abstract

This proposal defines a direct, zero-cost mapping between C++'s
`std::string_view` and Carbon's `Core.Str` for C++ interoperability. The goal is
to make C++ APIs that use `std::string_view` feel native and seamless when used
from Carbon. This mapping relies on the two types having an identical memory
representation, a condition that we will work to ensure across all supported
platforms.

## Problem

Seamless interoperability with C++ is a core goal for Carbon. `std::string_view`
has become a ubiquitous, fundamental type in modern C++ for passing non-owning
string data. Without a direct mapping, Carbon developers would be forced to work
with ABI-incompatible wrapper types or manually unpack `std::string_view`
instances into pointers and sizes.

This creates significant friction:

1. **Ergonomics:** C++ APIs would not feel idiomatic. Developers would need to
perform constant, boilerplate conversions.
2. **Performance:** Any wrapper-based solution would break the zero-cost
abstraction principle, potentially introducing overhead at the boundary.
3. **Adoption:** The lack of seamless integration for such a basic type would
be a significant barrier to migrating or integrating C++ codebases.

To provide a truly smooth migration path and interoperability story, Carbon must
treat `std::string_view` as a first-class citizen, ideally as the same type as
its native string view.

## Background

Carbon's `Core.Str` is, per
[#5969](https://github.com/carbon-language/carbon-lang/issues/5969), the
language's fundamental non-owning view of a sequence of bytes. It is
[currently implemented as a pair of a pointer to the data and a 64-bit integer representing the size in bytes](https://github.com/carbon-language/carbon-lang/blob/7c13bddc92be8ceac758189df76ebbb048e1a9d5/core/prelude/types/string.carbon#L19-L22).
The assumed memory layout is the pointer followed by the size.

C++'s `std::string_view` serves the same purpose. However, its memory layout is
not standardized and varies between standard library implementations. This is
critical for ABI compatibility.

The layouts for major C++ standard library implementations are as follows:

| Standard Library | Platform/Compiler | Member Order | Size Type (`size_t`) | Notes | Source |
| ---------------- | -------------------- | -------------------- | -------------------- | ----------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
| **libc++** | Clang (macOS, etc.) | `__data_`, `__size_` | 64-bit (on 64-bit) | Pointer first, then size. | [`string_view`](https://github.com/llvm/llvm-project/blob/fd5bc6033e521b946f04cb9c473d9cca3da2da9b/libcxx/include/string_view#L711-L712) |
| **MSVC STL** | Microsoft Visual C++ | `_Mydata`, `_Mysize` | 64-bit (on 64-bit) | Pointer first, then size. | [`__msvc_string_view.hpp`](https://github.com/microsoft/STL/blob/ba64eaaa8592c700949f3c09a0d8570b932828f5/stl/inc/__msvc_string_view.hpp#L1924-L1925) |
| **libstdc++** | GCC (Linux, etc.) | `_M_len`, `_M_str` | 64-bit (on 64-bit) | **Size first, then pointer.** | [`string_view`](https://github.com/gcc-mirror/gcc/blob/6b999bf40090f356c5bb5ff8a82e7e0dc4c4ae05/libstdc%2B%2B-v3/include/std/string_view#L590-L591) |

As the table shows, there is a key difference in member ordering, with
`libstdc++` being the outlier compared to the assumed layout of `Core.Str`.

## Proposal

We propose to map C++ `std::string_view` directly to Carbon's `Core.Str` when
importing C++ headers. This means the Carbon compiler will treat
`std::string_view` as a type alias for `Core.Str` at the ABI level.

To achieve this, the following conditions must be met:

1. **Identical Representation:** The memory layout (sequence of fields, size,
and alignment) of `std::string_view` and `Core.Str` must be identical for
the target platform and C++ standard library.
2. **Platform-Wide Compatibility:** The ultimate goal is for this mapping to
work seamlessly across all Carbon-supported architectures.

Initially, this direct mapping will be enabled only for targets where the
representation is known to match. For other platforms, we will pursue one of two
strategies:

- Work with standard library vendors (for example, `libstdc++`) to align on a
consistent representation for `std::string_view` across architectures,
falling back to providing a patched version of these libraries if necessary.
- Adapt the memory layout of Carbon's `Core.Str` on a per-platform basis to
match the target's native `std::string_view` ABI.

This ensures that while the initial implementation may be constrained, the
long-term design is for universal, zero-cost compatibility.

## Details

The initial implementation will assume `Core.Str` has a `(pointer, size)`
layout. This means the direct mapping will work out-of-the-box on platforms
using `libc++` (Clang) and MSVC STL.

For platforms using `libstdc++`, the current `(size, pointer)` layout is
incompatible. The direct mapping will be disabled on these platforms by default
until compatibility is achieved. Our strategy is to first align `Core.Str`'s
layout with the dominant `(pointer, size)` convention used by Clang and MSVC. We
will then engage with the `libstdc++` community to explore standardizing this
layout for better cross-compiler compatibility.

Furthermore, `Core.Str` is defined with a 64-bit size field. C++
`std::string_view` uses `size_t`, which is 32-bit on 32-bit targets. Therefore,
this direct mapping will initially be restricted to 64-bit targets, which are
Carbon's primary focus.

It is essential to recognize that both `Core.Str` and `std::string_view` are
fundamentally views over bytes, not Unicode characters. This proposal maintains
that semantic alignment. If Carbon requires a string type that understands
character boundaries, it should be a separate, distinct type in the standard
library and not interfere with this fundamental C++ interoperability mechanism.

## Rationale

This proposal directly supports the following Carbon goals:

- **Interoperability with and migration from existing C++ code:**
`std::string_view` is one of the most common types in modern C++ interface
design. A seamless mapping is not a luxury but a requirement for effective
interoperability.
- **Performance-critical software:** By ensuring a direct, zero-cost mapping,
we avoid any performance penalties at the C++/Carbon boundary for string
data. This is critical for systems programming where such overhead is
unacceptable.
- **Code that is easy to read, understand, and write:** A direct mapping
allows developers to think of `std::string_view` and `Core.Str` as the same
concept, reducing cognitive load and eliminating the need for manual
conversions.
- **Naming of `Core.Str`:** The choice of `Str` over `String` for the
non-owning view type is intentional. It avoids confusion with owning types
like C++'s `std::string`. `Str` is introduced as a new term of art for
Carbon, providing a concise and readable name for this fundamental type. The
shorter name is preferred for its clarity and reduced verbosity, especially
for a type that will be used frequently. This is based on the decision in
leads question
[#5969](https://github.com/carbon-language/carbon-lang/issues/5969).

By defining a clear path toward universal ABI compatibility for this type, we
are building a solid foundation for deep and performant integration with the
existing C++ ecosystem.

## Alternatives considered

### 1. Provide a Wrapper Type

We could choose to always import `std::string_view` as an opaque Carbon struct,
for example, `Core.Cpp.string_view`.

- **Advantages:** This would be ABI-safe on all platforms immediately.
- **Disadvantages:** This approach is not seamless. It would require explicit
conversions between `Core.Str` and `Core.Cpp.string_view`, adding
boilerplate and potential performance overhead. It violates the goal of
making C++ APIs feel native to Carbon.

### 2. Do Nothing

We could leave it to the developer to manually handle `std::string_view` by
accepting it as an opaque type and using C++ helper functions to extract the
pointer and size.

- **Advantages:** Simplest to implement in the compiler.
- **Disadvantages:** This provides a terrible developer experience and runs
directly counter to Carbon's core goal of excellent C++ interoperability. It
would make using a vast number of modern C++ libraries prohibitively
difficult.

The proposed approach of a direct mapping is superior as it prioritizes the
long-term goals of performance and ergonomics, even if it requires a phased
implementation to achieve full platform support.
Loading