ACP: Pattern methods for OsStr without OsStr patterns #311

Closed
@epage

Proposal

Problem statement

With rust-lang/rust#115443, developers, such as those writing CLI parsers, can now perform (limited) operations on OsStr, but getting an OsStr back requires unsafe. That forces the developer to understand and follow some very specific safety notes that cannot be checked by Miri.

RFC #2295 exists to improve this, but it has stalled. The assumption here is that part of the problem with that RFC is how wide its scope is, and that by shrinking the scope we can get some benefits now.

Motivating examples or use cases

Mostly copied from #306

Argument parsers need to extract substrings from command line arguments. For example, --option=somefilename needs to be split into option and somefilename, and the original filename must be preserved without sanitizing it.

clap currently implements strip_prefix and split_once using transmute (equivalent to the stable encoded_bytes APIs).
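To illustrate the burden described above, here is a minimal sketch of what such helpers look like on stable Rust (1.74 or later) using the encoded_bytes APIs. The names split_once_str and strip_prefix_str are hypothetical and are not taken from clap or std; the sketch only handles &str needles.

```rust
use std::ffi::OsStr;

/// Hypothetical helper (not a std API): split `s` around the first
/// occurrence of the UTF-8 `needle`.
fn split_once_str<'a>(s: &'a OsStr, needle: &str) -> Option<(&'a OsStr, &'a OsStr)> {
    if needle.is_empty() {
        // Mirror `str::split_once("")`: empty head, whole string as tail.
        return Some((OsStr::new(""), s));
    }
    let bytes = s.as_encoded_bytes();
    let n = needle.as_bytes();
    let start = bytes.windows(n.len()).position(|w| w == n)?;
    let before = &bytes[..start];
    let after = &bytes[start + n.len()..];
    // SAFETY: both slices come from `as_encoded_bytes`, and the cuts are
    // immediately before and after `needle`, a non-empty valid UTF-8
    // substring, as the safety notes require.
    unsafe {
        Some((
            OsStr::from_encoded_bytes_unchecked(before),
            OsStr::from_encoded_bytes_unchecked(after),
        ))
    }
}

/// Hypothetical helper (not a std API): remove a UTF-8 `prefix` if present.
fn strip_prefix_str<'a>(s: &'a OsStr, prefix: &str) -> Option<&'a OsStr> {
    let rest = s.as_encoded_bytes().strip_prefix(prefix.as_bytes())?;
    // SAFETY: the cut is at the start of the string (empty prefix) or
    // immediately after a non-empty valid UTF-8 substring.
    Some(unsafe { OsStr::from_encoded_bytes_unchecked(rest) })
}
```

With these, split_once_str(OsStr::new("--option=somefilename"), "=") yields the pair ("--option", "somefilename") without ever converting the filename to str. Every caller still has to get the SAFETY reasoning right by hand, which is the part this proposal aims to move into std.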

The os_str_bytes and osstrtools crates provide high-level string operations for OS strings. In the wild, os_str_bytes is mainly used to convert between raw bytes and OS strings (e.g. 1, 2, 3). osstrtools enables reasonable uses of split() to parse $PATH and replace() to fill in command line templates.

Solution sketch

Provide str's Pattern-accepting methods on &OsStr.

Defer OsStr being used as a Pattern and OsStr indexing support, both of which are specified in RFC #2295.

Example of methods to be added:

impl OsStr {
    pub fn contains<'a, P>(&'a self, pat: P) -> bool
    where
        P: Pattern<&'a Self>;

    pub fn starts_with<'a, P>(&'a self, pat: P) -> bool
    where
        P: Pattern<&'a Self>;

    pub fn ends_with<'a, P>(&'a self, pat: P) -> bool
    where
        P: Pattern<&'a Self>,
        P::Searcher: ReverseSearcher<&'a Self>;

    pub fn find<'a, P>(&'a self, pat: P) -> Option<usize>
    where
        P: Pattern<&'a Self>;

    pub fn rfind<'a, P>(&'a self, pat: P) -> Option<usize>
    where
        P: Pattern<&'a Self>,
        P::Searcher: ReverseSearcher<&'a Self>;

    // (Note: these should return a concrete iterator type instead of `impl Trait`.
    //  For ease of explanation the concrete type is not listed here.)
    pub fn split<'a, P>(&'a self, pat: P) -> impl Iterator<Item = &'a Self>
    where
        P: Pattern<&'a Self>;

    pub fn split_inclusive<'a, P>(&'a self, pat: P) -> impl Iterator<Item = &'a Self>
    where
        P: Pattern<&'a Self>;

    pub fn rsplit<'a, P>(&'a self, pat: P) -> impl Iterator<Item = &'a Self>
    where
        P: Pattern<&'a Self>,
        P::Searcher: ReverseSearcher<&'a Self>;

    pub fn split_terminator<'a, P>(&'a self, pat: P) -> impl Iterator<Item = &'a Self>
    where
        P: Pattern<&'a Self>;

    pub fn rsplit_terminator<'a, P>(&'a self, pat: P) -> impl Iterator<Item = &'a Self>
    where
        P: Pattern<&'a Self>,
        P::Searcher: ReverseSearcher<&'a Self>;

    pub fn splitn<'a, P>(&'a self, n: usize, pat: P) -> impl Iterator<Item = &'a Self>
    where
        P: Pattern<&'a Self>;

    pub fn rsplitn<'a, P>(&'a self, n: usize, pat: P) -> impl Iterator<Item = &'a Self>
    where
        P: Pattern<&'a Self>,
        P::Searcher: ReverseSearcher<&'a Self>;

    pub fn split_once<'a, P>(&'a self, delimiter: P) -> Option<(&'a Self, &'a Self)>
    where
        P: Pattern<&'a Self>;

    pub fn rsplit_once<'a, P>(&'a self, delimiter: P) -> Option<(&'a Self, &'a Self)>
    where
        P: Pattern<&'a Self>,
        P::Searcher: ReverseSearcher<&'a Self>;

    pub fn matches<'a, P>(&'a self, pat: P) -> impl Iterator<Item = &'a Self>
    where
        P: Pattern<&'a Self>;

    pub fn rmatches<'a, P>(&'a self, pat: P) -> impl Iterator<Item = &'a Self>
    where
        P: Pattern<&'a Self>,
        P::Searcher: ReverseSearcher<&'a Self>;

    pub fn match_indices<'a, P>(&'a self, pat: P) -> impl Iterator<Item = (usize, &'a Self)>
    where
        P: Pattern<&'a Self>;

    pub fn rmatch_indices<'a, P>(&'a self, pat: P) -> impl Iterator<Item = (usize, &'a Self)>
    where
        P: Pattern<&'a Self>,
        P::Searcher: ReverseSearcher<&'a Self>;

    pub fn trim_matches<'a, P>(&'a self, pat: P) -> &'a Self
    where
        P: Pattern<&'a Self>,
        P::Searcher: DoubleEndedSearcher<&'a Self>;

    pub fn trim_start_matches<'a, P>(&'a self, pat: P) -> &'a Self
    where
        P: Pattern<&'a Self>;

    pub fn strip_prefix<'a, P>(&'a self, prefix: P) -> Option<&'a Self>
    where
        P: Pattern<&'a Self>;

    pub fn strip_suffix<'a, P>(&'a self, suffix: P) -> Option<&'a Self>
    where
        P: Pattern<&'a Self>,
        P::Searcher: ReverseSearcher<&'a Self>;

    pub fn trim_end_matches<'a, P>(&'a self, pat: P) -> &'a Self
    where
        P: Pattern<&'a Self>,
        P::Searcher: ReverseSearcher<&'a Self>;

    pub fn replace<'a, P>(&'a self, from: P, to: &'a Self) -> Self::Owned
    where
        P: Pattern<&'a Self>;

    pub fn replacen<'a, P>(&'a self, from: P, to: &'a Self, count: usize) -> Self::Owned
    where
        P: Pattern<&'a Self>;
}

impl Pattern<&OsStr> for char {}
impl Pattern<&OsStr> for &str {}
impl Pattern<&OsStr> for &String {}
impl Pattern<&OsStr> for &[char] {}
impl Pattern<&OsStr> for &&str {}
impl<const N: usize> Pattern<&OsStr> for &[char; N] {}
impl<F: FnMut(char) -> bool> Pattern<&OsStr> for F {}
impl<const N: usize> Pattern<&OsStr> for [char; N] {}
  • This is meant to match str. If anything changes between the writing of this ACP and its implementation, the focus should be on what str has at the time of implementation (e.g. adding the new variant of a method rather than a deprecated one).
  • We likely want to add trim, trim_start, and trim_end to be consistent with trim_start_matches / trim_end_matches.
  • For more details, see "Add pattern matching API to OsStr" (rust#109350).
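
To make the intended behavior concrete, the following sketch approximates on stable Rust what split with a char pattern would do for an OsStr. The function split_char is a hypothetical stand-in, not part of std or of the proposed API, and it returns a Vec rather than the concrete iterator type the real methods would use.

```rust
use std::ffi::OsStr;

/// Hypothetical stand-in for the proposed `OsStr::split` with a `char`
/// pattern; collects the pieces into a `Vec` for simplicity.
fn split_char<'a>(s: &'a OsStr, sep: char) -> Vec<&'a OsStr> {
    // Encode the separator as UTF-8 so it can be searched for as bytes.
    let mut buf = [0u8; 4];
    let sep: &str = sep.encode_utf8(&mut buf);
    let sep = sep.as_bytes();

    let mut rest = s.as_encoded_bytes();
    let mut parts = Vec::new();
    while let Some(pos) = rest.windows(sep.len()).position(|w| w == sep) {
        // SAFETY: the slice ends immediately before the separator, a
        // non-empty valid UTF-8 substring, and begins at the start of the
        // string or immediately after a previous separator.
        parts.push(unsafe { OsStr::from_encoded_bytes_unchecked(&rest[..pos]) });
        rest = &rest[pos + sep.len()..];
    }
    // SAFETY: `rest` begins at the start of the string or immediately after
    // a separator, and runs to the end of the string.
    parts.push(unsafe { OsStr::from_encoded_bytes_unchecked(rest) });
    parts
}
```

As with str::split, adjacent separators produce empty pieces, so splitting "/usr/bin:/bin::/opt/tools" on ':' gives four parts, the third of which is empty. Because the pattern is always valid UTF-8, every cut satisfies the safety notes regardless of what the rest of the OsStr contains, which is why restricting patterns to char and str types keeps the design simple.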

This should work because

From an API design perspective, there is strong precedent for it

  • It's copying methods over from str
  • The design is a subset of RFC #2295 (approved) and RFC #1309 (postponed)
    • By deferring support for OsStr as a pattern, we bypass the main dividing point between proposals (split APIs, panic on unpaired surrogates, switching away from WTF-8)

Alternatives

#306 proposes an OsStr::slice_encoded_bytes

  • Still requires writing higher level operations on top, but at least it's without unsafe
  • Either takes a performance hit to be consistent across platforms or has per-platform caveats that will be similarly hard to get right for less common platforms among developers (e.g. Windows)
  • As far as I can tell, there isn't precedent for an API design like this, meaning more new ground has to be broken (naming, deciding the above preconditions, etc.)

Links and related work

What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

  • We think this problem seems worth solving, and the standard library might be the right place to solve it.
  • We think that this probably doesn't belong in the standard library.

Second, if there's a concrete solution:

  • We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
  • We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.

Labels

  • ACP-accepted: API Change Proposal is accepted (seconded with no objections)
  • I-libs-api-nominated: Indicates that an issue has been nominated for discussion during a team meeting.
  • T-libs-api
  • api-change-proposal: A proposal to add or alter unstable APIs in the standard libraries