grapheme_extract() returns false instead of empty string when the offset is equal to the length of the haystack #18031

Open
claudepache opened this issue Mar 12, 2025 · 6 comments

Comments

@claudepache
Contributor

Description

The following code:

<?php
var_dump(grapheme_extract('', 42));
var_dump(grapheme_extract('ab', 42, offset: 2));

Resulted in this output:

bool(false)
bool(false)

But I expected this output instead:

string(0) ""
string(0) ""

Per the documentation, the offset ought to be less than or equal to the length of the string (which I consider an appropriate range).

PHP Version

PHP 8.3

Operating System

No response

@youkidearitai
Contributor

<?php
var_dump(grapheme_extract('', 42));
var_dump(intl_get_error_message());
var_dump(grapheme_extract('ab', 42, offset: 2));
var_dump(intl_get_error_message());

Result is below:

bool(false)
string(73) "grapheme_extract: start not contained in string: U_ILLEGAL_ARGUMENT_ERROR"
bool(false)
string(73) "grapheme_extract: start not contained in string: U_ILLEGAL_ARGUMENT_ERROR"

This is because the offset does not point to a position inside the string, so the function returns false.

if ( lstart > INT32_MAX || lstart < 0 || (size_t)lstart >= str_len ) {
    intl_error_set( NULL, U_ILLEGAL_ARGUMENT_ERROR, "grapheme_extract: start not contained in string", 0 );
    RETURN_FALSE;
}
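
To make the boundary case concrete, the condition quoted above can be transcribed into PHP. This is only an illustrative sketch: `start_is_contained()` is a hypothetical helper, not part of the intl extension, and it mirrors nothing beyond the single check shown in the excerpt.

```php
<?php
// Hypothetical helper: restates the quoted C condition in PHP.
// Returns true when the quoted check would NOT reject the offset.
function start_is_contained(string $haystack, int $offset): bool
{
    $int32Max = 2147483647; // value of INT32_MAX

    // strlen() counts bytes, matching str_len in the C excerpt.
    $rejected = $offset > $int32Max
        || $offset < 0
        || $offset >= strlen($haystack);

    return !$rejected;
}

var_dump(start_is_contained('ab', 1)); // bool(true)
var_dump(start_is_contained('ab', 2)); // bool(false): offset equals the length
var_dump(start_is_contained('', 0));   // bool(false): an empty string has no valid offset
```

Because the comparison is `>=` rather than `>`, an offset equal to the byte length is rejected, which is exactly the case reported in this issue.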

@devnexen
Member

@youkidearitai do you think it only needs documentation changes ?

@claudepache
Contributor Author

claudepache commented Mar 13, 2025

This may be considered a documentation issue, as grapheme_extract() has “always” worked that way. Although the current behaviour is counterintuitive when just extracting a valid prefix from a UTF-8-encoded string, it has the property of always returning either a nonempty string or false, provided that $size is positive, so that “looping until the result is false” may be part of a valid algorithm.
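
As an illustration of the “looping until the result is false” pattern, here is a minimal sketch that splits a string into grapheme clusters. It assumes the intl extension is loaded and that the input is valid UTF-8; `split_graphemes()` is a hypothetical name chosen for this example.

```php
<?php
// Hypothetical helper: collects grapheme clusters by calling
// grapheme_extract() repeatedly until it returns false.
// Assumes the intl extension is available and $text is valid UTF-8.
function split_graphemes(string $text): array
{
    $clusters = [];
    $offset = 0; // byte offset of the next cluster
    $next = 0;   // filled in by grapheme_extract() via its by-reference parameter

    // The loop ends when the offset reaches strlen($text), because
    // grapheme_extract() then returns false instead of an empty string.
    while (($cluster = grapheme_extract($text, 1, GRAPHEME_EXTR_COUNT, $offset, $next)) !== false) {
        $clusters[] = $cluster;
        $offset = $next;
    }

    return $clusters;
}
```

Calling `split_graphemes('')` returns an empty array, since the very first call already fails. The loop cannot tell that normal end of input apart from a real error, so a stricter version would also check `intl_get_error_code()`.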

However, if the documentation is changed, I plead that a big red warning be placed at the top of the man page.

@youkidearitai
Copy link
Contributor

> do you think it only needs documentation changes ?

@devnexen Hmm... There doesn't seem to be a contradiction in the documentation.

@claudepache
Contributor Author

@devnexen The documentation says explicitly that offset: strlen($haystack) is a valid parameter value. However, in that case, there will always be a failure, and false is always returned. Of course, since the documentation does not say when failure occurs, there is no contradiction.

In short, the problem is not in what is documented, but in what is not documented. The documentation ought to specify when failure occurs, especially in cases where it contradicts user intuition.
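
Until the failure case is documented or changed, callers who want the behaviour expected in the description could wrap the call. The following is a sketch only: `grapheme_extract_lenient()` is a hypothetical userland helper, it assumes PHP 8.0 or later with the intl extension, and it covers only the `GRAPHEME_EXTR_COUNT` mode.

```php
<?php
// Hypothetical wrapper: treats an offset equal to the byte length of the
// haystack as "nothing left to extract" and returns '' instead of false.
// All other cases are delegated to grapheme_extract() unchanged.
function grapheme_extract_lenient(string $haystack, int $size, int $offset = 0): string|false
{
    if ($offset === strlen($haystack)) {
        return '';
    }

    return grapheme_extract($haystack, $size, GRAPHEME_EXTR_COUNT, $offset);
}

var_dump(grapheme_extract_lenient('', 42));      // string(0) ""
var_dump(grapheme_extract_lenient('ab', 42, 2)); // string(0) ""
```

Note that this gives up the “nonempty string or false” property mentioned earlier in the thread, so a loop that waits for false would never terminate with this wrapper.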

@devnexen
Member

> > do you think it only needs documentation changes ?
>
> @devnexen Hmm... There doesn't seem to be a contradiction in the documentation.

I do think an error section is a fair demand in this case, but that's just a personal opinion :)
