lib/ExtUtils/t/Embed.t: failures on OpenBSD when compiling with gcc but not with clang #22125

jkeenan opened this issue Apr 6, 2024 · 27 comments · May be fixed by #23265

@jkeenan
Contributor

jkeenan commented Apr 6, 2024

For several months I have been noticing smoke-test failures under certain configurations on OpenBSD-7.4; see, e.g., report 5052514. The failures occur when configuring a perl built with these command-line switches:

$ sh ./Configure -des -Dusedevel -Dcc=gcc -Accflags='-DPERL_RC_STACK -DDEBUG_LEAKING_SCALARS'

... and then, after running make test_prep, calling:

$ cd t; ./perl harness -v ../lib/ExtUtils/t/Embed.t; cd - 
ld: error: undefined symbol: Perl_more_sv
>>> referenced by embed_test.c
>>>               /tmp//ccE9t92Z.o:(S_new_SV)

ld: error: undefined symbol: PL_sv_serial
>>> referenced by embed_test.c
>>>               /tmp//ccE9t92Z.o:(S_new_SV)
>>> referenced by embed_test.c
>>>               /tmp//ccE9t92Z.o:(S_new_SV)
collect2: ld returned 1 exit status

not ok 1
not ok 10 # system returned -1
Failed 10/10 subtests 

Test Summary Report
-------------------
../lib/ExtUtils/t/Embed.t (Wstat: 0 Tests: 2 Failed: 2)
  Failed tests:  1, 10
  Parse errors: Tests out of sequence.  Found (10) but expected (2)
                Bad plan.  You planned 10 tests but ran 2.
Files=1, Tests=2,  0 wallclock secs ( 0.02 usr  0.01 sys +  0.48 cusr  0.11 csys =  0.62 CPU)
Result: FAIL

I didn't pay these much attention at first because the perl was being built with a compiler I had never heard of, eg++. But more recently these failures have been occurring with gcc as the compiler.

I haven't been able to install an OpenBSD more recent than 6.9, but I decided to explore this problem on that version today regardless. I got the test failure reported above on blead. I then decided to see whether these failures were due to a recent code change or whether @cjg-cguevara's exploration of these config options had merely revealed a problem that was "always" there. I tested at tag v5.38.0 and got the failure reported above.

$ uname -mrs
OpenBSD 6.9 amd64

$ gcc --version | head -1
gcc (GCC) 4.2.1 20070719

$ git show | head -3
commit 76298ae68aa7796f0ffc05095b127d23f4b2de8f
Author: Ricardo Signes <[email protected]>
Date:   Sat Jul 1 20:48:27 2023 -0400

$ sh ./Configure -des -Dusedevel -Dcc=gcc -Accflags='-DPERL_RC_STACK -DDEBUG_LEAKING_SCALARS'; make test_prep

$ ./perl -Ilib -V:config_args
config_args='-des -Dusedevel -Dcc=gcc -Accflags=-DPERL_RC_STACK -DDEBUG_LEAKING_SCALARS';

$ cd t; ./perl harness -v ../lib/ExtUtils/t/Embed.t; cd - 
ld: error: undefined symbol: Perl_more_sv
>>> referenced by embed_test.c
>>>               /tmp//ccE9t92Z.o:(S_new_SV)

ld: error: undefined symbol: PL_sv_serial
>>> referenced by embed_test.c
>>>               /tmp//ccE9t92Z.o:(S_new_SV)
>>> referenced by embed_test.c
>>>               /tmp//ccE9t92Z.o:(S_new_SV)
collect2: ld returned 1 exit status

not ok 1
not ok 10 # system returned -1
Failed 10/10 subtests 

Test Summary Report
-------------------
../lib/ExtUtils/t/Embed.t (Wstat: 0 Tests: 2 Failed: 2)
  Failed tests:  1, 10
  Parse errors: Tests out of sequence.  Found (10) but expected (2)
                Bad plan.  You planned 10 tests but ran 2.
Files=1, Tests=2,  0 wallclock secs ( 0.02 usr  0.01 sys +  0.48 cusr  0.11 csys =  0.62 CPU)
Result: FAIL

Since the gcc version I was testing with dates back to 2007, I decided to try with the system's default C-compiler, clang-10.

$ clang --version
OpenBSD clang version 10.0.1 
Target: amd64-unknown-openbsd6.9
Thread model: posix
InstalledDir: /usr/bin

$ sh ./Configure -des -Dusedevel -Dcc=clang -Accflags='-DPERL_RC_STACK -DDEBUG_LEAKING_SCALARS'; make test_prep

With this more recent clang, the failing tests PASSed.

$ cd t; ./perl harness -v ../lib/ExtUtils/t/Embed.t; cd -

ok 1
ok 2
ok 3
ok 4
ok 5
ok 6
ok 7
ok 8
ok 9
ok 10 # system returned 0
ok
All tests successful.
Files=1, Tests=10,  1 wallclock secs ( 0.01 usr  0.02 sys +  0.55 cusr  0.36 csys =  0.94 CPU)
Result: PASS

It would be good if someone could test these config options on an up-to-date OpenBSD with both gcc and clang. We can then try to evaluate the source of the test failures. cc: @afresh1

@afresh1
Contributor

afresh1 commented Apr 7, 2024

I tried this on my -current sparc64 with both -Dcc=clang and -Dcc=gcc. I did not see this failure in either case. Let me know if you need me to try something else or need access to an OpenBSD machine for testing.

zarniwoop$ sysctl kern.version
kern.version=OpenBSD 7.5-current (GENERIC.MP) #2098: Wed Apr  3 15:08:08 MDT 2024
    [email protected]:/usr/src/sys/arch/sparc64/compile/GENERIC.MP

zarniwoop$ gcc -v              
Reading specs from /usr/lib/gcc-lib/sparc64-unknown-openbsd7.5/4.2.1/specs
Target: sparc64-unknown-openbsd7.5
Configured with: OpenBSD/sparc64 system compiler
Thread model: posix
gcc version 4.2.1 20070719 
zarniwoop$ clang -v
OpenBSD clang version 16.0.6
Target: sparc64-unknown-openbsd7.5
Thread model: posix
InstalledDir: /usr/bin

@jkeenan
Contributor Author

jkeenan commented Apr 7, 2024

> I tried this on my -current sparc64 with both -Dcc=clang and -Dcc=gcc. I did not see this failure in either case.

And those were configured with -Accflags='-DPERL_RC_STACK -DDEBUG_LEAKING_SCALARS' -- correct?
[snip]

> gcc version 4.2.1 20070719

Is there some reason why OpenBSD's port of gcc remains at this version more than 16 years old (other than not wanting to use the GPL)? Should we even be bothering to test perls built with gcc on OpenBSD?

@jkeenan
Contributor Author

jkeenan commented Apr 7, 2024

Noting the 2 instances of ld: error: undefined symbol ...:

$ cd t && \
gcc -o embed_test -I..  -std=gnu99 -DNO_LOCALE_NUMERIC -DNO_LOCALE_COLLATE -DNO_LOCALE_MONETARY -DNO_LOCALE_TIME -DNO_LOCALE_MESSAGES -DLIBC_HANDLES_MISMATCHED_CTYPE -DDEBUG_LEAKING_SCALARS -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include   embed_test.c -L.. -lperl   -Wl,-E  -fstack-protector-strong -L/usr/local/lib  -lperl -lpthread -lm -lutil -lc; cd -

ld: error: undefined symbol: Perl_more_sv
>>> referenced by embed_test.c
>>>               /tmp//cc2IxT9I.o:(S_new_SV)

ld: error: undefined symbol: PL_sv_serial
>>> referenced by embed_test.c
>>>               /tmp//cc2IxT9I.o:(S_new_SV)
>>> referenced by embed_test.c
>>>               /tmp//cc2IxT9I.o:(S_new_SV)
collect2: ld returned 1 exit status
[openbsd69: perl] $ ack -l '(Perl_more_sv)' .              
sv_inline.h
embed.h
proto.h
sv.c
[openbsd69: perl] $ ack -l '(PL_sv_serial)' .              
sv_inline.h
embedvar.h
makedef.pl
lib/Devel/PPPort.pm
...

Further analysis needs more C-foo than I have.

@jkeenan
Contributor Author

jkeenan commented Apr 7, 2024

$ cd t && \
gcc -o embed_test -I..  -std=gnu99 -DNO_LOCALE_NUMERIC -DNO_LOCALE_COLLATE -DNO_LOCALE_MONETARY -DNO_LOCALE_TIME -DNO_LOCALE_MESSAGES -DLIBC_HANDLES_MISMATCHED_CTYPE -DDEBUG_LEAKING_SCALARS -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include   embed_test.c -L.. -lperl   -Wl,-E  -fstack-protector-strong -L/usr/local/lib  -lperl -lpthread -lm -lutil -lc; cd -

I should add that, as the above invocation suggests, I configured with each of the two options separately and learned that the test failure is being generated by -DDEBUG_LEAKING_SCALARS. When I configure only with -DPERL_RC_STACK and build with gcc, the test PASSes.

@afresh1
Contributor

afresh1 commented Apr 7, 2024

> > I tried this on my -current sparc64 with both -Dcc=clang and -Dcc=gcc. I did not see this failure in either case.
>
> And those were configured with -Accflags='-DPERL_RC_STACK -DDEBUG_LEAKING_SCALARS' -- correct? [snip]

Yes, I copied the incantation from the first message:

$ sh ./Configure -des -Dusedevel -Dcc=clang -Accflags='-DPERL_RC_STACK -DDEBUG_LEAKING_SCALARS'

Just changing =clang to =gcc between them.

> > gcc version 4.2.1 20070719
>
> Is there some reason why OpenBSD's port of gcc remains at this version more than 16 years old (other than not wanting to use the GPL)? Should we even be bothering to test perls built with gcc on OpenBSD?

I believe we still have some architectures that clang doesn't support, where we are stuck with gcc, but where possible clang is becoming the system compiler and, once it is stable there, we stop installing gcc.

And yes, it is because the clang licence is more acceptable to OpenBSD. I'm fairly sure we will not bring any GPL3 code into the base system at all.

@iabyn
Contributor

iabyn commented Apr 8, 2024 via email

@jkeenan
Contributor Author

jkeenan commented Apr 8, 2024

Bisecting on OpenBSD-6.9 with this invocation:

perl Porting/bisect.pl \
-Dcc=gcc \
-Accflags='-DDEBUG_LEAKING_SCALARS' \
--start=4f1687891150ddeda14ad7b0716032145bc69801 \
--end=8b03aeb95ab72abdb2fa40f2d1196ce42f34708d \
--target lib/ExtUtils/t/Embed.t

... confirmed the breaking commit:

commit 75acd14e43f2ffb698fc7032498f31095b56adb5
Author:     Richard Leach <[email protected]>
AuthorDate: Sun Feb 6 22:52:54 2022 +0000
Commit:     Tomasz Konojacki <[email protected]>
CommitDate: Mon Mar 7 01:08:53 2022 +0100

    Make newSV_type an inline function

@jkeenan
Contributor Author

jkeenan commented Apr 8, 2024

@richardleach ^^

@richardleach
Contributor

Thanks, I'll try to look into this later this week

richardleach self-assigned this Apr 9, 2024
@richardleach
Contributor

S_new_SV() is weird: although it's static, it's not declared as inline, even though it's defined in sv_inline.h. I don't know whether this is a mistake. It was in sv.c until 5.36.0. It was moved by this commit:

I'm assuming this is indeed an error of omission on my part, but I haven't had a chance to get an OpenBSD VM running yet.

@tonycoz
Contributor

tonycoz commented Apr 24, 2024

I had a look at this and reproduced it locally. libperl.a looks fine:

$ nm libperl.a | grep PL_sv_serial
...
00006738 B PL_sv_serial
...

So what do we end up linking against? I edited Embed.t to add -Wl,-t to the compiler invocation:

# eg++ -o embed_test -Wl,-t -I..  -DNO_LOCALE_NUMERIC -DNO_LOCALE_COLLATE -DNO_LOCALE_MONETARY -DNO_LOCALE_TIME -DNO_LOCALE_MESSAGES -DLIBC_HANDLES_MISMATCHED_CTYPE -DPERL_RC_STACK -DDEBUG_LEAKING_SCALARS -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include   embed_test.c -L.. -lperl   -Wl,-E  -fstack-protector-strong -L/usr/local/lib  -lperl -lpthread -lm -lutil -lc
ld: error: undefined symbol: PL_sv_serial
>>> referenced by embed_test.c
>>>               /tmp//ccIahN7A.o:(S_new_SV)
>>> referenced by embed_test.c
>>>               /tmp//ccIahN7A.o:(S_new_SV)
collect2: error: ld returned 1 exit status
# /usr/lib/crt0.o

# /usr/lib/crtbegin.o

# /tmp//ccIahN7A.o

# /usr/lib/libperl.so.23.0

# /usr/lib/libperl.so.23.0

# /usr/lib/libpthread.so.27.0

# /usr/lib/libutil.so.16.0

# /usr/local/lib/libestdc++.so.20.0

# /usr/lib/libm.so.10.1

# /usr/lib/libc.so.97.0

# /usr/lib/libc.so.97.0

# /usr/lib/crtend.o

not ok 1
# embed_test = ./embed_test
not ok 10 # system returned -1
Failed 10/10 subtests 

So we're linking against the system libperl, which of course wasn't built with -DDEBUG_LEAKING_SCALARS and doesn't define PL_sv_serial.

So while 75acd14 revealed this problem, it didn't cause the problem.
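
Why the embedding program's own object file ends up needing PL_sv_serial can be illustrated with a minimal sketch. All names here (demo_serial, demo_next_serial) are hypothetical, and the "library" side is folded into the same file so that the example compiles on its own; in the real case the variable lives in libperl and the function in sv_inline.h.

```c
/* Library side (hypothetical): in a real build this definition would live
 * in the library, and only when the library was compiled with the
 * debugging option enabled. */
unsigned long demo_serial = 0;

/* Header side (hypothetical): a static function defined in a public
 * header is compiled into every translation unit that includes it. */
extern unsigned long demo_serial;

static unsigned long
demo_next_serial(void)
{
    /* This reference is resolved at link time against whichever
     * library the linker actually picks up. */
    return ++demo_serial;
}
```

If the linker picks up a copy of the library that was built without the option, the variable is missing and the link fails with an undefined-symbol error of the kind shown above, even though the header and the freshly built library agree with each other.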

@tonycoz
Contributor

tonycoz commented Apr 24, 2024

I've tried a few approaches to this

  • making -L.. -lperl the only reference to libperl
  • using an absolute path for the -L used for libperl

The only way I could get it to link the correct libperl was a direct reference:

# eg++ -o embed_test -Wl,-t -Wl,--verbose -I..  -DNO_LOCALE_NUMERIC -DNO_LOCALE_COLLATE -DNO_LOCALE_MONETARY -DNO_LOCALE_TIME -DNO_LOCALE_MESSAGES -DLIBC_HANDLES_MISMATCHED_CTYPE -DPERL_RC_STACK -DDEBUG_LEAKING_SCALARS -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include   embed_test.c ../libperl.a   -Wl,-E  -fstack-protector-strong   -lpthread -lm -lutil -lc
# /usr/lib/crt0.o

# /usr/lib/crtbegin.o

# /tmp//cc13wQEa.o

# ../libperl.a(perl.o)

# ../libperl.a(op.o)

# ../libperl.a(universal.o)

# ../libperl.a(av.o)

...

I don't see a way to get ld to trace where it looks for libraries, so I don't see a way to trace this any further.

@richardleach
Contributor

@tonycoz - Thanks for figuring out what was going on here re: linking.

@tonycoz
Contributor

tonycoz commented Apr 28, 2024

I need to look into it further, perl itself seems to link fine.

@tonycoz
Contributor

tonycoz commented Apr 29, 2024

> I need to look into it further, perl itself seems to link fine.

The perl executable links the library via direct reference:

eg++ -o perl -Wl,-E  -fstack-protector-strong -L/usr/local/lib  perlmain.o   libperl.a `cat ext.libs` -lpthread -lm -lutil -lc

So I see two issues directly related to the test here:

  • the link process for embed_test doesn't match that of the perl executable, so we might link the wrong library in (and do in this case)
  • the test itself doesn't detect that we linked the wrong library, it's fairly simple and doesn't do anything that involves the layout of the perl structure

I can see a couple of issues not directly related to the failures:

  • S_new_SV should probably be PERL_STATIC_INLINE, or maybe better just defined in sv.c and exported, ie. E in embed.fnc, to allow simpler breakpoints
  • the S_new_SV file, line, func arguments should be PERL_ARG_UNUSED():
In file included from ../../perl.h:7870,
                 from B.xs:13:
../../sv_inline.h: In function 'SV* S_new_SV(const char*, int, const char*)':
../../sv_inline.h:75:28: warning: unused parameter 'file' [-Wunused-parameter]
   75 | S_new_SV(pTHX_ const char *file, int line, const char *func)
      |                ~~~~~~~~~~~~^~~~
../../sv_inline.h:75:38: warning: unused parameter 'line' [-Wunused-parameter]
   75 | S_new_SV(pTHX_ const char *file, int line, const char *func)
      |                                  ~~~~^~~~
../../sv_inline.h:75:56: warning: unused parameter 'func' [-Wunused-parameter]
   75 | S_new_SV(pTHX_ const char *file, int line, const char *func)
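
The two suggestions above (make the header function static inline, and mark the debug-only arguments as unused) can be sketched in isolation. The macros and the function below are hypothetical stand-ins, not perl's actual PERL_STATIC_INLINE or PERL_ARG_UNUSED() definitions, which live in perl's own headers and may be implemented differently.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the perl macros named above. */
#define DEMO_STATIC_INLINE static inline
#define DEMO_ARG_UNUSED(x) ((void)(x))

/* Header-style helper: file, line and func are only interesting in a
 * debugging build, so a normal build marks them as deliberately unused
 * to silence -Wunused-parameter. */
DEMO_STATIC_INLINE size_t
demo_alloc_size(size_t n, const char *file, int line, const char *func)
{
    DEMO_ARG_UNUSED(file);
    DEMO_ARG_UNUSED(line);
    DEMO_ARG_UNUSED(func);
    return n * sizeof(void *);
}
```

A caller would pass something like __FILE__ and __LINE__ for the extra arguments; whether the real function should stay in the header or move into sv.c is the open question raised in the first bullet.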

@richardleach
Contributor

> I can see a couple of issues not directly related to the failures:

I'm working on a PR for these

@jkeenan
Contributor Author

jkeenan commented May 10, 2024

> > I can see a couple of issues not directly related to the failures:
>
> I'm working on a PR for these

@richardleach @tonycoz:
Can we get an update on this? This broke in the previous development cycle.

@richardleach
Contributor

> > > I can see a couple of issues not directly related to the failures:
> >
> > I'm working on a PR for these
>
> @richardleach @tonycoz: Can we get an update on this? This broke in the previous development cycle.

As far as a PR for the "issues not directly related to the failures", I'm currently stuck on figuring out how to correctly declare - or otherwise make visible - the S_new_SV function and correct new_SV macro within sv_inline.h for that file's inline functions.

@richardleach
Contributor

OK, I cannot see how to make the S_new_SV function and newSV macro visible within sv_inline.h without also causing redefinition warnings elsewhere, such as in .c files that include both sv_inline.h and embed.h. To make some progress, I've raised #22208 instead.

tonycoz self-assigned this May 13, 2024
@tonycoz
Contributor

tonycoz commented May 13, 2024

> This broke in the previous development cycle.

As far as I can tell the build issue itself is very long standing, unrelated to 75acd14 and can't be fixed in the C source.

lib/ExtUtils/t/Embed.t needs to be updated to build embed_test in the same way Makefile builds perl itself, but this late in the development process I don't think this should be done for 5.40, as it has a good chance of introducing similar failures on other platforms.

At base it looks like a bug in the old gcc on that platform, since the default clang works fine.

It could be added to known issues for the 5.40 perldelta.

@jkeenan
Contributor Author

jkeenan commented Aug 9, 2024

> > This broke in the previous development cycle.
>
> As far as I can tell the build issue itself is very long standing, unrelated to 75acd14 and can't be fixed in the C source.
>
> lib/ExtUtils/t/Embed.t needs to be updated to build embed_test in the same way Makefile builds perl itself, but this late in the development process I don't think this should be done for 5.40, as it has a good chance of introducing similar failures on other platforms.
>
> At base it looks like a bug in the old gcc on that platform, since the default clang works fine.
>
> It could be added to known issues for the 5.40 perldelta.

@tonycoz, now that we're well into the 5.41 dev cycle, is there any way we could move work on this problem forward? Thanks.

@jkeenan
Contributor Author

jkeenan commented Aug 31, 2024

> > > This broke in the previous development cycle.
> >
> > As far as I can tell the build issue itself is very long standing, unrelated to 75acd14 and can't be fixed in the C source.
> > lib/ExtUtils/t/Embed.t needs to be updated to build embed_test in the same way Makefile builds perl itself, but this late in the development process I don't think this should be done for 5.40, as it has a good chance of introducing similar failures on other platforms.
> > At base it looks like a bug in the old gcc on that platform, since the default clang works fine.
> > It could be added to known issues for the 5.40 perldelta.
>
> @tonycoz, now that we're well into the 5.41 dev cycle, is there any way we could move work on this problem forward? Thanks.

@tonycoz, ping ^^

@jkeenan
Contributor Author

jkeenan commented Sep 1, 2024

As https://perl5.test-smoke.org has come back to life (thanks to Leo, Olaf, et al.), I've been able to gather more data about this problem -- and am glad that the problem was partially solved by @richardleach's c79fe2b back in June. I noticed that after a string of 5 smoke-test run failures on my OpenBSD-6.9 VM between April 6 and June 11, no failures have been observed since. I bisected with the following invocation:

perl Porting/bisect.pl -Dcc=gcc \
    -Accflags='-DPERL_RC_STACK -DDEBUG_LEAKING_SCALARS' \
--start a902d92a78 \
--end 01f2355587 \
--expect-fail \
--target lib/ExtUtils/t/Embed.t

... and found that the first "bad" (really, good, in this case) commit was 👍

c79fe2b42ae2a540552f87251aa0e36a060dd584 is the first bad commit
commit c79fe2b42ae2a540552f87251aa0e36a060dd584
Author: Richard Leach <[email protected]>
Date:   Sat May 11 13:26:27 2024 +0000
Commit:     Richard Leach <[email protected]>
CommitDate: Wed Jun 12 12:44:21 2024 +0100

    S_new_SV: args unused, static inline & defined, rename with Perl_ prefix

This build used gcc but was not a threaded build. Today I decided to check whether I also got a PASS on a threaded build, so I configured with:

$ sh ./Configure -des -Dusedevel -Duseithreads -Dcc=gcc

... then built and tested.

$ cd t; ./perl harness -v ../lib/ExtUtils/t/Embed.t; cd -                 
ld: error: undefined symbol: Perl_croak_nocontext
>>> referenced by embed_test.c
>>>               /tmp//ccTDHCnn.o:(Perl_croak_memory_wrap)
collect2: ld returned 1 exit status

not ok 1
not ok 10 # system returned -1
Failed 10/10 subtests 

Test Summary Report
-------------------
../lib/ExtUtils/t/Embed.t (Wstat: 0 Tests: 2 Failed: 2)
  Failed tests:  1, 10
  Parse errors: Tests out of sequence.  Found (10) but expected (2)
                Bad plan.  You planned 10 tests but ran 2.
Files=1, Tests=2,  1 wallclock secs ( 0.00 usr  0.05 sys +  0.45 cusr  0.18 csys =  0.68 CPU)
Result: FAIL
Finished test run at Sun Sep  1 12:11:21 2024.

Note that the very first entry in this issue back in April was reporting an unthreaded build.

I'm hoping that we can get some smoke-tests on more up-to-date OpenBSDs (7.5) with more modern gccs, unthreaded and threaded, with and without -Accflags='-DPERL_RC_STACK -DDEBUG_LEAKING_SCALARS' so that we can get a better handle on this problem.

@tonycoz
Contributor

tonycoz commented Sep 2, 2024

Note that c79fe2b didn't fix the underlying issue: linking with the wrong libperl.

I've spent some time on this, mostly trying to work out how to detect the mismatch so the test fails when it should fail.

tonycoz pushed a commit to tonycoz/perl5 that referenced this issue Dec 10, 2024
Issue Perl#22125 detected that we weren't linking the correct library with
the embedded test with gcc on OpenBSD, so add an API to perform a
sanity check by comparing the size of the perl interpreter
structure (or its size if it was a structure) and expected perl API
version between those seen in the binary and those compiled into
libperl.
tonycoz pushed a commit that referenced this issue Dec 12, 2024
Issue #22125 detected that we weren't linking the correct library with
the embedded test with gcc on OpenBSD, so add an API to perform a
sanity check by comparing the size of the perl interpreter
structure (or its size if it was a structure) and expected perl API
version between those seen in the binary and those compiled into
libperl.
tonycoz pushed a commit that referenced this issue Dec 12, 2024
When building with gcc, lib/ExtUtils/t/Embed.t would link against
the system libperl.a rather than the newly built libperl.a.

Due to the limited API used by the sample code this typically didn't
crash, but some configuration changes could result in a crash or a
link error.

For OpenBSD, change the link options to more closely match those
used when building the perl executable, which results in linking
against the correct library.

Fixes #22125
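
The sanity check described in the commit messages above (compare the interpreter size and API version seen by the embedding binary with those compiled into the library) could look roughly like the following. This is only a sketch: the structure, the version string and the function name are hypothetical and are not the API added by those commits.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical interpreter structure and API version string; these stand
 * in for values the library would bake in at its own compile time. */
struct demo_interp { int flags; void *stack; long serial; };
#define DEMO_API_VERSION "demo-1.0"

/* Returns 1 when the caller's view matches the library's, 0 otherwise. */
static int
demo_api_matches(size_t caller_interp_size, const char *caller_api_version)
{
    if (caller_api_version == NULL)
        return 0;
    if (caller_interp_size != sizeof(struct demo_interp))
        return 0;
    return strcmp(caller_api_version, DEMO_API_VERSION) == 0;
}
```

An embedder would call this with the sizeof and version string from the headers it was compiled against, so that a mismatch with the library actually linked in is caught even when the link itself succeeds.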
@bulk88
Contributor

bulk88 commented Dec 15, 2024

PERL_CALLCONV void
Perl_api_version_check(size_t interp_size, void *v_my_perl, const char *api_version)
        __attribute__visibility__("hidden");
#define PERL_ARGS_ASSERT_API_VERSION_CHECK      \
        assert(api_version)

Some improvements, bc this is a new API/fnc, without requiring CV function ptr compatibility like the XS version has.

Some suggestions: void -> const char * or -> UV (???). If this API is for root binaries/embedders "looking down" on perl, it is a role reversal of DynaLoader. I would prefer a const char * or int error code, not a longjmp bomb thrown at me from what I called. If the child libperl is the wrong version/ABI, what business does it have throwing me an exception while that libperl is in a pre-Perl_sys_intern_init() state?

Passing a const char */int error to the caller/root binary is nicer, so the caller (assume no human) can tuck the error message away in some log. This situation is a little unrealistic, since an embedded-perl app isn't going to be uploaded to CI or published to customers if it doesn't even start on the author's box. So getting the error code with a C debugger after a SEGV, or getting it in the console, is the same thing, since the end user has (let's hope) basic C debugging tools before trying to make an embedded libperl app. The new function needs a better way to communicate with the caller than longjmp().

I can imagine this FN being useful in some kind of current or future git bisect like setup, with 100 libperl.so/dll'es in one directory.

@tonycoz
Contributor

tonycoz commented Dec 16, 2024

> Some suggestions: void -> const char * or -> UV (???). If this API is for root binaries/embedders "looking down" on perl, it is a role reversal of DynaLoader. I would prefer a const char * or int error code, not a longjmp bomb thrown at me from what I called. If the child libperl is the wrong version/ABI, what business does it have throwing me an exception while that libperl is in a pre-Perl_sys_intern_init() state?

Perhaps separate *_assert and *_check functions. Though the assert version should probably do a hard exit instead of a croak.

The problem with returning a const char * is what is it pointing at? I think we'd need to pass in a buffer (with a minimum size) that the check version could fill.
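
The split suggested above, a check variant that fills a caller-supplied buffer and an assert variant that exits hard, might be sketched as follows. Everything here is hypothetical (names, minimum buffer size, message text) and is not taken from the actual patch.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define DEMO_MSG_MIN 64   /* hypothetical minimum buffer size */

/* Check variant: reports a mismatch through the return value and a
 * caller-supplied buffer instead of throwing. Returns 0 on success. */
static int
demo_size_check(size_t caller_size, size_t library_size,
                char *msg, size_t msg_len)
{
    if (caller_size == library_size)
        return 0;
    if (msg != NULL && msg_len >= DEMO_MSG_MIN)
        snprintf(msg, msg_len,
                 "interpreter size mismatch: caller %lu, library %lu",
                 (unsigned long)caller_size, (unsigned long)library_size);
    return 1;
}

/* Assert variant: same test, but a hard exit rather than an exception. */
static void
demo_size_assert(size_t caller_size, size_t library_size)
{
    char msg[DEMO_MSG_MIN * 2];
    if (demo_size_check(caller_size, library_size, msg, sizeof msg) != 0) {
        fputs(msg, stderr);
        fputc('\n', stderr);
        abort();
    }
}
```

Because the buffer belongs to the caller, the question of what a returned pointer would point at does not arise; the cost is that the caller has to know the minimum size.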

That said, the attempted fixes to the Embed.t linking didn't work, and the new check has revealed a similar problem on Solaris, yay.

> __attribute__visibility__("hidden");

or maybe this broke it.

@bulk88
Contributor

bulk88 commented Jan 20, 2025

> Perhaps separate *_assert and *_check functions. Though the assert version should probably do a hard exit instead of a croak.

Perl has gotten CVEs before where malloc() returning NULL doesn't SEGV or hard-exit the process, because Perl_croak() was used to dispatch the error and there was an eval {} block in the way. Even p5's do_exit() and my_exit(), and the p5 C API disabling setjmp/longjmp, are still a security problem. It's trivial for the last two to start calling back into PP code with sub DESTROY and mgset, mgget, mgfree and overload.pm.

I haven't had the time to write a real PoC, but I have played around with randomly breaking things by hooking malloc() to return NULL and by making the temps stack and HV head stack grow reallocators fail. Yes, perl worked perfectly up to 2^32 minus exactly 4096 bytes of OS malloc memory, when the very last VM page was wanted for an SV head arena, PL stack or tmps stack expansion.

XS handshake has a die_no_perl("xyz") for the very specific reason that I realized that there will be endless problems/segvs dispatching the error string back to the stderr console, through any existing Perl API.

malloc()? Unavailable, cuz we just detected someone committed arson on malloc()'s internals.

sprintf()? Format strings? Crazy 🤪 talk: we're reporting that the local serialization mutex timed out or deadlocked, so how do you think you're going to reenter it?

PerlIO? I dare you to find a code path to reach ring-0 syswrite() through that api, after I demapped glibc's NLS mmaped DB backend from the process.

Same for fprintf(), I think I used fwrite() in die_no_perl() because it was the most stable and most portable way to write to the console when I single stepped in assembly code all the way to NtExitProcess(exitcode).

Reaching Linux or OSX kernel with public api in assembly code is beyond scope for too much work for too few failure modes. Just assume 95% of Libc is broken, not that Libc was removed from virtual memory, and the day job work ticket says to report failure mode to stderr to user of Libc disappearing randomly from virtual memory without a visible segv.
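
The idea behind a last-resort reporter like the die_no_perl() described above can be sketched as follows: write a constant message with fwrite() only, with no format strings and no heap allocation, then leave the process. This is a hypothetical illustration, not the code in perl's source; the function names are made up.

```c
#include <stdio.h>
#include <stdlib.h>

/* Writes a constant message to stderr with fwrite() only: no format
 * strings, no heap allocation. Returns the number of bytes written. */
static size_t
demo_write_stderr(const char *msg)
{
    size_t len = 0;
    while (msg[len] != '\0')   /* manual length count, no further libc calls */
        len++;
    return fwrite(msg, 1, len, stderr);
}

/* Hypothetical last-resort exit path: report, then terminate without
 * running atexit handlers or any other cleanup. */
void
demo_die_minimal(const char *msg)
{
    demo_write_stderr(msg);
    _Exit(EXIT_FAILURE);
}
```

The sketch still assumes that fwrite() and stderr themselves work, which matches the "assume most of libc is broken, but not all of it" position taken above.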

> The problem with returning a const char * is what is it pointing at? I think we'd need to pass in a buffer (with a minimum size) that the check version could fill.

C strings allocated as constant globals in the bad child libperl sound safe enough. The caller cannot be so stupid as to write an embedded Perl program that identifies, traps and resumes from the ABI mismatch of libperl.so, unmaps the bad child libperl.so, continues to search some list of disk paths or brute-forces the directory tree for another libperl.so, and then at process exit dumps the log to disk and crashes because of the earlier unmapped libperl.so. If someone knows how to unload a libperl.so from virtual address space, they will know how to use strlen and memcpy.

A string is a horrible format for error handling, though, since it's not machine parsable, only human parsable.

I have brainstormed that XS embed ABI handshake really needs 4 critical data fields.

U16 or U8 X 3 for 41 , 8, 0x_alpha, NO U16 for 0x5 !!!

U32 X 2 for the non_bin_compat_options bit vector from perl -V, but that list of defines is absolute trash by now, and I have half a bug ticket for half the items in there proving they will crash or change the my_perl size if those defines are different.
Non bin compat sits at 33 defines rn, so it has to be transported as 2 U32s.

my_perl size: there are some things that cannot be detected by Configure and aren't even a perl core option, like the internals of stat_t, which can change with a "simple monthly security update" to GCC or MSVC silently installed on a user's system, and who knows why that CC update broke global ABI state.
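
Carrying the option bit vector mentioned above in two 32-bit words could be sketched like this. The type and function names are hypothetical; only the count of 33 options comes from the comment.

```c
#include <stdint.h>

#define DEMO_OPTION_COUNT 33   /* figure quoted above; needs more than one 32-bit word */

/* Two 32-bit words hold up to 64 option flags. */
typedef struct { uint32_t words[2]; } demo_optset;

/* Sets option number `bit`; out-of-range values are ignored. */
static void
demo_optset_set(demo_optset *s, unsigned bit)
{
    if (bit < 64)
        s->words[bit / 32] |= (uint32_t)1 << (bit % 32);
}

/* Two builds are treated as compatible only if every flag agrees. */
static int
demo_optset_equal(const demo_optset *a, const demo_optset *b)
{
    return a->words[0] == b->words[0] && a->words[1] == b->words[1];
}
```

Unlike an error string, a pair of integers like this can be compared or logged by the calling program without any parsing.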

" I see you used the LTS package manager on a non LTS Enabled Ubuntu that was EOLed, here is the phone number for our sales dept for Enterprise VIP customers"

[we break ABI API on purpose for same age same version same build date same build number, OSes, but one ISO is tagged LTS [corporate sales] and the other is tagged "consumer"/mainline]

tonycoz added a commit that referenced this issue Apr 1, 2025
tonycoz added a commit that referenced this issue Apr 7, 2025
tonycoz added a commit to tonycoz/perl5 that referenced this issue Apr 15, 2025
tonycoz added a commit to tonycoz/perl5 that referenced this issue May 7, 2025