Skip to content

Commit 39b4841

Browse files
committed
Fix crash on recursive /(?{...})/ call
In something like my $pat = /... (?{ foo() if ...; }) .../; sub foo { $string =~ $pat; } foo(); perl would SEGV or produce the wrong values for $1 et al. This commit fixes that. Background: For a compile-time pattern like $foo =~ /..../, the pattern is compiled at compile time, and the resulting REGEXP SV is stored as a pointer in the OP_MATCH op (or on threaded builds, stored in PL_regex_pad[], indexed by the op_pmoffset field in the match op). For a runtime pattern like $foo =~ /abc$bar/; or $pat = qr/..../; $foo =~ $pat; the pattern is compiled by a regcomp op (or for a qr// object, duplicated), and the resulting REGEX SV is stored by the regcomp op into its related match (or subst or split etc) op. This regex will likely have a refcount of 1: i.e. it is only being kept alive by the link from the match op. Normally this is fine: the regex lives for as long as the op (and hence the sub it lives in) exist. In particular, the regex continues to live on after the match is complete, so that $1 etc will work. $1 etc work by perl setting PL_curpm to point to the match op which most recently did a successful match. This is dynamically scoped: on scope exit, the old value of PL_curpm is restored. When $1 is accessed, its get-magic is called, which looks up PL_curpm, gets the regex pointed to by that match op, and that regex contains the char ranges and match string associated with the most recent match. The Problem: That all works well until the` sub foo { $string =~ $pat; } from the example above is called recursively from the /(?{...}/). When foo() is first called, $pat is compiled and the resulting REGEXP is stored in the OP_MATCH with a ref count of 1. OP_MATCH is then executed, which calls the regex engine with that regex and string. Part of the match is a (?{...}) which recursively calls foo(). foo() does an OP_REGCOMP again, which overwrites the current regex in the OP_MATCH with a new regex, freeing the old regex (the one we are in the middle of executing). Cue SEGVs etc. There is a further complication: PL_curpm points to the current successful *match op* rather than the current *regex*. When the regex engine was made accessible via an API, it was possible for the engine to be running with no active OP_MATCH present. But the design of (?{...}) is such that any partial matches are accessible *during* the execution, not just after the end of a successful match. So for example "AB" =~ /^(.) (?{ say "$1$2" } (.)$/x; will print out "A" and undef. The regex engine handles this by having a fake global match PMOP structure, PL_reg_curpm, and every time code within (?{...}) is about to be called, the current regex is pointed to from PL_reg_curpm, and PL_curpm is set to point to PL_reg_curpm. Since this is global, it suffers from the same problem as for the recursive match op, in that the inner call to (?{...}) will overwrite the regex pointer in PL_reg_curpm, potentially prematurely freeing the regex, and even if not freed, meaning that on return to the outer pattern, $1 et al will refer to the inner match, not the current match. The Solution: This commit makes use of an existing save/restore mechanism for patterns involving (?{...}). At the start of the match, S_setup_eval_state() is called, which saves some state in reginfo->info_aux_eval. On exit from the match (either normally or via croak), S_cleanup_regmatch_info_aux() is called to restore stuff. This commit saves three new things in info_aux_eval. 1) The current REGEXP SV (ref counted). Formerly it stored ReANY(regex); now it stores regex directly. This ensures the regex isn't freed during matching (including calls out to code in (?{...}) blocks), but it doesn't guarantee that it will live on after the end of the match, to be accessible to $1 etc al. 2) It saves (ref counted) the current value of the regex pointer in PL_reg_curpm and restores it on return. Thus on return from doing the inner match, $1 et al will give the current value for any remaining code within the code block, e.g. /(?{ foo(); print $1 })/ 3) If PL_op happens to be a pattern match op (it might not if for example the engine has been called via the API from XS) then its regex is saved and restored similar to (2). The combination of those three extra saves makes it likely that the regex will not be prematurely freed, and $1 etc will have the right values at all times. Note that this commit doesn't fix the general problem of recursively calling a match op; only the ones involving calls from within a (?{...}). For example this still prints "BB" rather than "AB": sub foo { $_[0] =~ /(.)/; foo('B') if $_[0] eq 'A'; print $1; } foo('A'); Note that the PM_GETRE() and PM_SETRE() macros, which I wanted to use to save and restore the regex pointer in PL_reg_curpm, do some funny business: PM_GETRE() returns NULL if the SV isn't a REGEX (e.g. if its &PL_sv_undef),and PM_SETRE asserts that the regex isn't null. I got round those side-effects by adding PM_GETRE_raw()/PM_SETRE_raw(), which do nothing but get/set the regex from the PMOP.
1 parent b895bb2 commit 39b4841

File tree

4 files changed

+86
-5
lines changed

4 files changed

+86
-5
lines changed

op.h

+9
Original file line numberDiff line numberDiff line change
@@ -298,9 +298,16 @@ struct pmop {
298298
OP * op_code_list; /* list of (?{}) code blocks */
299299
};
300300

301+
/* The PM_GETRE_raw/PM_SETRE_raw variants get/set the slot without any
302+
* processing or asserts */
301303
#ifdef USE_ITHREADS
304+
#define PM_GETRE_raw(o) (REGEXP*)(PL_regex_pad[(o)->op_pmoffset])
302305
#define PM_GETRE(o) (SvTYPE(PL_regex_pad[(o)->op_pmoffset]) == SVt_REGEXP \
303306
? (REGEXP*)(PL_regex_pad[(o)->op_pmoffset]) : NULL)
307+
308+
#define PM_SETRE_raw(o,r) STMT_START { \
309+
PL_regex_pad[(o)->op_pmoffset] = MUTABLE_SV(r); \
310+
} STMT_END
304311
/* The assignment is just to enforce type safety (or at least get a warning).
305312
*/
306313
/* With first class regexps not via a reference one needs to assign
@@ -315,7 +322,9 @@ struct pmop {
315322
PL_regex_pad[(o)->op_pmoffset] = MUTABLE_SV(_pm_setre); \
316323
} STMT_END
317324
#else
325+
#define PM_GETRE_raw(o) ((o)->op_pmregexp)
318326
#define PM_GETRE(o) ((o)->op_pmregexp)
327+
#define PM_SETRE_raw(o,r) ((o)->op_pmregexp = (r))
319328
#define PM_SETRE(o,r) ((o)->op_pmregexp = (r))
320329
#endif
321330

regexec.c

+35-3
Original file line numberDiff line numberDiff line change
@@ -11237,8 +11237,9 @@ S_setup_eval_state(pTHX_ regmatch_info *const reginfo)
1123711237
regexp *const rex = ReANY(reginfo->prog);
1123811238
regmatch_info_aux_eval *eval_state = reginfo->info_aux_eval;
1123911239

11240-
eval_state->rex = rex;
11241-
eval_state->sv = reginfo->sv;
11240+
eval_state->rx = reginfo->prog;
11241+
SvREFCNT_inc(eval_state->rx);
11242+
eval_state->sv = reginfo->sv;
1124211243

1124311244
if (reginfo->sv) {
1124411245
/* Make $_ available to executed code. */
@@ -11278,6 +11279,26 @@ S_setup_eval_state(pTHX_ regmatch_info *const reginfo)
1127811279
}
1127911280
#endif
1128011281
}
11282+
11283+
/* if we're currently executing a MATCHish op, the only ref to the
11284+
* current regex might be from that op. If recursive code called from
11285+
* (?{...}) recompiles that regex, the old regex will be lost -
11286+
* meaning that $1 etc will stuff refer to the value from the inner
11287+
* match. So if possible restore the PMOPs regex to the outer value at
11288+
* the end of the outer match */
11289+
if ( PL_op
11290+
&& (PL_opargs[PL_op->op_type] & OA_CLASS_MASK) == OA_PMOP
11291+
&& PM_GETRE((PMOP*)PL_op))
11292+
{
11293+
eval_state->old_op = (PMOP*)PL_op;
11294+
eval_state->old_op_val = PM_GETRE((PMOP*)PL_op);
11295+
SvREFCNT_inc(eval_state->old_op_val);
11296+
}
11297+
else
11298+
eval_state->old_op = NULL;
11299+
11300+
eval_state->old_regcurpm_val = PM_GETRE_raw(PL_reg_curpm);
11301+
SvREFCNT_inc(eval_state->old_regcurpm_val);
1128111302
S_set_reg_curpm(aTHX_ reginfo->prog, reginfo);
1128211303
eval_state->curpm = PL_curpm;
1128311304
PL_curpm_under = PL_curpm;
@@ -11320,7 +11341,7 @@ S_cleanup_regmatch_info_aux(pTHX_ void *arg)
1132011341
/* undo the effects of S_setup_eval_state() */
1132111342

1132211343
if (eval_state->subbeg) {
11323-
regexp * const rex = eval_state->rex;
11344+
regexp * const rex = ReANY(eval_state->rx);
1132411345
RXp_SUBBEG(rex) = eval_state->subbeg;
1132511346
RXp_SUBLEN(rex) = eval_state->sublen;
1132611347
RXp_SUBOFFSET(rex) = eval_state->suboffset;
@@ -11340,6 +11361,17 @@ S_cleanup_regmatch_info_aux(pTHX_ void *arg)
1134011361

1134111362
PL_curpm = eval_state->curpm;
1134211363
SvREFCNT_dec(eval_state->sv);
11364+
SvREFCNT_dec(eval_state->rx);
11365+
11366+
REGEXP *old_rx = PM_GETRE(PL_reg_curpm);
11367+
PM_SETRE_raw(PL_reg_curpm, eval_state->old_regcurpm_val);
11368+
SvREFCNT_dec(old_rx);
11369+
11370+
if (eval_state->old_op) {
11371+
old_rx = PM_GETRE(eval_state->old_op);
11372+
PM_SETRE(eval_state->old_op, eval_state->old_op_val);
11373+
SvREFCNT_dec(old_rx);
11374+
}
1134311375
}
1134411376

1134511377
PL_regmatch_state = aux->old_regmatch_state;

regexp.h

+4-1
Original file line numberDiff line numberDiff line change
@@ -776,7 +776,10 @@ struct regmatch_slab;
776776
* regmatch_state stack at the start of execution */
777777

778778
typedef struct {
779-
regexp *rex;
779+
REGEXP *rx;
780+
PMOP *old_op; /* saved value of PL_op and ... */
781+
REGEXP *old_op_val; /* ... saved value of PM_GETRE(PL_op) if any */
782+
REGEXP *old_regcurpm_val; /* saved value of PM_GETRE(PL_reg_curpm) */
780783
PMOP *curpm; /* saved PL_curpm */
781784
#ifdef PERL_ANY_COW
782785
SV *saved_copy; /* saved saved_copy field from rex */

t/re/pat_re_eval.t

+38-1
Original file line numberDiff line numberDiff line change
@@ -25,7 +25,7 @@ BEGIN {
2525
our @global;
2626

2727

28-
plan tests => 527; # Update this when adding/deleting tests.
28+
plan tests => 528; # Update this when adding/deleting tests.
2929

3030
run_tests() unless caller;
3131

@@ -1409,6 +1409,43 @@ sub run_tests {
14091409

14101410
ok("" =~ m{^ (?{eval q{$x=}})}x, "GH #19390");
14111411

1412+
# GH #22869 "Perl crash with recursive sub and regex with code eval".
1413+
#
1414+
# A recursive call to a match op with a run-time pattern and which
1415+
# contained a code block, led to to the temporary rex stored in the
1416+
# OP_MATCH and PL_reg_curpm ops getting prematurely freed when updated
1417+
# within the inner match's OP_MATCH op.
1418+
1419+
{
1420+
my @got;
1421+
1422+
my $f = sub {
1423+
my ($s, $re) = @_;
1424+
$s =~ $re;
1425+
push @got, ',', $1, $2, ']';
1426+
};
1427+
1428+
my $pat;
1429+
$pat = qr{^
1430+
(.)
1431+
(?{
1432+
push @got, '[', $1, $2;
1433+
$f->('XY', $pat) if $1 eq 'A';
1434+
push @got, ',', $1, $2;
1435+
})
1436+
(.)
1437+
(?{
1438+
push @got, ',', $1, $2;
1439+
})
1440+
$
1441+
}x;
1442+
1443+
$f->('AB',$pat);
1444+
1445+
my $got = join '', map defined ? $_ : '-', @got;
1446+
is($got, "[A-[X-,X-,XY,XY],A-,AB,AB]", "GH22869");
1447+
}
1448+
14121449
} # End of sub run_tests
14131450

14141451
1;

0 commit comments

Comments
 (0)