bpo-23689: re module, fix memory leak when a match is terminated by a signal or memory allocation failure #32283

animalize · 2022-04-03T08:23:29Z

This time I checked it carefully over several rounds, so it should be in good shape.

Tested with the VERBOSE/VVERBOSE macros defined; it builds and runs correctly.

https://bugs.python.org/issue23689


serhiy-storchaka

LGTM.

I have only one question: how can we prove that we need only one SRE_REPEAT structure per REPEAT opcode?

Modules/_sre.c

serhiy-storchaka · 2022-04-03T13:42:27Z

Lib/test/test_re.py

-        self.assertEqual(get_debug_out(r'(?:ab)*(?:cd)*'), '''\
-MAX_REPEAT 0 MAXREPEAT
+        self.assertEqual(get_debug_out(r'(?:ab)*?(?:cd)*'), '''\
+MIN_REPEAT 0 MAXREPEAT
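The suggested assertion compares the parser dump that `re` prints when a pattern is compiled with the `re.DEBUG` flag. A minimal sketch of such a helper, assuming `get_debug_out` only captures stdout around `re.compile(pattern, re.DEBUG)`. This re-implementation is hypothetical; the real helper lives in Lib/test/test_re.py:

```python
import contextlib
import io
import re


def get_debug_out(pattern):
    """Return what re.compile() prints when the re.DEBUG flag is set.

    Hypothetical stand-in for the helper used in Lib/test/test_re.py.
    """
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        # Patterns compiled with re.DEBUG are not cached, so the dump
        # is printed on every call.
        re.compile(pattern, re.DEBUG)
    return buf.getvalue()


out = get_debug_out(r'(?:ab)*?(?:cd)*')
print(out.splitlines()[0])
```

With the suggested pattern, the lazy `*?` group shows up as `MIN_REPEAT 0 MAXREPEAT` on the first dump line and the greedy `*` group as `MAX_REPEAT 0 MAXREPEAT`, so the test exercises both repeat kinds.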


You just read my mind! I was going to propose such a change, but I thought that I was already bothering you too much.

I thought of this just after posting this PR.

> I thought that I was already bothering you too much.

As an inactive contributor, that's not a problem at all.
I'm out of practice, so I need to keep improving the patch to get it into a good state.
Whenever I think it's good, I always find shortcomings afterwards.

> I have only one question: how can we prove that we need only one SRE_REPEAT structure per REPEAT opcode?

I have to think about how to answer your question.

animalize · 2022-04-03T14:40:10Z

> I have only one question: how can we prove that we need only one SRE_REPEAT structure per REPEAT opcode?

At any time, each SRE_OP_REPEAT has at most one SRE_REPEAT entry on the stack.

When this opcode is executed, its entry is pushed onto the stack. SRE_OP_MAX_UNTIL / SRE_OP_MIN_UNTIL may push many backtracking points, but they all sit above the SRE_OP_REPEAT entry in the stack.
When backtracking, the entry is popped from the stack.

This would not hold if the re engine memoized some backtracking states to improve performance, but that would make the re module's code much more complicated.
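The argument above can be modeled with a toy stack. Assuming it holds (entries are pushed and popped strictly in LIFO order, and a REPEAT is never executed again while its own entry is still on the stack), a fixed array with one slot per REPEAT opcode is enough. `RepeatSlots`, `push` and `pop` are hypothetical names for illustration, not code from _sre.c:

```python
class RepeatSlots:
    """Toy model of one preallocated slot per REPEAT opcode.

    Hypothetical illustration of the invariant described above,
    not the actual data structure in _sre.c.
    """

    def __init__(self, repeat_count):
        # Allocated once per match: one slot per REPEAT opcode.
        self.slots = [None] * repeat_count
        # Repeat ids in push order; the most recent one is last.
        self.stack = []

    def push(self, repeat_id):
        # Executing REPEAT: under the invariant, its slot is free.
        if self.slots[repeat_id] is not None:
            raise RuntimeError(f"REPEAT {repeat_id} is already on the stack")
        self.slots[repeat_id] = {"count": 0}  # toy per-repeat state
        self.stack.append(repeat_id)

    def pop(self, repeat_id):
        # Backtracking past REPEAT: entries come off in LIFO order.
        if not self.stack or self.stack[-1] != repeat_id:
            raise RuntimeError(f"REPEAT {repeat_id} is not on top of the stack")
        self.stack.pop()
        self.slots[repeat_id] = None


# Two sibling repeats, e.g. (?:ab)*(?:cd)*: the second REPEAT runs while
# the first one's entry is still on the stack, in its own slot.
slots = RepeatSlots(2)
slots.push(0)
slots.push(1)
slots.pop(1)
slots.pop(0)
slots.push(0)  # the same REPEAT again after it was popped is fine
slots.pop(0)
```

If the assumption were violated, `push` would find its slot occupied and raise; in C, that case would mean two live SRE_REPEAT frames competing for the same array slot.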

animalize added 8 commits Apr 3, 2022

1. add variables

e5317fe

2. add _CompileData class to _compile.py

28e4d2d

_CompileData can store intermediate data.

3. add repeat_count parameter to _sre.compile() function

7493fad

argument clinic

4. emit repeat_count in sre_compile.py

b928335
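Commits 2 to 4 thread a repeat counter through compilation. A toy sketch of the idea, assuming each REPEAT simply gets the next free index and the final total is what becomes the `repeat_count` passed to `_sre.compile()`. `CompileData`, `emit_code` and the tuple-based tree and opcode format are hypothetical simplifications of _compile.py; for instance, the real compiler emits REPEAT_ONE/MIN_REPEAT_ONE rather than REPEAT for single-character repeats such as `a*`, which this sketch ignores:

```python
class CompileData:
    """Hypothetical stand-in for _CompileData: intermediate state that
    lives for one compilation (here only a REPEAT counter)."""

    def __init__(self):
        self.repeat_count = 0

    def new_repeat_index(self):
        # Hand out 0, 1, 2, ... in emission order.
        index = self.repeat_count
        self.repeat_count += 1
        return index


def emit_code(items, data, code):
    """Append toy opcodes for `items` to `code`.

    `items` is a list of ("REPEAT", children) or ("LITERAL", char) tuples.
    """
    for op, arg in items:
        if op == "REPEAT":
            # Each REPEAT carries its own index as an operand.
            code.append(("REPEAT", data.new_repeat_index()))
            emit_code(arg, data, code)
            code.append(("MAX_UNTIL", None))
        else:
            code.append((op, arg))
    return code


# Toy tree for (?:a(?:bc)*)*(?:de)*
tree = [
    ("REPEAT", [("LITERAL", "a"),
                ("REPEAT", [("LITERAL", "b"), ("LITERAL", "c")])]),
    ("REPEAT", [("LITERAL", "d"), ("LITERAL", "e")]),
]
data = CompileData()
code = emit_code(tree, data, [])
print(data.repeat_count)  # 3
```

Keeping the counter on a per-compilation object, rather than in a global, means the count always describes exactly the code that was just emitted.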

5. change _validate_outer() parameter

dfec05d

-_validate_inner(SRE_CODE *code, SRE_CODE *end, Py_ssize_t groups)
+_validate_inner(SRE_CODE *code, SRE_CODE *end, PatternObject *self)

6. validate in _validate_inner() / _validate_outer()

db7e88b
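Since `_validate_inner()` now receives the PatternObject, it can presumably check each REPEAT operand against the pattern's repeat count before the matcher uses it as an array index. A toy sketch of that check over hypothetical tuple opcodes; `validate_repeat_indices` is illustrative only, not the C validator:

```python
def validate_repeat_indices(code, repeat_count):
    """Return True if every REPEAT operand is a valid index into an array
    of `repeat_count` slots (toy check, hypothetical)."""
    for op, arg in code:
        if op == "REPEAT":
            # An out-of-range index would read past the repeats array.
            if not isinstance(arg, int) or not 0 <= arg < repeat_count:
                return False
    return True


good = [("REPEAT", 0), ("LITERAL", "a"), ("MAX_UNTIL", None),
        ("REPEAT", 1), ("LITERAL", "b"), ("MIN_UNTIL", None)]
bad = [("REPEAT", 2), ("LITERAL", "a"), ("MAX_UNTIL", None)]
print(validate_repeat_indices(good, 2), validate_repeat_indices(bad, 2))  # True False
```

Rejecting bad indices once, at compile time, lets the matching loop index the array without repeating the bounds check on every REPEAT.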

7. allocate repeats_array for SRE_STATE

a43cb6e
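A toy model of why moving the allocation into SRE_STATE plugs the leak, assuming (as the PR title suggests) that the old per-REPEAT allocations were skipped on error paths such as a signal or a failed allocation, and that the new repeats_array is released together with the state. `Allocator`, `Interrupted`, `run_per_repeat` and `run_with_array` are hypothetical names; the allocator only counts live blocks to mimic malloc/free:

```python
class Allocator:
    """Counts live blocks to mimic malloc/free (toy)."""

    def __init__(self):
        self.live = 0

    def alloc(self):
        self.live += 1
        return object()

    def free(self, block):
        self.live -= 1


class Interrupted(Exception):
    """Stands in for a signal or a failed memory allocation."""


def run_per_repeat(alloc, repeat_count, fail_at):
    """Old scheme (toy): every REPEAT allocates its own frame."""
    frames = []
    for i in range(repeat_count):
        frames.append(alloc.alloc())
        if i == fail_at:
            # Early error return: the frees below never run.
            raise Interrupted
    for frame in reversed(frames):
        alloc.free(frame)


def run_with_array(alloc, repeat_count, fail_at):
    """New scheme (toy): one repeats array owned by the match state."""
    repeats = alloc.alloc()  # stands in for allocating repeats_array
    try:
        for i in range(repeat_count):
            if i == fail_at:
                raise Interrupted
    finally:
        # Like state cleanup, this runs on every exit path.
        alloc.free(repeats)


old, new = Allocator(), Allocator()
for run, alloc in ((run_per_repeat, old), (run_with_array, new)):
    try:
        run(alloc, repeat_count=3, fail_at=1)
    except Interrupted:
        pass
print(old.live, new.live)  # 2 0
```

The design choice is to give the memory a single owner: instead of freeing on every return path of the matcher, the array is released in one place when the state is torn down.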

8. support code in sre_lib.h

1560a0c

bedevere-bot added the awaiting review label Apr 3, 2022

the-knights-who-say-ni added the CLA signed label Apr 3, 2022

9. add unit-tests

96be025

animalize force-pushed the repeat_array2 branch from e34018e to 96be025 Apr 3, 2022

serhiy-storchaka approved these changes Apr 3, 2022


Modules/_sre.c (outdated)

bedevere-bot added awaiting merge and removed awaiting review labels Apr 3, 2022

animalize added 2 commits Apr 3, 2022

a. skip -= field_number

7ee3e66

b. improve unit-test

e95b19f

serhiy-storchaka reviewed Apr 3, 2022


serhiy-storchaka merged commit 6e3eee5 into python:main Apr 3, 2022
12 checks passed

bedevere-bot removed the awaiting merge label Apr 3, 2022

animalize deleted the repeat_array2 branch Apr 4, 2022

python / cpython Public
