Out-of-bounds read in unicodeobject.c ascii_decode() #103656

Closed
guidovranken opened this issue Apr 20, 2023 · 5 comments · Fixed by #103896
Labels
type-crash A hard crash of the interpreter, possibly with a core dump type-security A security issue

Comments

@guidovranken

guidovranken commented Apr 20, 2023

Crash report

The following will crash the interpreter if compiled with AddressSanitizer:

import ast
ast.parse(bytes([0x66, 0x27, 0x7b, 0x5f, 0x3d, 0x7d, 0x7b, 0x3b]))

This was found by OSS-Fuzz today. I reported it to the security address first, and they said it's fine to report it here on GitHub.

OSS-Fuzz reports this commit range as having introduced the bug: ece20db...6be7aee

Error messages

AddressSanitizer stack trace:

==715==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6020001fda12 at pc 0x0000006f1d78 bp 0x7ffc77957450 sp 0x7ffc77957448
READ of size 1 at 0x6020001fda12 thread T0
SCARINESS: 12 (1-byte-read-heap-buffer-overflow)
    #0 0x6f1d77 in ascii_decode cpython/Objects/unicodeobject.c:4490:32
    #1 0x6f1d77 in unicode_decode_utf8 cpython/Objects/unicodeobject.c:4548:10
    #2 0x6f8d37 in PyUnicode_DecodeUTF8Stateful cpython/Objects/unicodeobject.c:4677:12
    #3 0x6f8d37 in PyUnicode_DecodeUTF8 cpython/Objects/unicodeobject.c:4438:12
    #4 0xca94cb in decode_fstring_buffer cpython/Parser/action_helpers.c:1208:21
    #5 0xca94cb in _PyPegen_formatted_value cpython/Parser/action_helpers.c:1435:30
    #6 0xb1614e in fstring_replacement_field_rule cpython/Parser/parser.c:15671:20
    #7 0xac0efa in fstring_middle_rule cpython/Parser/parser.c:15569:46
    #8 0xac0efa in _loop0_3_rule cpython/Parser/parser.c:24878:35
    #9 0xac0efa in fstring_rule cpython/Parser/parser.c:1334:18
    #10 0xafac43 in _tmp_253_rule cpython/Parser/parser.c:40299:28
    #11 0xafac43 in _loop1_113_rule cpython/Parser/parser.c:31877:29
    #12 0xafac43 in strings_rule cpython/Parser/parser.c:15962:34
    #13 0xadce01 in atom_rule cpython/Parser/parser.c:14388:28
    #14 0xae53cc in t_primary_raw cpython/Parser/parser.c:18640:18
    #15 0xae53cc in t_primary_rule cpython/Parser/parser.c:18429:22
    #16 0xb60c7e in single_subscript_attribute_target_rule cpython/Parser/parser.c:18319:18
    #17 0xb5cdb9 in _tmp_13_rule cpython/Parser/parser.c:25515:54
    #18 0xb53fbf in assignment_rule cpython/Parser/parser.c:2323:18
    #19 0xb53fbf in simple_stmt_rule cpython/Parser/parser.c:1730:31
    #20 0xac39e8 in simple_stmts_rule cpython/Parser/parser.c:1625:18
    #21 0xac1c9a in statement_rule cpython/Parser/parser.c:1448:34
    #22 0xac1c9a in _loop1_4_rule cpython/Parser/parser.c:24946:30
    #23 0xac1c9a in statements_rule cpython/Parser/parser.c:1380:18
    #24 0xabcdf4 in file_rule cpython/Parser/parser.c:1128:18
    #25 0xabcdf4 in _PyPegen_parse cpython/Parser/parser.c:41236:18
    #26 0xab8be1 in _PyPegen_run_parser cpython/Parser/pegen.c:840:9
    #27 0xab948f in _PyPegen_run_parser_from_string cpython/Parser/pegen.c:938:14
    #28 0xb6a257 in _PyParser_ASTFromString cpython/Parser/peg_api.c:14:21
    #29 0x8fc949 in Py_CompileStringObject cpython/Python/pythonrun.c:1771:11
    #30 0x799e73 in builtin_compile_impl cpython/Python/bltinmodule.c:831:14
    #31 0x799e73 in builtin_compile cpython/Python/clinic/bltinmodule.c.h:383:20
    #32 0xc09ba1 in cfunction_vectorcall_FASTCALL_KEYWORDS cpython/Objects/methodobject.c:438:24
    #33 0x5b7659 in _PyObject_VectorcallTstate cpython/Include/internal/pycore_call.h:92:11
    #34 0x5b7659 in PyObject_Vectorcall cpython/Objects/call.c:301:12
    #35 0x7bb81e in _PyEval_EvalFrameDefault cpython/Python/bytecodes.c:2533:19
    #36 0x7a264f in _PyEval_EvalFrame cpython/Include/internal/pycore_ceval.h:88:16
    #37 0x7a264f in _PyEval_Vector cpython/Python/ceval.c:1522:12
    #38 0x5b804f in _PyFunction_Vectorcall cpython/Objects/call.c:0
    #39 0x5b7462 in _PyVectorcall_Call cpython/Objects/call.c:247:16
    #40 0x5b79c2 in _PyObject_Call cpython/Objects/call.c:330:16
    #41 0x5b8424 in PyObject_CallObject cpython/Objects/call.c:454:12
    #42 0x593b33 in LLVMFuzzerTestOneInput python-library-fuzzers/fuzzer.cpp:134:14
    #43 0x461943 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:611:15
    #44 0x44d0a2 in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:324:6
    #45 0x45294c in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:860:9
    #46 0x47be82 in main /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10
    #47 0x72f891c63082 in __libc_start_main /build/glibc-SzIz7B/glibc-2.31/csu/libc-start.c:308:16
    #48 0x44326d in _start

0x6020001fda12 is located 0 bytes to the right of 2-byte region [0x6020001fda10,0x6020001fda12)
allocated by thread T0 here:
    #0 0x552ad6 in __interceptor_malloc /src/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:69:3
    #1 0x66e839 in _PyMem_RawMalloc cpython/Objects/obmalloc.c:42:12
    #2 0x67188a in PyMem_Malloc cpython/Objects/obmalloc.c:587:12
    #3 0xb872cd in update_fstring_expr cpython/Parser/tokenizer.c:412:42
    #4 0xb7996c in tok_get_normal_mode cpython/Parser/tokenizer.c:2340:29
    #5 0xb6fe0e in tok_get cpython/Parser/tokenizer.c:0
    #6 0xb6fe0e in _PyTokenizer_Get cpython/Parser/tokenizer.c:2639:18
    #7 0xab3a5f in _PyPegen_fill_token cpython/Parser/pegen.c:201:16
    #8 0xb15c2d in fstring_replacement_field_rule cpython/Parser/parser.c:15626:31
    #9 0xac0efa in fstring_middle_rule cpython/Parser/parser.c:15569:46
    #10 0xac0efa in _loop0_3_rule cpython/Parser/parser.c:24878:35
    #11 0xac0efa in fstring_rule cpython/Parser/parser.c:1334:18
    #12 0xafac43 in _tmp_253_rule cpython/Parser/parser.c:40299:28
    #13 0xafac43 in _loop1_113_rule cpython/Parser/parser.c:31877:29
    #14 0xafac43 in strings_rule cpython/Parser/parser.c:15962:34
    #15 0xadce01 in atom_rule cpython/Parser/parser.c:14388:28
    #16 0xae53cc in t_primary_raw cpython/Parser/parser.c:18640:18
    #17 0xae53cc in t_primary_rule cpython/Parser/parser.c:18429:22
    #18 0xb60c7e in single_subscript_attribute_target_rule cpython/Parser/parser.c:18319:18
    #19 0xb5cdb9 in _tmp_13_rule cpython/Parser/parser.c:25515:54
    #20 0xb53fbf in assignment_rule cpython/Parser/parser.c:2323:18
    #21 0xb53fbf in simple_stmt_rule cpython/Parser/parser.c:1730:31
    #22 0xac39e8 in simple_stmts_rule cpython/Parser/parser.c:1625:18
    #23 0xac1c9a in statement_rule cpython/Parser/parser.c:1448:34
    #24 0xac1c9a in _loop1_4_rule cpython/Parser/parser.c:24946:30
    #25 0xac1c9a in statements_rule cpython/Parser/parser.c:1380:18
    #26 0xabcdf4 in file_rule cpython/Parser/parser.c:1128:18
    #27 0xabcdf4 in _PyPegen_parse cpython/Parser/parser.c:41236:18
    #28 0xab86f2 in _PyPegen_run_parser cpython/Parser/pegen.c:825:17
    #29 0xab948f in _PyPegen_run_parser_from_string cpython/Parser/pegen.c:938:14
    #30 0xb6a257 in _PyParser_ASTFromString cpython/Parser/peg_api.c:14:21
    #31 0x8fc949 in Py_CompileStringObject cpython/Python/pythonrun.c:1771:11
    #32 0x799e73 in builtin_compile_impl cpython/Python/bltinmodule.c:831:14
    #33 0x799e73 in builtin_compile cpython/Python/clinic/bltinmodule.c.h:383:20
    #34 0xc09ba1 in cfunction_vectorcall_FASTCALL_KEYWORDS cpython/Objects/methodobject.c:438:24
    #35 0x5b7659 in _PyObject_VectorcallTstate cpython/Include/internal/pycore_call.h:92:11
    #36 0x5b7659 in PyObject_Vectorcall cpython/Objects/call.c:301:12
    #37 0x7bb81e in _PyEval_EvalFrameDefault cpython/Python/bytecodes.c:2533:19
    #38 0x7a264f in _PyEval_EvalFrame cpython/Include/internal/pycore_ceval.h:88:16
    #39 0x7a264f in _PyEval_Vector cpython/Python/ceval.c:1522:12
    #40 0x5b804f in _PyFunction_Vectorcall cpython/Objects/call.c:0
    #41 0x5b7462 in _PyVectorcall_Call cpython/Objects/call.c:247:16
    #42 0x5b79c2 in _PyObject_Call cpython/Objects/call.c:330:16

Your environment

Linux x64, latest cpython main branch checkout.


@guidovranken guidovranken added the type-crash A hard crash of the interpreter, possibly with a core dump label Apr 20, 2023
@alex alex added the type-security A security issue label Apr 20, 2023
@arhadthedev
Member

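For reference, bytes([0x66, 0x27, 0x7b, 0x5f, 0x3d, 0x7d, 0x7b, 0x3b]) is the UTF-8 encoding of the source text f'{_=}{; (an unterminated f-string whose first replacement field uses the = debug specifier).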
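A quick way to confirm this decoding: all eight bytes are below 0x80, so the fuzzer input is plain ASCII and decodes directly to the f-string source text.

```python
# Decode the fuzzer input from the report to see the source text it represents.
data = bytes([0x66, 0x27, 0x7B, 0x5F, 0x3D, 0x7D, 0x7B, 0x3B])
source = data.decode("utf-8")
print(source)  # f'{_=}{;
```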

@sobolevn
Member

I am pretty sure that this is related to 1ef61cf

@sunmy2019
Contributor

The direct cause of the out-of-bounds read is that tok_mode->last_expr_end is never updated from its initial value of -1.

The tokenizer only sets it when it encounters '}', '!' or ':'. Here, it only encounters EOF.

cpython/Parser/tokenizer.c

Lines 420 to 425 in a4967d9

case '}':
case '!':
case ':':
    if (tok_mode->last_expr_end == -1) {
        tok_mode->last_expr_end = strlen(tok->start);
    }
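The ASan report shows a 1-byte read just past a 2-byte buffer allocated in update_fstring_expr. The Python sketch below is a simplified model, not CPython code: it assumes (this is not confirmed in the thread) that the number of bytes decoded for the debug text is a buffer size minus last_expr_end. Under that assumption, leaving the sentinel -1 in place overruns the buffer by exactly one byte. The function names and buffer contents are hypothetical.

```python
def debug_text_length(last_expr_size: int, last_expr_end: int) -> int:
    # Hypothetical formula: bytes to decode = buffer size minus the bytes
    # remaining after the expression. With the sentinel -1 it exceeds the buffer.
    return last_expr_size - last_expr_end


def checked_read(buf: bytes, length: int) -> bytes:
    # Python slicing would silently truncate, so enforce the bound explicitly,
    # the way AddressSanitizer flags the equivalent C read.
    if not 0 <= length <= len(buf):
        raise IndexError(f"read of {length} bytes from a {len(buf)}-byte buffer")
    return buf[:length]


buf = b"_="  # hypothetical contents; only the 2-byte size matches the report
print(checked_read(buf, debug_text_length(len(buf), 0)))  # b'_='
try:
    checked_read(buf, debug_text_length(len(buf), -1))
except IndexError as exc:
    print(exc)  # read of 3 bytes from a 2-byte buffer
```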

I am investigating why the other safeguards fail to catch this.

CC @pablogsal @lysnikolaou

@sunmy2019
Contributor

sunmy2019 commented Apr 21, 2023

In a debug build it triggers an assertion failure; in a release build it is an out-of-bounds read.


It happens in the second pass, when the parser retries with the invalid_* rules enabled and attempts invalid_expression_rule.

When trying to match invalid_expression against 1=}{;, the parser asks the tokenizer for the { token. From then on, tok_mode->last_expr_end no longer holds the value for the current replacement field, but the one for the next replacement field.
However, invalid_expression_rule never asks for another token: it stops at ;, so that value stays at -1.

This usually causes no user-visible problem, since a SyntaxError is raised anyway (though it would still be wrong if the input were somehow accepted).


In other cases, either invalid_expression_rule peeks at a }, in which case it happily uses the value of the following replacement field, or the tokenizer finds and reports the error itself, so a SyntaxError is raised beforehand.


Everything works fine as long as the value is used for the correct replacement field, even if it is -1.

In this particular case, the -1 belonging to the next replacement field was misused as the value for the current replacement field.
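To make the sequence concrete, here is a toy single-character "tokenizer" in Python. It is a hypothetical model of the behaviour described above, not CPython's tokenizer: real tokens are not single characters, and resetting the slot to -1 when { is read is an assumption inferred from the comment. It shows the shared slot holding a valid value right after the first field's }, and -1 once the lookahead has moved on to { and ; of the next field.

```python
class ToyTokenizer:
    """Hypothetical model: one last_expr_end slot shared by all replacement
    fields, reset when "{" opens a field and set (once) when "}", "!" or ":"
    ends its expression."""

    def __init__(self, src: str) -> None:
        self.src = src
        self.pos = 0
        self.last_expr_end = -1

    def next_token(self) -> str:
        start = self.pos
        ch = self.src[start]
        self.pos += 1
        if ch == "{":
            self.last_expr_end = -1  # a new field begins; its end is unknown
        elif ch in "}!:" and self.last_expr_end == -1:
            # Like strlen(tok->start): characters remaining from this token on.
            self.last_expr_end = len(self.src) - start
        return ch


tok = ToyTokenizer("{1=}{;")
for _ in range(4):  # "{", "1", "=", "}": the first replacement field
    tok.next_token()
end_for_field_one = tok.last_expr_end  # valid value for {1=}

# The invalid-rules pass looks ahead into the next field, tokenizing
# "{" and then ";", and stops there without ever seeing a "}".
tok.next_token()
tok.next_token()
print(end_for_field_one, tok.last_expr_end)  # 3 -1
```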

Many more inputs fail the same way, as long as there is a valid debug expression at the front ({1=}), followed later by {, then ;, and the error is not caught by the tokenizer:

f'{1=}{;'
f'{1=}{1;'
f'{1=}{1;}'
f'{1=}{+;'
f'{1=}{2}{;'
f'{1=}{3}{;'
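These inputs can be turned into a simple regression check. The sketch below only asserts that each one is rejected with a SyntaxError; on an affected build the over-read itself is only reliably visible under AddressSanitizer, so a plain release build may pass this check without exposing the bug.

```python
import ast

# The fuzzer input from the report plus the examples above. Each must be
# rejected with a SyntaxError; on affected builds, parsing them read one
# byte past the end of a heap buffer.
CASES = [
    bytes([0x66, 0x27, 0x7B, 0x5F, 0x3D, 0x7D, 0x7B, 0x3B]),  # f'{_=}{;
    "f'{1=}{;'",
    "f'{1=}{1;'",
    "f'{1=}{1;}'",
    "f'{1=}{+;'",
    "f'{1=}{2}{;'",
    "f'{1=}{3}{;'",
]


def rejects(source) -> bool:
    """Return True if ast.parse raises SyntaxError for this source."""
    try:
        ast.parse(source)
    except SyntaxError:
        return True
    return False


for case in CASES:
    print(repr(case), rejects(case))
```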

In principle, the most correct fix is to let each replacement field use its own value, but that is technically hard.

A hack is possible, but it may open the door to further bugs.

Another option is to skip generating debug_text when the parser is running with call_invalid_rules set.
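A rough Python model of that last option follows, purely as an illustration: the function name and parameters are hypothetical, the bounds check is an extra defensive step added here, and this is not necessarily what the merged fix (#103896) does. The idea relies on the invalid-rules pass being used to produce a more specific SyntaxError message, so the debug text built during it is never needed.

```python
from typing import Optional


def debug_text(buf: bytes, last_expr_end: int, call_invalid_rules: bool) -> Optional[bytes]:
    """Hypothetical sketch: skip building the f'{expr=}' debug text during
    the invalid-rules pass, and refuse out-of-range tokenizer state."""
    if call_invalid_rules:
        # Error-reporting pass: the result is not used, so avoid touching
        # tokenizer state that may belong to a later replacement field.
        return None
    # Extra defensive check (not part of the proposal): -1 or any other
    # out-of-range value means the state is not for this field.
    if not 0 <= last_expr_end <= len(buf):
        raise ValueError(f"invalid last_expr_end: {last_expr_end}")
    return buf[: len(buf) - last_expr_end]
```

With the guard, the failing scenario from this thread (last_expr_end still -1 during the second pass) returns None instead of reading past the buffer.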

lysnikolaou added a commit to lysnikolaou/cpython that referenced this issue Apr 26, 2023
pablogsal added a commit to lysnikolaou/cpython that referenced this issue Apr 26, 2023
pablogsal added a commit to lysnikolaou/cpython that referenced this issue Apr 26, 2023
pablogsal added a commit to lysnikolaou/cpython that referenced this issue Apr 26, 2023
pablogsal added a commit to lysnikolaou/cpython that referenced this issue Apr 26, 2023
pablogsal added a commit to lysnikolaou/cpython that referenced this issue Apr 26, 2023
pablogsal added a commit to lysnikolaou/cpython that referenced this issue Apr 26, 2023
pablogsal added a commit to lysnikolaou/cpython that referenced this issue Apr 26, 2023
pablogsal added a commit to lysnikolaou/cpython that referenced this issue Apr 27, 2023
lysnikolaou added a commit that referenced this issue Apr 27, 2023
@gpshead
Member

gpshead commented Apr 27, 2023

thanks all for figuring this out! ❤️

7 participants