New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-97933: inline list/dict/set comprehensions in functions #101441
base: main
Are you sure you want to change the base?
Conversation
@markshannon @iritkatriel would be grateful for a pyperformance run on this one (presuming signal all looks good.) |
I'll run the benchmarks now |
Benchmarking is delayed a bit, but should be back tomorrow at the latest. |
You don't seem to be clearing the inner variable before use, so that def f():
l = [ None ]
[ (l[0], l) in [[1,2]] ]
print(l) will, I think, print You can actually make the code more efficient by clearing the local when stashing it on the stack: inst(LOAD_FAST_AND_CLEAR, ( -- val)) {
val = LOCALS[oparg];
LOCALS[oparg] = NULL;
} as there is no need for a |
Python/compile.c
Outdated
// could be cleverer about minimizing swaps but it doesn't matter | ||
// because `apply_static_swaps` will eliminate all of these anyway | ||
ADDOP_I(c, loc, SWAP, 2); | ||
ADDOP_NAME(c, loc, STORE_FAST, k, varnames); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could be storing NULL
here. That is OK for the interpreter, but the dataflow analysis in add_checks_for_loads_of_uninitialized_variables
in the compiler assumes that STORE_FAST
leaves the variable initialized.
What you'll need to do is add a pseudo instruction, and convert that to STORE_FAST
in the assembler. That way the dataflow analysis won't be fooled.
You might want to update add_checks_for_loads_of_uninitialized_variables
to understand the LOAD_FAST_AND_CLEAR
, STORE_FAST_MAYBE_NULL
pairs, but that should wait for another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is a test case:
def f():
if False:
x = 0
[x for _ in [1]]
return x
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, thanks!
The test case needs to use [x for x in [1]]
, not [x for _ in [1]]
-- in the latter case, x
is not bound in the comprehension and is just a free var (converted to local by inlining) so we don't push/pop it at all.
A couple of obscure issues, but this looks like a very nice improvement. I think the gain in performance is well worth the cost in code size, given how common comprehensions are. |
* main: pythongh-101440: fix json snippet error in logging-cookbook.rst (python#101439) pythongh-99276 - Updated Doc/faq/general.rst (python#101396) Add JOBS parameter to docs Makefile (python#101395) pythongh-98831: rewrite GET_LEN, GET_ITER, BEFORE_WITH and a few simple opcodes in the instruction definition DSL (python#101443) pythongh-77607: Improve accuracy of os.path.join docs (python#101406) Fixes typo in asyncio.TaskGroup context manager code example (python#101449) pythongh-98831: Clean up and add cache size static_assert to macro (python#101442) pythongh-99955: use SUCCESS/ERROR return values in optimizer and assembler. Use RETURN_IF_ERROR where appropriate. Fix a couple of bugs. (python#101412)
Your example code doesn't involve a comprehension, I'm assuming you meant something like
But this doesn't raise So far I haven't been able to construct a test case where failing to clear an incoming variable bound inside the comprehension has any visible effect, because AFAICT any locals in a comprehension are always (necessarily by the syntactic limitations of how a comprehension is built) defined before use. Let me know if you can think of a test case I'm missing!
But I think this is sufficient reason regardless to justify switching from |
Actually I don't think
We need to save/restore the outer value of If there were a correctness reason for needing to clear in other cases, I could add an additional EDIT: there is also the option of doing clearing pushes and also moving evaluation of the outermost iterable to before the pushes; this would require some non-elidable SWAPs to preserve it as TOS through the pushes, though... |
@markshannon I think |
I've addressed the review comments. I have some TODOs for potential optimizations that could be done here or in a follow-up diff:
|
* main: pythonGH-100288: Skip extra work when failing to specialize LOAD_ATTR (pythonGH-101354) pythongh-101409: Improve generated clinic code for self type checks (python#101411) pythongh-98831: rewrite BEFORE_ASYNC_WITH and END_ASYNC_FOR in the instruction definition DSL (python#101458) pythongh-101469: Optimise get_io_state() by using _PyModule_GetState() (pythonGH-101470)
Sorry for posting a gibberish example. def f():
l = [ None ]
[ 1 for (l[0], l) in [[1,2]] ]
print(l)
f() 3.11
This PR
|
def f(x):
return [x for x in x] This shouldn't be a problem. Let's rename for clarity: def f(x0):
return [x1 for x1 in x0] The expression This PR stashes the variable
Note that the evaluation of I.e. def f(x):
return [x for x in x] should be compiled as def f(x):
$tmp = x
return [x for x in $tmp] (I think I'd prefer |
Aha, thanks for the example!
Yes, I considered above the option of moving the evaluation of the iterable expression to before the stack pushes. I'll revisit that, since we do have to clear locals to maintain semantics. We'll have to either stash it in a local or use SWAP to keep it TOS. |
* main: pythongh-98831: rewrite PUSH_EXC_INFO and conditional jumps in the instruction definition DSL (python#101481) pythongh-98831: Modernize the LOAD_ATTR family (python#101488) pythongh-101498 : Fix asyncio.Timeout example in docs (python#101499) pythongh-101454: fix documentation for END_ASYNC_FOR (python#101455) pythongh-101277: Isolate itertools, add group and _grouper types to module state (python#101302) pythongh-101317: Add `ssl_shutdown_timeout` parameter for `asyncio.StreamWriter.start_tls` (python#101335) datetime.rst: fix combine() signature (python#101490)
* main: (82 commits) pythongh-101670: typo fix in PyImport_ExtendInittab() (python#101723) pythonGH-99293: Document that `Py_TPFLAGS_VALID_VERSION_TAG` shouldn't be used. (#pythonGH-101736) no-issue: Add Dong-hee Na as the cjkcodecs codeowner (pythongh-101731) pythongh-101678: Merge math_1_to_whatever() and math_1() (python#101730) pythongh-101678: refactor the math module to use special functions from c11 (pythonGH-101679) pythongh-85984: Remove legacy Lib/pty.py code. (python#92365) pythongh-98831: Use opcode metadata for stack_effect() (python#101704) pythongh-101283: Version was just released, so should be changed in 3.11.3 (pythonGH-101719) pythongh-101283: Fix use of unbound variable (pythonGH-101712) pythongh-101283: Improved fallback logic for subprocess with shell=True on Windows (pythonGH-101286) pythongh-101277: Port more itertools static types to heap types (python#101304) pythongh-98831: Modernize CALL and family (python#101508) pythonGH-101696: invalidate type version tag in `_PyStaticType_Dealloc` (python#101697) pythongh-100221: Fix creating dirs in `make sharedinstall` (pythonGH-100329) pythongh-101670: typo fix in PyImport_AppendInittab() (pythonGH-101672) pythongh-101196: Make isdir/isfile/exists faster on Windows (pythonGH-101324) pythongh-101614: Don't treat python3_d.dll as a Python DLL when checking extension modules for incompatibility (pythonGH-101615) pythongh-100933: Improve `check_element` helper in `test_xml_etree` (python#100934) pythonGH-101578: Normalize the current exception (pythonGH-101607) pythongh-47937: Note that Popen attributes are read-only (python#93070) ...
* main: Fix some typos in asdl_c.py (pythonGH-101757) pythongh-101747: Fix refleak in new `OrderedDict` repr (pythonGH-101748) pythongh-101430: Update tracemalloc to handle presize properly. (pythongh-101745) pythonGH-101228: Fix typo in docstring for read method of `_io.TextIOWrapper` class (python#101227) Fix typo in `test_fstring.py` (python#101600) pythongh-101726: Update the OpenSSL version to 1.1.1t (pythonGH-101727) pythongh-101283: Fix 'versionchanged' for the shell=True fallback on Windows in 3.12 (pythonGH-101728) LibFFI build requires x64 Cygwin, and skip the ARM build (pythonGH-101743)
@markshannon Ok, I think I've addressed all review comments, and CI looks good. Ready for review and pyperformance run. |
Given the current usage of STORE_FAST_MAYBE_NULL, it isn't possible to construct a repro where this is needed, because if the local was NULL coming into the comprehension it will already be marked as unsafe coming out of the comprehension (since the body of the comprehension is always a loop which may be skipped entirely). But in case this opcode is used in other contexts in future, it should mark the local unsafe.
* main: Docs: Fix getstatus() -> getcode() typos (python#101296) Docs: use parameter list for sqlite3.Cursor.execute* (python#101782) pythongh-101763: Update bundled copy of libffi to 3.4.4 on Windows (pythonGH-101784) pythongh-101517: make bdb avoid looking up in linecache with lineno=None (python#101787) pythongh-101759: Update Windows installer to SQLite 3.40.1 (python#101762) pythongh-101277: Finalise isolating itertools (pythonGH-101305) pythongh-101759: Update macOS installer to SQLite 3.40.1 (python#101761)
* main: pythongh-101810: Remove duplicated st_ino calculation (pythonGH-101811) pythongh-92547: Purge sqlite3_enable_shared_cache() detection from configure (python#101873) pythonGH-100987: Refactor `_PyInterpreterFrame` a bit, to assist generator improvement. (pythonGH-100988) pythonGH-87849: Simplify stack effect of SEND and specialize it for generators and coroutines. (pythonGH-101788) Correct trivial grammar in reset_mock docs (python#101861) pythongh-101845: pyspecific: Fix i18n for availability directive (pythonGH-101846) pythongh-89792: Limit test_tools freeze test build parallelism based on the number of cores (python#101841) pythongh-85984: Utilize new "winsize" functions from termios in pty tests. (python#101831) pythongh-89792: Prevent test_tools from copying 1000M of "source" in freeze test (python#101837) Fix typo in test_fstring.py (python#101823) pythonGH-101797: allocate `PyExpat_CAPI` capsule on heap (python#101798) pythongh-101390: Fix docs for `imporlib.util.LazyLoader.factory` to properly call it a class method (pythonGH-101391)
* main: (60 commits) pythongh-102056: Fix a few bugs in error handling of exception printing code (python#102078) pythongh-102011: use sys.exception() instead of sys.exc_info() in docs where possible (python#102012) pythongh-101566: Sync with zipp 3.14. (pythonGH-102018) pythonGH-99818: improve the documentation for zipfile.Path and Traversable (pythonGH-101589) pythongh-88233: zipfile: handle extras after a zip64 extra (pythonGH-96161) pythongh-101981: Apply HOMEBREW related environment variables (pythongh-102074) pythongh-101907: Stop using `_Py_OPCODE` and `_Py_OPARG` macros (pythonGH-101912) pythongh-101819: Adapt _io types to heap types, batch 1 (pythonGH-101949) pythongh-101981: Build macOS as recommended by the devguide (pythonGH-102070) pythongh-97786: Fix compiler warnings in pytime.c (python#101826) pythongh-101578: Amend PyErr_{Set,Get}RaisedException docs (python#101962) Misc improvements to the float tutorial (pythonGH-102052) pythongh-85417: Clarify behaviour on branch cuts in cmath module (python#102046) pythongh-100425: Update tutorial docs related to sum() accuracy (FH-101854) Add missing 'is' to `cmath.log()` docstring (python#102049) pythongh-100210: Correct the comment link for unescaping HTML (python#100212) pythongh-97930: Also include subdirectory in makefile. (python#102030) pythongh-99735: Use required=True in argparse subparsers example (python#100927) Fix incorrectly documented attribute in csv docs (python#101250) pythonGH-84783: Make the slice object hashable (pythonGH-101264) ...
Closes #97933.
In builds with
--enable-optimizations
:Credit to @vladima for authoring the original version of this code in Cinder 3.8. The compiler approach is different in this version (so as to support inlining all comprehensions, regardless of name clashes); some of the symbol table changes remain the same.
A couple implementation notes:
LOAD_FAST_AND_CLEAR
because we need to push/pop the value of possibly-undefined variables on the stack while clearing them. To do this with existing opcodes would require eliminating the invariant/assert thatLOAD_FAST
never loadsNULL
, and would add inefficient refcount operations and interpreter loop cycles.co_varnames
andco_cellvars
for the inlined-into outer function and using the non-offset (varnames) storage location for it. This is how function arguments that are also cells are currently handled, so we just piggyback on that handling. This makes the needed push/pop of the outer cell simpler as we can just useLOAD_FAST_AND_CLEAR/STORE_FAST
as-is for it, and it will push/pop the cell itself. Otherwise we would need some new handling in the compiler for allowing push/pop of a cell in offset storage.