Severe performance degradation for tracing under 3.11 #93516
Comments
Related issue: a user reported that code under cProfile slowed down 1.6x, and the only thing I could pinpoint was that tracing itself slowed down, not cProfile: #93381.
I now realize this is probably due to using the Python tracer rather than the C tracer, will try again.
Okay, now with the C tracer. Script used:

```python
import coverage
from time import perf_counter


def fib(n):
    if n <= 1:
        return n
    else:
        return fib(n-1) + fib(n-2)


t0 = perf_counter()
cov = coverage.Coverage()
cov.start()
fib(35)
cov.stop()
cov.save()
t1 = perf_counter()
print(t1 - t0)
```

Profile on the 3.10 branch (took 13.3 seconds), functions taking more than 1% of CPU:

Profile on the main branch (took 23.0 seconds), functions taking more than 1% of CPU:
This seems to suggest to me that this could be made much faster via a dedicated C API function: https://github.com/nedbat/coveragepy/blob/master/coverage/ctracer/util.h#L41

```c
#define MyCode_GetCode(co) (PyObject_GetAttrString((PyObject *)(co), "co_code"))
```
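As a rough Python-level illustration (not from the thread), here is a hedged timeit sketch of why a generic attribute lookup per trace event adds up compared to caching the object once; the C constant factors differ, but the shape of the cost is the same:

```python
# Hypothetical micro-benchmark: re-fetching co_code on every event versus
# caching the reference once per code object.
import timeit


def fib(n):
    return n if n <= 1 else fib(n - 1) + fib(n - 2)


code = fib.__code__

# A fresh attribute lookup each time, as the macro above does via
# PyObject_GetAttrString.
per_event = timeit.timeit(lambda: code.co_code, number=1_000_000)

# Fetch once, then reuse the cached reference.
cached = code.co_code
reused = timeit.timeit(lambda: cached, number=1_000_000)

print(f"lookup per event: {per_event:.3f}s")
print(f"cached reference: {reused:.3f}s")
```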
Indeed, on main, if I add
Each of these calls to
Ah!
Some performance degradation under tracing is expected, but not as much as reported. From the profiles that @sweeneyde gathered it looks like we are seeing slowdowns in:
The new line number table gives better error messages without taking up too much extra space, but it is more complex and thus slower to parse. Once nedbat/coveragepy#1394 is merged, we can look at the profile again to see if much has changed.
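For readers following along, a sketch (not from the thread) of what "parsing the table" amounts to, using the public co_lines() API from PEP 626 rather than the internal format: every traced instruction needs an offset-to-line mapping, and the compact table has to be walked to build it.

```python
# Build an offset -> line mapping from the compact line table via co_lines().
def fib(n):
    return n if n <= 1 else fib(n - 1) + fib(n - 2)


offset_to_line = {}
for start, end, line in fib.__code__.co_lines():
    # Offsets advance in 2-byte code units; line may be None for artificial code.
    for offset in range(start, end, 2):
        offset_to_line[offset] = line

print(offset_to_line)
```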
New profile of fib_cover.py on the main branch, with the PyCode_GetCode addition (took 18.1 seconds):
I wonder: would there be any way to retain some specializations during tracing? Some specialized opcodes are certainly incorrect to use during tracing, e.g.,
Well, two times slower than 3.10, as some of @nedbat's benchmarks show, is not acceptable in my opinion. We should try to get this below 20% at the very least. I have marked this as a release blocker, and it will block any future releases.
The problem with saying that something is unacceptable is that it implies there is an acceptable alternative. What does it mean to make this a release blocker? Are you not going to release 3.11 if coverage is >20% slower? I don't think cutoffs, such as 20%, are constructive. We should do as well as we are reasonably able, regardless of whether we achieve some arbitrary number.
I don't know, but there is always the possibility of reverting the changes that made coverage slower (worst-case scenario).
I will check with the SC, but I can already tell you that I would not be comfortable releasing 3.11 if it is two times slower in coverage, unless the SC mandates otherwise.
The problem here is that this tradeoff was not discussed or advertised anywhere, and no one had the opportunity to weigh in on it or push back, so we cannot just "decide" out of the blue what is and is not acceptable, because the community didn't have the chance to object.
The largest contributor to the slowdown is computing line numbers. The root cause of that is PEP 657. The proximate cause is my PR to reduce the memory consumed by PEP 657. I think it would be a lot more helpful if we fixed the issue, rather than making dire pronouncements.
Everything we do is public. I even opened nedbat/coveragepy#1287 back in November, so that @nedbat was aware that there might be slowdowns in coverage.py.
That's not true, and I feel it is misleading. When line numbers and column offsets were separated, PEP 657 had zero overhead on computing line numbers, so the root cause is certainly not PEP 657. The root cause was merging line numbers and column offsets, so now you need to pay for decoding both just to get line numbers.
That's not enough. If that were true, we would not need to do PEPs because "what we do is already public". The point of having discussions on mailing lists is precisely so everyone can be aware of the tradeoff and give us their opinions.
In any case, there is no point in discussing in these terms, because I am not the one making decisions here. As RM, my only authority is to consider the current slowdown a blocker and ask for a bigger discussion to take place if we cannot fix the issue. Let's see what we can do, and then let's discuss how to proceed once we understand our options.
No, that was the proximate cause. It wouldn't have been necessary if not for PEP 657. Claiming that PEP 657 had zero overhead is not true. The original implementation had a large overhead in terms of memory use.
Mark, I understand that, and we are on the same page. I know why you made that change, and IIRC I even reviewed it. The problem here is that that change (or other changes) had unexpected consequences on timing, and we need to make decisions: either trying to fix it together if we can, or potentially reverting it if we cannot. That's all I am saying.
The claim was that PEP 657 had zero overhead on the time spent computing the line numbers, not on memory. My sentence was:
Marking this as a release or deferred blocker is fine and entirely what I'd expect any RM to want to do. That just means we collect more information and make a decision before we exit the beta stage. I don't personally have a problem with coverage being slower, even by 2x. It's more about understanding what all is impacted by this and how, so we can explain our measured rationale to users, i.e. look beyond just coverage.

Q1: What else uses tracing?
Q1b: Which of those use it at other than development time?

These questions are informational and do not block performance work on reducing the impact to tracing.
How bad would it be to store an extra field in code objects that is the "decompressed" line number table, lazily allocated only for those code objects which are actually traced? Could it occupy co_extra?
In #93383 (comment) I benchmarked adding an additional field to the code object and there was no slowdown in pyperformance. I do not know how memory consumption is affected. I'm not sure co_extra is the right place for that info, considering it's for consumers of the PEP 523 API. Maybe I can merge your idea into a simple, lazily created struct:

```c
PyCodeObject {
    ...
    void *co_tracing_info;
};

struct _PyCodeTracingInfo {
    // cached code bytes object
    PyObject *co_code;
    // decompressed line number table
    ...
};
```
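To make the intent concrete, here is a Python-level analogue of that lazily created structure (the names and the cache are hypothetical; the real change would be in C): the expensive per-code-object data is built only for code objects that are actually traced, and reused for subsequent events.

```python
# Hypothetical sketch: decode line information lazily, once per traced code
# object, and cache it so later trace events are a cheap lookup.
_tracing_info = {}  # code object -> decoded (start, end, line) ranges


def tracing_info(code):
    info = _tracing_info.get(code)
    if info is None:
        # Pay the decoding cost only the first time this code object is traced.
        info = list(code.co_lines())
        _tracing_info[code] = info
    return info
```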
An early design for the compressed line number table did just this. It isn't just tracing that needs line numbers; tracebacks need them too. I think we should see how much faster we can make tracing with the compressed table first. If that isn't good enough, then we can reconsider the decompressed table.
Debuggers and profiling tools, like cProfile.

> Q1b: Which of those use it at other than development time?

Hopefully, no one. Tracing has always imposed a considerable overhead.
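For anyone unfamiliar with what "tracing" covers here, a minimal sketch of the hook these tools rely on (coverage tools and debuggers set a trace function that receives per-line events; profilers such as cProfile use the analogous per-call profiling hook):

```python
# Minimal line tracer: prints every line executed while tracing is active.
import sys


def tracer(frame, event, arg):
    if event == "line":
        print(f"{frame.f_code.co_filename}:{frame.f_lineno}")
    return tracer  # keep tracing inside this frame


def fib(n):
    return n if n <= 1 else fib(n - 1) + fib(n - 2)


sys.settrace(tracer)
fib(3)
sys.settrace(None)
```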
Maybe we could leave line numbers uncompressed and compress the rest? Alternatively, we could leave just the line numbers uncompressed if and only if we detect a tracing or profiling function. That way the extra memory is only paid if someone is profiling or tracing.
That's going to be complicated. Too complicated for a beta release, IMO.
If we add a small/large flag, then the extra memory will be 2 bytes per instruction for files with fewer than 64k lines (basically all code), and 4 bytes per instruction for giant files.
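As a back-of-the-envelope check on those numbers (my arithmetic, not from the thread):

```python
# Extra memory for the small/large flag idea: 2 bytes per instruction for
# files under 64k lines, 4 bytes per instruction for giant files.
def extra_bytes(num_instructions, giant_file=False):
    return num_instructions * (4 if giant_file else 2)


print(extra_bytes(10_000))                   # 20000 bytes (~20 KB)
print(extra_bytes(10_000, giant_file=True))  # 40000 bytes (~40 KB)
```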
I've been experimenting with streamlining the
Actually, if we store the line delta from
I'm very interested in helping to test people's proposed fixes, but am losing track of what is being changed where. We have #93493, which I have tried. Are there other pull requests in flight to try?
I would like to see how much we gain back by reverting 944fffe (not that we are planning to do it; I just want to know how much of the problem is due to that). I can prepare a PR for you to test later today, but in the interim you can try reverting it yourself (it doesn't revert cleanly, but the conflicts are mainly due to automatically generated files that you can regenerate by running
IIUC, Kumar benchmarked that commit here and got a 1.3x slowdown both before and after.
Looking at the output of perf (using an optimized build, but with no PGO/LTO) and the code, I see the following reasons for the slowdown:
All but 2 can be fixed without any negative impact on performance when not tracing. Specializing during tracing increases the overhead of dispatching when not tracing, resulting in a ~1% slowdown.
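For anyone reproducing this locally, a small sketch (assuming CPython 3.11's dis adaptive flag) to see which instructions actually got specialized, and would therefore be affected when tracing is enabled:

```python
# Show the adaptive/specialized bytecode after a warm-up run.
import dis


def fib(n):
    return n if n <= 1 else fib(n - 1) + fib(n - 2)


fib(20)  # warm up so specialization has a chance to kick in
dis.dis(fib, adaptive=True)
```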
FYI, I have added a coverage benchmark to pyperformance to ease benchmarking coverage / tracing performance.
Bug report
Coverage.py is seeing a significant increase in overhead for tracing code in 3.11 compared to 3.10: nedbat/coveragepy#1287
As an example:
(This is the output of lab/benchmark.py.)
Your environment