Python: Fix bad `fastTC` in `ASTNode::contains` #8028

tausbn · 2022-02-14T15:23:01Z

Between 2.8.0 and 2.8.1, the performance of ASTNode::contains degraded considerably for some queries (and some databases), one such example being the py/comparison-of-identical-expressions query.

In 2.8.0, the contains relation was inlined into the helper predicates for that query, whereas in 2.8.1, the contains relation was evaluated in isolation using a boundedFastTC:

(4s) Starting to evaluate predicate boundedFastTC(AstExtended::AstNode::getAChildNode_dispred#ff,AstExtended::AstNode#class#f)/2@c4bc3flu = HOP boundedFastTC(3646779,2616172)
(887s)  >>> Relation boundedFastTC(AstExtended::AstNode::getAChildNode_dispred#ff,AstExtended::AstNode#class#f): 12307755 rows using 115 MB

I tried rewriting the contains predicate to use a manual TC, but this also had poor performance (though still better than fastTC). Because contains is used in a lot of places, it's inconvenient to manually fix up and magic all of these places, and so I opted to inline this predicate instead.

Between 2.8.0 and 2.8.1, the performance of `ASTNode::contains` degraded considerably for some queries, one such example being the `py/comparison-of-identical-expressions` query. In 2.8.0, the `contains` relation was inlined into the helper predicates for that query, whereas in 2.8.1, the `contains` relation was evaluated in isolation using a `boundedFastTC`: ``` (4s) Starting to evaluate predicate boundedFastTC(AstExtended::AstNode::getAChildNode_dispred#ff,AstExtended::AstNode#class#f)/2@c4bc3flu = HOP boundedFastTC(3646779,2616172) (887s) >>> Relation boundedFastTC(AstExtended::AstNode::getAChildNode_dispred#ff,AstExtended::AstNode#class#f): 12307755 rows using 115 MB ``` I tried rewriting the `contains` predicate to use a manual TC, but this also had poor performance (though still better than `fastTC`). Because `contains` is used in a lot of places, it's inconvenient to manually fix up and magick all of these places, and so I opted to `inline` this predicate instead.

yoff · 2022-02-15T11:23:43Z

Performance test not looking too promising..?

tausbn · 2022-02-15T12:30:36Z

Yeah, the trouble with pragma[inline] is that it doesn't actually guarantee anything. It may decide to inline and then do the exact same sequence of evaluations that you would get without inlining.

I have a further fix for one of the queries that did this. I don't know if it's sufficient, but I'll push it and rerun the performance tests.

This was one instance where inlining didn't actually affect the join order (and so the compiler decided to materialise the full `fastTC` without using any of the local context). In this case, it seems to be sufficient to simply manually specialise the `contains` relation to the particular setup needed here. (Note that the first occurrence of `contains` in this query is _not_ the same predicate -- it's a version that works on `StmtList`s.)

yoff · 2022-02-15T13:48:40Z

I have a further fix for one of the queries that did this. I don't know if it's sufficient, but I'll push it and rerun the performance tests.

Ok, that fix looks sensible, let's see what effect is has.. :-)

This definition of `getCase` was inlined in `getScope` and then evaluated in its entirety (without any magic) which turned out to be costly. This meant that any use of `getScope` would pay this penalty if not already present in the cache. The rewrite in this commit _should_ have the same outwards behaviour, since the parent of a `Pattern` is either a pattern itself, or a `Case`. However, by restricting to just these classes, we never materialise the entire `contains` relation.

This one had the same bad performance: ``` NestedLoopsSameVariableWithReuse.ql-3:boundedFastTC(AstExtended::AstNode::getAChildNode_dispred#ff,py_stmts_0#higher_order_body) ........ 14m24s ``` Compared to now: ``` NestedLoopsSameVariableWithReuse.ql-3:NestedLoopsSameVariableWithReuse::forLoopContainsName#ff ......... 10ms ```

tausbn · 2022-02-16T12:08:49Z

Hm. Seems my getCase fix broke the tests. I'll see if I can rewrite it to work correctly again.

Turns out that in isolation, `getCase` is fine. What was killing the performance was the fact that it got inlined into `getScope`, where it resulted in the full evaluation of the flipped `contains` relation. I think `nomagic` is sensible here -- I would not expect there to be so many results for this predicate that evaluating it would become a problem (note for instance that it already is restricted to relating `Pattern`s and `Case`s).

tausbn · 2022-02-17T13:39:54Z

I think the latest evaluation looks good. There are a few individual queries that got slower overall, but the slowdown there is on the order of 10 seconds in total (whereas the overall speedup is between 5% and 9%).

tausbn · 2022-02-17T14:45:01Z

I am tempted, however, to just do a PR where I add the nomagic to getCase. I wonder if that's the bit that made the most difference here. What do you think?

yoff · 2022-02-25T11:14:31Z

I am tempted, however, to just do a PR where I add the nomagic to getCase. I wonder if that's the bit that made the most difference here. What do you think?

Yes, that could well be a simple fix, avoiding splitting out predicates.

This is a simplified version of github#8028 consisting of just the `nomagic` fix.

RasmusWL · 2022-03-17T08:14:08Z

Was #8255 a sufficient fix for this performance problem?

tausbn · 2022-05-04T21:55:30Z

Was #8255 a sufficient fix for this performance problem?

I think so. I'll close this for now. If we start seeing degradation in the calculation of contains, we can always revive this PR.

tausbn added the no-change-note-required This PR does not need a change note label Feb 14, 2022

tausbn requested a review from a team as a code owner February 14, 2022 15:23

github-actions bot added the Python label Feb 14, 2022

tausbn added 2 commits February 15, 2022 22:25

tausbn added a commit to tausbn/codeql that referenced this pull request Feb 25, 2022

Python: Prevent magic/inlining in getCase

622b326

This is a simplified version of github#8028 consisting of just the `nomagic` fix.

tausbn mentioned this pull request Feb 25, 2022

Python: Prevent magic/inlining in getCase #8255

Merged

tausbn closed this May 4, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Python: Fix bad `fastTC` in `ASTNode::contains` #8028

Python: Fix bad `fastTC` in `ASTNode::contains` #8028

Uh oh!

tausbn commented Feb 14, 2022

Uh oh!

yoff commented Feb 15, 2022

Uh oh!

tausbn commented Feb 15, 2022

Uh oh!

yoff commented Feb 15, 2022

Uh oh!

tausbn commented Feb 16, 2022

Uh oh!

tausbn commented Feb 17, 2022

Uh oh!

tausbn commented Feb 17, 2022

Uh oh!

yoff commented Feb 25, 2022

Uh oh!

RasmusWL commented Mar 17, 2022

Uh oh!

tausbn commented May 4, 2022

Uh oh!

Uh oh!

Python: Fix bad fastTC in ASTNode::contains #8028

Python: Fix bad fastTC in ASTNode::contains #8028

Uh oh!

Conversation

tausbn commented Feb 14, 2022

Uh oh!

yoff commented Feb 15, 2022

Uh oh!

tausbn commented Feb 15, 2022

Uh oh!

yoff commented Feb 15, 2022

Uh oh!

tausbn commented Feb 16, 2022

Uh oh!

tausbn commented Feb 17, 2022

Uh oh!

tausbn commented Feb 17, 2022

Uh oh!

yoff commented Feb 25, 2022

Uh oh!

RasmusWL commented Mar 17, 2022

Uh oh!

tausbn commented May 4, 2022

Uh oh!

Uh oh!

Python: Fix bad `fastTC` in `ASTNode::contains` #8028

Python: Fix bad `fastTC` in `ASTNode::contains` #8028