Skip to content

Python: Fix bad fastTC in ASTNode::contains #8028

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

tausbn
Copy link
Contributor

@tausbn tausbn commented Feb 14, 2022

Between 2.8.0 and 2.8.1, the performance of ASTNode::contains degraded considerably for some queries (and some databases), one such example being the py/comparison-of-identical-expressions query.

In 2.8.0, the contains relation was inlined into the helper predicates for that query, whereas in 2.8.1, the contains relation was evaluated in isolation using a boundedFastTC:

(4s) Starting to evaluate predicate boundedFastTC(AstExtended::AstNode::getAChildNode_dispred#ff,AstExtended::AstNode#class#f)/2@c4bc3flu = HOP boundedFastTC(3646779,2616172)
(887s)  >>> Relation boundedFastTC(AstExtended::AstNode::getAChildNode_dispred#ff,AstExtended::AstNode#class#f): 12307755 rows using 115 MB

I tried rewriting the contains predicate to use a manual TC, but this also had poor performance (though still better than fastTC). Because contains is used in a lot of places, it's inconvenient to manually fix up and magic all of these places, and so I opted to inline this predicate instead.

Between 2.8.0 and 2.8.1, the performance of `ASTNode::contains`
degraded considerably for some queries, one such example being the
`py/comparison-of-identical-expressions` query.

In 2.8.0, the `contains` relation was inlined into the helper predicates
for that query, whereas in 2.8.1, the `contains` relation was evaluated
in isolation using a `boundedFastTC`:
```
(4s) Starting to evaluate predicate boundedFastTC(AstExtended::AstNode::getAChildNode_dispred#ff,AstExtended::AstNode#class#f)/2@c4bc3flu = HOP boundedFastTC(3646779,2616172)
(887s)  >>> Relation boundedFastTC(AstExtended::AstNode::getAChildNode_dispred#ff,AstExtended::AstNode#class#f): 12307755 rows using 115 MB
```

I tried rewriting the `contains` predicate to use a manual TC, but this
also had poor performance (though still better than `fastTC`). Because
`contains` is used in a lot of places, it's inconvenient to manually
fix up and magick all of these places, and so I opted to `inline` this
predicate instead.
@tausbn tausbn added the no-change-note-required This PR does not need a change note label Feb 14, 2022
@tausbn tausbn requested a review from a team as a code owner February 14, 2022 15:23
@yoff
Copy link
Contributor

yoff commented Feb 15, 2022

Performance test not looking too promising..?

@tausbn
Copy link
Contributor Author

tausbn commented Feb 15, 2022

Yeah, the trouble with pragma[inline] is that it doesn't actually guarantee anything. It may decide to inline and then do the exact same sequence of evaluations that you would get without inlining.

I have a further fix for one of the queries that did this. I don't know if it's sufficient, but I'll push it and rerun the performance tests.

This was one instance where inlining didn't actually affect the join
order (and so the compiler decided to materialise the full `fastTC`
without using any of the local context). In this case, it seems to be
sufficient to simply manually specialise the `contains` relation to the
particular setup needed here.

(Note that the first occurrence of `contains` in this query is _not_ the
same predicate -- it's a version that works on `StmtList`s.)
@yoff
Copy link
Contributor

yoff commented Feb 15, 2022

I have a further fix for one of the queries that did this. I don't know if it's sufficient, but I'll push it and rerun the performance tests.

Ok, that fix looks sensible, let's see what effect is has.. :-)

This definition of `getCase` was inlined in `getScope` and then
evaluated in its entirety (without any magic) which turned out to be
costly. This meant that any use of `getScope` would pay this penalty
if not already present in the cache.

The rewrite in this commit _should_ have the same outwards behaviour,
since the parent of a `Pattern` is either a pattern itself, or a `Case`.
However, by restricting to just these classes, we never materialise the
entire `contains` relation.
This one had the same bad performance:
```
NestedLoopsSameVariableWithReuse.ql-3:boundedFastTC(AstExtended::AstNode::getAChildNode_dispred#ff,py_stmts_0#higher_order_body) ........ 14m24s
```

Compared to now:
```
NestedLoopsSameVariableWithReuse.ql-3:NestedLoopsSameVariableWithReuse::forLoopContainsName#ff ......... 10ms
```
@tausbn
Copy link
Contributor Author

tausbn commented Feb 16, 2022

Hm. Seems my getCase fix broke the tests. I'll see if I can rewrite it to work correctly again.

Turns out that in isolation, `getCase` is fine. What was killing the
performance was the fact that it got inlined into `getScope`, where it
resulted in the full evaluation of the flipped `contains` relation.

I think `nomagic` is sensible here -- I would not expect there to be so
many results for this predicate that evaluating it would become a
problem (note for instance that it already is restricted to relating
`Pattern`s and `Case`s).
@tausbn
Copy link
Contributor Author

tausbn commented Feb 17, 2022

I think the latest evaluation looks good. There are a few individual queries that got slower overall, but the slowdown there is on the order of 10 seconds in total (whereas the overall speedup is between 5% and 9%).

@tausbn
Copy link
Contributor Author

tausbn commented Feb 17, 2022

I am tempted, however, to just do a PR where I add the nomagic to getCase. I wonder if that's the bit that made the most difference here. What do you think?

@yoff
Copy link
Contributor

yoff commented Feb 25, 2022

I am tempted, however, to just do a PR where I add the nomagic to getCase. I wonder if that's the bit that made the most difference here. What do you think?

Yes, that could well be a simple fix, avoiding splitting out predicates.

tausbn added a commit to tausbn/codeql that referenced this pull request Feb 25, 2022
This is a simplified version of
github#8028
consisting of just the `nomagic` fix.
@RasmusWL
Copy link
Member

Was #8255 a sufficient fix for this performance problem?

@tausbn
Copy link
Contributor Author

tausbn commented May 4, 2022

Was #8255 a sufficient fix for this performance problem?

I think so. I'll close this for now. If we start seeing degradation in the calculation of contains, we can always revive this PR.

@tausbn tausbn closed this May 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-change-note-required This PR does not need a change note Python
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants