Fix C++ Print AST handling of Conversions #4493

criemen · 2020-10-16T10:07:22Z

This addresses #4278.

This PR does not add metadata about conversions to the nodes.
The (prior) inclusion of conversion nodes in the tree meant that multiple conversions, i.e.

  c = (char *)(void *)(int *)v;

were handled correctly.
With just a metadata attribute on the VariableAccess, I do not see how to represent the AST faithfully - just including the inner- or outermost cast would be misleading.

I am not happy with this, and open to suggestions how to better address this problem.

MathiasVP · 2020-10-16T11:14:53Z

Thanks for looking into this!
Is it correct to say that this goal of this PR is to remove the incorrect parent/child relationship between expressions and conversions in the AST viewer?

criemen · 2020-10-16T12:13:19Z

Yes, indeed.
Without the PR, the AST viewer gives the (incorrect) impression, that conversions and casts are a regular part of the AST parent/child relation.

jbj · 2020-10-19T07:28:25Z

This PR makes the AST viewer output less confusing, but it also removes all information about conversions. At least for my own usage of PrintAST, that's the most useful information that's in there.

Can you add this information back in a less confusing place? For the output that previously looked like this:

#    2|     0: [ExprStmt] ExprStmt
#    2|       0: [AssignExpr] ... = ...
#    2|           Type = [CharPointerType] char *
#    2|           ValueCategory = prvalue
#    2|         0: [VariableAccess] c
#    2|             Type = [CharPointerType] char *
#    2|             ValueCategory = lvalue
#    2|         1: [CStyleCast] (char *)...
#    2|             Conversion = [PointerConversion] pointer conversion
#    2|             Type = [CharPointerType] char *
#    2|             ValueCategory = prvalue
#    2|           expr: [VariableAccess] v
#    2|               Type = [VoidPointerType] void *
#    2|               ValueCategory = prvalue(load)

I'd suggest making it look like this instead:

#    2|     0: [ExprStmt] ExprStmt
#    2|       0: [AssignExpr] ... = ...
#    2|           Type = [CharPointerType] char *
#    2|           ValueCategory = prvalue
#    2|         0: [VariableAccess] c
#    2|             Type = [CharPointerType] char *
#    2|             ValueCategory = lvalue
#    2|         1: [VariableAccess] v
#    2|             Type = [VoidPointerType] void *
#    2|             ValueCategory = prvalue(load)
#    2|           conversion: [CStyleCast] (char *)...
#    2|               Conversion = [PointerConversion] pointer conversion
#    2|               Type = [CharPointerType] char *
#    2|               ValueCategory = prvalue

This reflects how on the VariableAccess you can call getConversion. And as long as you're not using getChild, which you shouldn't, conversions look like children to the expression they're converting. Note that the dump of a Conversion no longer has an expr child; instead, any expression can have a conversion child.

dbartol · 2020-10-19T14:46:42Z

I had initially thought that the backwards solution was the logical conclusion of the desire to mirror what getter to call, but my brain couldn't accept printing an expression tree backwards. I was able to rationalize my preferred (forward) solution as follows:

All Expr nodes have a child node for e.getFullyConverted()
All Conversion nodes have a child node for e.getExpr()
If either of the above would give you a child node for an Expr that already appears in the tree elsewhere, do not create the additional child node.

It's not quite as clean a justification as just "create nodes for e.getConversion()", but I really think that having the conversion chain backwards is super confusing, especially given that having multiple conversions on the same node is not very common. Also, I've always found myself using getFullyConverted() far more often than getConversion(), although that may just be due to the areas of the code I tend to work on.

jbj · 2020-10-20T06:43:10Z

I suppose if #4279 is implemented then both of our solutions will be clearer. Mine will have getConversion printed at every level, and yours will have getFullyConverted printed at the top level and getExpr printed at subsequent levels. I agree it's most common to write, for example, mybinop.getLeftOperand().getFullyConverted(), because traversals most often happen from the top down.

My proposal is backwards, but it preserves adjacency. For (C)(B)a, I'd expect to find B adjacent to a in the AST. Ideally the order should be C,B,a, but we've decided to put a on top, so it'll have to be a,B,C or a,C,B.

I can see advantages and drawbacks of both proposals. I suggest we sleep on it, let the team sleep on it, and make a decision in the next team meeting.

jbj · 2020-10-20T12:07:29Z

There might be a third option. If we do #4279, then the designators to the left of the colon will change from -1, 0, expr, etc. to getLeftOperand() and so on. At least, that's my interpretation of what #4279 means. We could keep our C,B,a order if we special-case the designator of children with conversions on them to be, for example, getLeftOperand().getFullyConverted(). Because this is long and verbose, it should alert readers to the fact that there's something to look out for here.

Here's what our running example would look like under that proposal.

#    2|     0: [ExprStmt] ExprStmt
#    2|       getExpr(): [AssignExpr] ... = ...
#    2|           Type = [CharPointerType] char *
#    2|           ValueCategory = prvalue
#    2|         getLValue(): [VariableAccess] c
#    2|             Type = [CharPointerType] char *
#    2|             ValueCategory = lvalue
#    2|         getRValue().getFullyConverted(): [CStyleCast] (char *)...
#    2|             Conversion = [PointerConversion] pointer conversion
#    2|             Type = [CharPointerType] char *
#    2|             ValueCategory = prvalue
#    2|           getExpr(): [VariableAccess] v
#    2|               Type = [VoidPointerType] void *
#    2|               ValueCategory = prvalue(load)

dbartol · 2020-10-20T21:49:31Z

There might be a third option. If we do #4279, then the designators to the left of the colon will change from -1, 0, expr, etc. to getLeftOperand() and so on.

@jbj That looks pretty verbose. I understand the push to help new QL authors learn what predicates to call, but once a developer has learned the basics, a less verbose form like we have today seems easier to read.

Still, if we're putting the predicate names in there anyway, I do like your proposal to keep the existing order.

MathiasVP · 2020-10-21T11:56:05Z

Here's what our running example would look like under that proposal.

I really like the idea of using a relatively long line to highlight the fact that there's a conversion that a user might not have thought about. 👍

criemen · 2020-10-21T12:04:31Z

Here's what our running example would look like under that proposal.

I really like the idea of using a relatively long line to highlight the fact that there's a conversion that a user might not have thought about. +1

Counterpoint: In the VSCode UI, such long strings might not be that easy to display/fit in.
@aeisenberg what do you think? After all, this is your feature request.

aeisenberg · 2020-10-21T15:46:54Z

My opinion is that as long as the most useful information is generally at the start of the string, then having long strings is fine. If someone wants to see the whole string, they can make the AST Viewer wider, or simply hover over the node. Otherwise, you can just ignore the end part of the string.

One thing we can consider for the future is to differentiate between the hover text and the node text. We could put more detailed information in the hover and keeping the key parts in the node text, but that's less important for now.

criemen · 2020-10-22T15:49:12Z

The team discussed and likes another option:

Whenever a node has a child that has conversions attached to it, the link to the child and the link to child.getFullyConverted() will be displayed side-by-side.
From the fully converted child, we will then have the chain of conversions.
In practice, for return (char*)(void*)(int*)v; this means that the node for the return statement points to the VariableAccess and to the conversion associated with (char *).
We will, as part of addressing Add more metadata for C++ print AST queries #4279 provide short names like left, arg0 etc. for dumping the AST graph in text *using the existing semmle.label property).
Furthermore, we will introduce an additional property, called semmle.predicate, on the graph edge that contains the full predicate invocation to access the (possibly converted) child. This then can be used by the VS Code AST viewer as hover text, or to navigate to the QL predicate in the library.
This additional property will be, most likely, not included in the default text dumps of the AST.
If a node has multiple children, we first display all unconverted children, and then the fully converted children.

jbj · 2020-10-23T08:51:13Z

To make this concrete, here's my interpretation of the running example under "option 4":

#    2|     0: [ExprStmt] ExprStmt
#    2|       0: [AssignExpr] ... = ...
#    2|           Type = [CharPointerType] char *
#    2|           ValueCategory = prvalue
#    2|         0: [VariableAccess] c
#    2|             Type = [CharPointerType] char *
#    2|             ValueCategory = lvalue
#    2|         1 unconverted: [VariableAccess] v
#    2|             Type = [VoidPointerType] void *
#    2|             ValueCategory = prvalue(load)
#    2|         1 converted: [CStyleCast] (char *)...
#    2|             Conversion = [PointerConversion] pointer conversion
#    2|             Type = [CharPointerType] char *
#    2|             ValueCategory = prvalue

I threw a 1 unconverted label in there, instead of just 1, to make it harder to miss that there are conversions coming later. I'm not sure whether we want that, but it should be easy to change. I didn't write labels like "left" in this example since that should be orthogonal.

rdmarsh2 · 2020-10-23T17:51:58Z

Do the "converted" entries only include the conversion chain, or will there be duplication of the VariableAccess?

jbj · 2020-10-26T07:34:08Z

Do the "converted" entries only include the conversion chain, or will there be duplication of the VariableAccess?

I'm pretty sure we agreed on "no duplicates", so maybe it should be called "conversions" instead of "converted"?

cpp/ql/src/semmle/code/cpp/PrintAST.qll

cpp/ql/test/examples/expressions/PrintAST.expected

jbj

LGTM!

criemen requested a review from a team as a code owner October 16, 2020 10:07

github-actions bot added the C++ label Oct 16, 2020

criemen changed the title ~~Fix 4278 printast conversions~~ Fix C++ Print AST handling of Conversions Oct 16, 2020

criemen added 2 commits October 26, 2020 13:25

C++: Add new test for conversions.

0e5c44e

C++: Move Conversions in PrintAST to the side.

447ba20

criemen force-pushed the fix-4278-printast-conversions branch from 8c19eb6 to 447ba20 Compare October 26, 2020 12:49

C++: Simplify code, add comment explaining the logic.

1b88ca1

jbj assigned dbartol Oct 28, 2020

criemen mentioned this pull request Oct 28, 2020

C++: Provide the predicates that can be used to traverse the AST as metadata to PrintAST #4562

Merged

jbj reviewed Oct 29, 2020

View reviewed changes

C++: Address review, fix bug related to Conversions.

59dd892

jbj approved these changes Oct 29, 2020

View reviewed changes

jbj merged commit fa344d2 into github:main Oct 29, 2020

criemen deleted the fix-4278-printast-conversions branch October 29, 2020 12:49

Fix C++ Print AST handling of Conversions #4493

Fix C++ Print AST handling of Conversions #4493

Uh oh!

Conversation

criemen commented Oct 16, 2020

Uh oh!

MathiasVP commented Oct 16, 2020

Uh oh!

criemen commented Oct 16, 2020

Uh oh!

jbj commented Oct 19, 2020

Uh oh!

dbartol commented Oct 19, 2020

Uh oh!

jbj commented Oct 20, 2020

Uh oh!

jbj commented Oct 20, 2020

Uh oh!

dbartol commented Oct 20, 2020

Uh oh!

MathiasVP commented Oct 21, 2020

Uh oh!

criemen commented Oct 21, 2020

Uh oh!

aeisenberg commented Oct 21, 2020

Uh oh!

criemen commented Oct 22, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jbj commented Oct 23, 2020

Uh oh!

rdmarsh2 commented Oct 23, 2020

Uh oh!

jbj commented Oct 26, 2020

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jbj left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

criemen commented Oct 22, 2020 •

edited

Loading