Skip to content

Fix C++ Print AST handling of Conversions #4493

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Oct 29, 2020

Conversation

criemen
Copy link
Collaborator

@criemen criemen commented Oct 16, 2020

This addresses #4278.

This PR does not add metadata about conversions to the nodes.
The (prior) inclusion of conversion nodes in the tree meant that multiple conversions, i.e.

  c = (char *)(void *)(int *)v;

were handled correctly.
With just a metadata attribute on the VariableAccess, I do not see how to represent the AST faithfully - just including the inner- or outermost cast would be misleading.

I am not happy with this, and open to suggestions how to better address this problem.

@criemen criemen requested a review from a team as a code owner October 16, 2020 10:07
@github-actions github-actions bot added the C++ label Oct 16, 2020
@criemen criemen changed the title Fix 4278 printast conversions Fix C++ Print AST handling of Conversions Oct 16, 2020
@MathiasVP
Copy link
Contributor

Thanks for looking into this!
Is it correct to say that this goal of this PR is to remove the incorrect parent/child relationship between expressions and conversions in the AST viewer?

@criemen
Copy link
Collaborator Author

criemen commented Oct 16, 2020

Yes, indeed.
Without the PR, the AST viewer gives the (incorrect) impression, that conversions and casts are a regular part of the AST parent/child relation.

@jbj
Copy link
Contributor

jbj commented Oct 19, 2020

This PR makes the AST viewer output less confusing, but it also removes all information about conversions. At least for my own usage of PrintAST, that's the most useful information that's in there.

Can you add this information back in a less confusing place? For the output that previously looked like this:

#    2|     0: [ExprStmt] ExprStmt
#    2|       0: [AssignExpr] ... = ...
#    2|           Type = [CharPointerType] char *
#    2|           ValueCategory = prvalue
#    2|         0: [VariableAccess] c
#    2|             Type = [CharPointerType] char *
#    2|             ValueCategory = lvalue
#    2|         1: [CStyleCast] (char *)...
#    2|             Conversion = [PointerConversion] pointer conversion
#    2|             Type = [CharPointerType] char *
#    2|             ValueCategory = prvalue
#    2|           expr: [VariableAccess] v
#    2|               Type = [VoidPointerType] void *
#    2|               ValueCategory = prvalue(load)

I'd suggest making it look like this instead:

#    2|     0: [ExprStmt] ExprStmt
#    2|       0: [AssignExpr] ... = ...
#    2|           Type = [CharPointerType] char *
#    2|           ValueCategory = prvalue
#    2|         0: [VariableAccess] c
#    2|             Type = [CharPointerType] char *
#    2|             ValueCategory = lvalue
#    2|         1: [VariableAccess] v
#    2|             Type = [VoidPointerType] void *
#    2|             ValueCategory = prvalue(load)
#    2|           conversion: [CStyleCast] (char *)...
#    2|               Conversion = [PointerConversion] pointer conversion
#    2|               Type = [CharPointerType] char *
#    2|               ValueCategory = prvalue

This reflects how on the VariableAccess you can call getConversion. And as long as you're not using getChild, which you shouldn't, conversions look like children to the expression they're converting. Note that the dump of a Conversion no longer has an expr child; instead, any expression can have a conversion child.

@dbartol
Copy link
Contributor

dbartol commented Oct 19, 2020

I had initially thought that the backwards solution was the logical conclusion of the desire to mirror what getter to call, but my brain couldn't accept printing an expression tree backwards. I was able to rationalize my preferred (forward) solution as follows:

  • All Expr nodes have a child node for e.getFullyConverted()
  • All Conversion nodes have a child node for e.getExpr()
  • If either of the above would give you a child node for an Expr that already appears in the tree elsewhere, do not create the additional child node.

It's not quite as clean a justification as just "create nodes for e.getConversion()", but I really think that having the conversion chain backwards is super confusing, especially given that having multiple conversions on the same node is not very common. Also, I've always found myself using getFullyConverted() far more often than getConversion(), although that may just be due to the areas of the code I tend to work on.

@jbj
Copy link
Contributor

jbj commented Oct 20, 2020

I suppose if #4279 is implemented then both of our solutions will be clearer. Mine will have getConversion printed at every level, and yours will have getFullyConverted printed at the top level and getExpr printed at subsequent levels. I agree it's most common to write, for example, mybinop.getLeftOperand().getFullyConverted(), because traversals most often happen from the top down.

My proposal is backwards, but it preserves adjacency. For (C)(B)a, I'd expect to find B adjacent to a in the AST. Ideally the order should be C,B,a, but we've decided to put a on top, so it'll have to be a,B,C or a,C,B.

I can see advantages and drawbacks of both proposals. I suggest we sleep on it, let the team sleep on it, and make a decision in the next team meeting.

@jbj
Copy link
Contributor

jbj commented Oct 20, 2020

There might be a third option. If we do #4279, then the designators to the left of the colon will change from -1, 0, expr, etc. to getLeftOperand() and so on. At least, that's my interpretation of what #4279 means. We could keep our C,B,a order if we special-case the designator of children with conversions on them to be, for example, getLeftOperand().getFullyConverted(). Because this is long and verbose, it should alert readers to the fact that there's something to look out for here.

Here's what our running example would look like under that proposal.

#    2|     0: [ExprStmt] ExprStmt
#    2|       getExpr(): [AssignExpr] ... = ...
#    2|           Type = [CharPointerType] char *
#    2|           ValueCategory = prvalue
#    2|         getLValue(): [VariableAccess] c
#    2|             Type = [CharPointerType] char *
#    2|             ValueCategory = lvalue
#    2|         getRValue().getFullyConverted(): [CStyleCast] (char *)...
#    2|             Conversion = [PointerConversion] pointer conversion
#    2|             Type = [CharPointerType] char *
#    2|             ValueCategory = prvalue
#    2|           getExpr(): [VariableAccess] v
#    2|               Type = [VoidPointerType] void *
#    2|               ValueCategory = prvalue(load)

@dbartol
Copy link
Contributor

dbartol commented Oct 20, 2020

There might be a third option. If we do #4279, then the designators to the left of the colon will change from -1, 0, expr, etc. to getLeftOperand() and so on.

@jbj That looks pretty verbose. I understand the push to help new QL authors learn what predicates to call, but once a developer has learned the basics, a less verbose form like we have today seems easier to read.

Still, if we're putting the predicate names in there anyway, I do like your proposal to keep the existing order.

@MathiasVP
Copy link
Contributor

Here's what our running example would look like under that proposal.

I really like the idea of using a relatively long line to highlight the fact that there's a conversion that a user might not have thought about. 👍

@criemen
Copy link
Collaborator Author

criemen commented Oct 21, 2020

Here's what our running example would look like under that proposal.

I really like the idea of using a relatively long line to highlight the fact that there's a conversion that a user might not have thought about. +1

Counterpoint: In the VSCode UI, such long strings might not be that easy to display/fit in.
@aeisenberg what do you think? After all, this is your feature request.

@aeisenberg
Copy link
Contributor

My opinion is that as long as the most useful information is generally at the start of the string, then having long strings is fine. If someone wants to see the whole string, they can make the AST Viewer wider, or simply hover over the node. Otherwise, you can just ignore the end part of the string.

One thing we can consider for the future is to differentiate between the hover text and the node text. We could put more detailed information in the hover and keeping the key parts in the node text, but that's less important for now.

@criemen
Copy link
Collaborator Author

criemen commented Oct 22, 2020

The team discussed and likes another option:

  1. Whenever a node has a child that has conversions attached to it, the link to the child and the link to child.getFullyConverted() will be displayed side-by-side.
    From the fully converted child, we will then have the chain of conversions.
    In practice, for return (char*)(void*)(int*)v; this means that the node for the return statement points to the VariableAccess and to the conversion associated with (char *).
  2. We will, as part of addressing Add more metadata for C++ print AST queries #4279 provide short names like left, arg0 etc. for dumping the AST graph in text *using the existing semmle.label property).
    Furthermore, we will introduce an additional property, called semmle.predicate, on the graph edge that contains the full predicate invocation to access the (possibly converted) child. This then can be used by the VS Code AST viewer as hover text, or to navigate to the QL predicate in the library.
    This additional property will be, most likely, not included in the default text dumps of the AST.
  3. If a node has multiple children, we first display all unconverted children, and then the fully converted children.

@jbj
Copy link
Contributor

jbj commented Oct 23, 2020

To make this concrete, here's my interpretation of the running example under "option 4":

#    2|     0: [ExprStmt] ExprStmt
#    2|       0: [AssignExpr] ... = ...
#    2|           Type = [CharPointerType] char *
#    2|           ValueCategory = prvalue
#    2|         0: [VariableAccess] c
#    2|             Type = [CharPointerType] char *
#    2|             ValueCategory = lvalue
#    2|         1 unconverted: [VariableAccess] v
#    2|             Type = [VoidPointerType] void *
#    2|             ValueCategory = prvalue(load)
#    2|         1 converted: [CStyleCast] (char *)...
#    2|             Conversion = [PointerConversion] pointer conversion
#    2|             Type = [CharPointerType] char *
#    2|             ValueCategory = prvalue

I threw a 1 unconverted label in there, instead of just 1, to make it harder to miss that there are conversions coming later. I'm not sure whether we want that, but it should be easy to change. I didn't write labels like "left" in this example since that should be orthogonal.

@rdmarsh2
Copy link
Contributor

Do the "converted" entries only include the conversion chain, or will there be duplication of the VariableAccess?

@jbj
Copy link
Contributor

jbj commented Oct 26, 2020

Do the "converted" entries only include the conversion chain, or will there be duplication of the VariableAccess?

I'm pretty sure we agreed on "no duplicates", so maybe it should be called "conversions" instead of "converted"?

@criemen criemen force-pushed the fix-4278-printast-conversions branch from 8c19eb6 to 447ba20 Compare October 26, 2020 12:49
Copy link
Contributor

@jbj jbj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@jbj jbj merged commit fa344d2 into github:main Oct 29, 2020
@criemen criemen deleted the fix-4278-printast-conversions branch October 29, 2020 12:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants