Tree sitter update #8909

aibaars · 2022-04-27T17:01:30Z

This pull request updates the tree-sitter-ruby grammar used by our extractor. It implements upgrade and downgrade scripts that are semantics-preserving except for complex literals (1i, 3.14i, 3ri, etc.). In the new grammar a complex literal is no longer a single token but a node with a float, integer, or rational literal as its child. An exact upgrade script would therefore have to create new nodes, which is not possible. Instead, the upgrade script simply converts complex literals to float literals. This is wrong, but it preserves the structure of the AST, so there are no gaps in the control and data flow graphs.
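For readers less familiar with these literals, here is a minimal Ruby sketch of the three forms mentioned above. It only illustrates the language feature (each literal is a Complex whose imaginary part is an integer, float, or rational), not the extractor itself. The constant and method names are made up for this example, and it assumes Ruby 2.4 or later, where integers report the class Integer.

```ruby
# Hypothetical illustration: the three complex literal forms from the description.
# Each value is a Complex; only the type of the imaginary part differs.
COMPLEX_LITERALS = {
  "1i"    => 1i,     # integer imaginary part
  "3.14i" => 3.14i,  # float imaginary part
  "3ri"   => 3ri,    # rational imaginary part
}.freeze

# Returns the class of the imaginary part for one of the literals above.
def imaginary_part_class(literal_text)
  COMPLEX_LITERALS.fetch(literal_text).imaginary.class
end

COMPLEX_LITERALS.each do |text, value|
  puts "#{text}: #{value.class} with #{value.imaginary.class} imaginary part"
end
```

This mirrors the new grammar's shape, where the complex node wraps an integer, float, or rational child.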

ruby/ql/lib/codeql/ruby/ast/Method.qll

github-advanced-security

Found 20 vulnerabilities.

ruby/ql/lib/upgrades/9fdd1d40fd3c3f8f9db8fabf5a353580d14c663a/ruby_ast_node_info.ql

ruby/ql/lib/upgrades/9fdd1d40fd3c3f8f9db8fabf5a353580d14c663a/ruby_call_arguments.ql

ruby/ql/lib/upgrades/9fdd1d40fd3c3f8f9db8fabf5a353580d14c663a/ruby_call_receiver.ql

ruby/ql/lib/upgrades/9fdd1d40fd3c3f8f9db8fabf5a353580d14c663a/ruby_scope_resolution_def.ql

ruby/ql/lib/upgrades/9fdd1d40fd3c3f8f9db8fabf5a353580d14c663a/ruby_call_receiver.ql

ruby/ql/lib/change-notes/2022-04-30-update-grammar.md

Co-authored-by: intrigus-lgtm <60750685+intrigus-lgtm@users.noreply.github.com>

hmac · 2022-05-10T00:09:56Z

ruby/extractor/Cargo.toml

@@ -11,7 +11,7 @@ flate2 = "1.0"
 node-types = { path = "../node-types" }
 tree-sitter = "0.19"
 tree-sitter-embedded-template = "0.19"
-tree-sitter-ruby = { git = "https://github.com/tree-sitter/tree-sitter-ruby.git", rev = "1ebfdb288842dae5a9233e2509a135949023dd82" }
+tree-sitter-ruby = { git = "https://github.com/tree-sitter/tree-sitter-ruby.git", rev = "1a3936a3545c0bd9344a0bf983fafc7e17443e39" }


Alternatively, we could just remove the rev field here and rely on Cargo.lock to pin the exact revision. Assuming we want to stay relatively up-to-date with tree-sitter-ruby changes, this would help us do that.

That would work, although I'd like to be explicit. Things won't actually work if you just refresh the lock file, because even a minor grammar change might require changes to the dbscheme, the library, and the upgrade/downgrade scripts.

Yeah, but then the tests would fail, right? Either approach works, but I do worry that we will forget to keep this up-to-date over time unless there's some automation around it.

hmac · 2022-05-10T00:58:13Z

ruby/ql/lib/codeql/ruby/ast/Method.qll

@@ -252,10 +252,24 @@ class Lambda extends Callable, BodyStmt, TLambda {

 /** A block. */
 class Block extends Callable, StmtSequence, Scope, TBlock {
+  /**
+   * Get a local variable declared by this block.
+   * For example `local` in `{ | param; local| puts param }`.


I had no idea this was a feature. Is it new?

It's a very old feature, just rarely used.
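A minimal Ruby sketch of the feature being discussed: names listed after the semicolon in the block parameter list are local to the block and do not touch a variable of the same name in the enclosing scope. The method name is made up for this illustration.

```ruby
# Hypothetical demo of a block-local variable (`local` after the semicolon).
def block_local_demo
  local = :outer
  seen = []
  [1, 2].each do |param; local|
    local = param * 10 # assigns the block-local variable only
    seen << local
  end
  # The outer `local` is untouched by the assignments inside the block.
  [local, seen]
end

p block_local_demo
```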

hmac · 2022-05-10T01:00:42Z

ruby/ql/lib/codeql/ruby/ast/internal/AST.qll

@@ -248,9 +252,6 @@ private module Cached {
        casePattern(g)
      )
    } or
-    TScopeResolutionMethodCall(Ruby::ScopeResolution g, Ruby::Identifier i) {
-      isScopeResolutionMethodCall(g, i)
-    } or


Why are we removing this?

The parser now correctly distinguishes between calls using :: and scope resolutions for constants, so this code is now dead. expr::identifier is a call with :: as the call operator, and expr::Constant is a scope resolution.

Ah, so expr::identifier is now considered equivalent to expr.identifier?

Yes indeed. Both are a Call but their getOperator() will return a different token.
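The distinction can be seen in plain Ruby. In this small sketch (the method name is made up for the example), a lowercase name after :: is a method call that behaves like the dot form, while a capitalised name without arguments is a constant lookup.

```ruby
# Hypothetical demo: `::` as a call operator versus `::` as scope resolution.
def scope_operator_demo
  {
    call_with_colons: Math::sqrt(16.0), # method call using `::`
    call_with_dot: Math.sqrt(16.0),     # the same call using `.`
    constant: Math::PI,                 # scope resolution of a constant
    string_call: "abc"::upcase,         # `::` call on an arbitrary expression
  }
end

p scope_operator_demo
```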

hmac · 2022-05-10T04:18:55Z

ruby/ql/lib/codeql/ruby/ast/internal/Call.qll

-    or
-    toGenerated(result) = g.getMethod().(Ruby::ArgumentList).getChild(n)
-  }
+  final override Expr getArgumentImpl(int n) { toGenerated(result) = g.getArguments().getChild(n) }


This looks like a nice cleanup - weird that Ruby::ArgumentList was previously considered a "method"...

Yes, that made no sense at all, and had been on my list of things to fix for a long time ;-)

hmac · 2022-05-10T04:29:28Z

ruby/ql/test/library-tests/ast/Ast.expected

 #    6|   getStmt: [BooleanLiteral] true
-#    7|   getStmt: [BooleanLiteral] TRUE
+#    7|   getStmt: [ConstantReadAccess] TRUE


Nice 👍

I didn't know we were incorrectly interpreting these as booleans before. Good to see it fixed.
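To illustrate why a constant read is the right interpretation: TRUE is an ordinary constant name in Ruby, whereas true is a keyword. The sketch below uses a made-up module and method, and holds regardless of whether the Ruby version in use predefines a top-level TRUE.

```ruby
# Hypothetical module: TRUE is just a constant name and can hold any value.
module Flags
  TRUE = :just_a_constant
end

# Returns the class of the `true` keyword next to the value of the constant.
def true_keyword_vs_constant
  [true.class, Flags::TRUE]
end

p true_keyword_vs_constant
```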

hmac · 2022-05-10T04:34:08Z

On the upgrade/downgrade scripts: I have a sense of what they're doing but don't know how to ensure that they work properly. Do we have any tests that ensure they give the expected result? We have a "Check DB upgrade/downgrade script" test but I don't know if it does much more than just run them and make sure nothing throws an error.

aibaars · 2022-05-10T08:48:25Z

The "Check DB upgrade/downgrade script" test doesn't check that much. I think it only verifies that the upgrades form a chain from the initial dbscheme to the current one and that the upgrade/downgrade scripts compile.

I validated the behaviour of the upgrade scripts by building the extractor-pack from the main branch and running the QL tests. This causes the QL tests to build databases with the old extractor but run the test cases with the new library and dbscheme. In this case CodeQL runs the upgrade scripts because the databases are older than the new dbscheme.

These are the steps I used:

git checkout upstream/main .
./scripts/create-extractor-pack.sh
git checkout HEAD -f
codeql test run --learn --search-path extractor-pack/ --search-path ql/lib ql/test
git difftool ql/test

To test the downgrade scripts you can do the opposite: compile a new extractor, but run the QL tests with the old library.

./scripts/create-extractor-pack.sh
git checkout upstream/main ql/lib
codeql test run --learn  --search-path extractor-pack/ ql/test
git difftool ql/test

Note that there are some differences in the test output, but these should be benign or otherwise expected.

ruby/ql/test/library-tests/ast/TreeSitter.ql

ruby/ql/lib/upgrades/9fdd1d40fd3c3f8f9db8fabf5a353580d14c663a/ruby_call_method.ql

nickrolfe · 2022-05-10T11:01:26Z

ruby/ql/lib/upgrades/9fdd1d40fd3c3f8f9db8fabf5a353580d14c663a/ruby_scope_resolution_def.ql

+  RubyScopeResolution() {
+    exists(RubyConstant name | ruby_scope_resolution_def(this, name)) and
+    not ruby_call_def(_, this)
+  }


I was expecting that we'd simply drop rows where the second column is not a @ruby_token_constant, so this charpred seems like it could be deleted.

The not ruby_call_def(_, this) is still important. It says that things like expr::Foo() are not scope resolutions. Even though Foo looks like a constant name, it is not.
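The same point in plain Ruby: a capitalised name after :: is a constant lookup only when no argument list follows. With parentheses it is a method call, even though the name looks like a constant. The module, struct, and method below are made up for this sketch.

```ruby
# Hypothetical module with a constant and a method that share the name Circle.
module Shapes
  Circle = Struct.new(:radius)

  # A method whose name looks like a constant, similar to Kernel#Integer.
  def self.Circle(radius)
    Circle.new(radius)
  end
end

constant_lookup = Shapes::Circle # scope resolution: yields the Struct class
method_call = Shapes::Circle(2)  # call: yields an instance

puts constant_lookup.inspect
puts method_call.inspect
```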

nickrolfe · 2022-05-10T11:03:43Z

ruby/ql/lib/upgrades/9fdd1d40fd3c3f8f9db8fabf5a353580d14c663a/ruby_ast_node_info.ql

+  ruby_ast_node_info(node, _, _, loc) and
+  parent instanceof ScopeResolutionMethodCall and
+  node =
+    rank[index + 1](RubyAstNode child, int x, int oldIndex |
+      exists(RubyAstNodeParent oldParent |
+        ruby_ast_node_info(child, oldParent, oldIndex, _) and
+        child != parent.(ScopeResolutionMethodCall).getScopeResolution()
+      |
+        oldParent = parent and x = 1
+        or
+        oldParent = parent.(ScopeResolutionMethodCall).getScopeResolution() and x = 0
+      )
+    |
+      child order by x, oldIndex
+    )


Can you explain what this is doing, and why?

A call like expr::Foo(args) block used to be represented as Call{ method: ScopeResolution(expr, Foo), arguments: args, block: block}. The children of the Call node were:

ScopeResolution(expr, Foo)

args

block

The new parse tree is like Call{ receiver: expr, method: Foo, arguments: args, block: block }. For this Call the children should be:

expr

Foo

args

block

To fix the parent-child relation for the Call, we simply take the children of the ScopeResolution first (x = 0), ordered by oldIndex, followed by the other children of the original Call (x = 1) in their oldIndex order. The rank ensures that the new indices are consecutive even if some of the children are missing.
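The reordering can be sketched outside QL. The Ruby below is only an illustration of the ordering rule, not the upgrade script itself: the Child struct, the method name, and the string node names are assumptions made for the example. It drops the ScopeResolution node, sorts the remaining children by the pair (x, old index), and numbers them consecutively from 0, which is what the 1-based rank[index + 1] achieves in the query.

```ruby
# Hypothetical model of one row of the old parent/child relation.
Child = Struct.new(:name, :old_parent, :old_index)

# Children of the ScopeResolution come first (x = 0), then the remaining
# children of the Call (x = 1); each group keeps its old index order.
def reorder_call_children(children, call:, scope_resolution:)
  children
    .reject { |c| c.name == scope_resolution }
    .select { |c| [call, scope_resolution].include?(c.old_parent) }
    .sort_by { |c| [c.old_parent == scope_resolution ? 0 : 1, c.old_index] }
    .each_with_index
    .map { |c, new_index| [new_index, c.name] }
end

# Old tree for `expr::Foo(args) block`.
OLD_CHILDREN = [
  Child.new("ScopeResolution", "Call", 0),
  Child.new("args", "Call", 1),
  Child.new("block", "Call", 2),
  Child.new("expr", "ScopeResolution", 0),
  Child.new("Foo", "ScopeResolution", 1),
].freeze

p reorder_call_children(OLD_CHILDREN, call: "Call", scope_resolution: "ScopeResolution")
```

If a child such as args is absent, the remaining children still receive indices without gaps, because the new index comes from the position in the sorted list rather than from the old index.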

ruby/ql/lib/codeql/ruby/ast/Method.qll

Co-authored-by: Nick Rolfe <nickrolfe@github.com>

nickrolfe

Very nice!

github-actions bot added the Ruby label Apr 27, 2022

intrigus-lgtm reviewed Apr 27, 2022

github-advanced-security bot found potential problems Apr 27, 2022

aibaars force-pushed the tree-sitter-update branch 2 times, most recently from bb52ca1 to 99b77c6 Compare April 27, 2022 19:37

aibaars added 7 commits April 28, 2022 12:59

Ruby: add tree-sitter test case

7359ffa

Update tree-sitter-ruby

0d93543

Regenerate QLL library

a848929

Update dbscheme stats

65989ae

Update library

20a3e3a

Update tests

d055f9a

Ruby: add upgrade and downgrade scripts

ccc1864

aibaars force-pushed the tree-sitter-update branch 2 times, most recently from 925cf49 to ccc1864 Compare April 29, 2022 14:04

aibaars marked this pull request as ready for review April 29, 2022 14:06

aibaars requested a review from a team as a code owner April 29, 2022 14:06

Add change note

cf4325c

github-actions bot added the documentation label Apr 29, 2022

intrigus-lgtm reviewed Apr 29, 2022

Update ruby/ql/lib/change-notes/2022-04-30-update-grammar.md

19e4d34

Co-authored-by: intrigus-lgtm <60750685+intrigus-lgtm@users.noreply.github.com>

hmac reviewed May 10, 2022

nickrolfe reviewed May 10, 2022

aibaars force-pushed the tree-sitter-update branch from 042bd24 to ba5b715 Compare May 10, 2022 12:34

Address comments

907c3db

Co-authored-by: Nick Rolfe <nickrolfe@github.com>

aibaars force-pushed the tree-sitter-update branch from ba5b715 to 907c3db Compare May 11, 2022 07:59

nickrolfe approved these changes May 11, 2022

aibaars merged commit a47e429 into github:main May 11, 2022
