32 cpp string concatenation library #14954

bdrodes · 2023-11-29T21:05:25Z

Adds a new general-purpose StringConcatenation library that lets us locate string concatenation operations, get their operands, and get the resulting dataflow node.

cpp/ql/lib/semmle/code/cpp/commons/StringConcatenation.qll

+class StringConcatenation extends Call {
+  StringConcatenation() {
+    // sprintf-like functions, i.e., concat through formating
+    exists(FormattingFunctionCall fc | this = fc)


cpp/ql/lib/semmle/code/cpp/commons/StringConcatenation.qll

+            this.(FormattingFunctionCall)
+                .getTarget()
+                .(FormattingFunction)


geoffw0 · 2023-12-04T10:15:43Z

cpp/ql/lib/semmle/code/cpp/commons/StringConcatenation.qll

+
+class StringConcatenation extends Call {
+  StringConcatenation() {
+    // sprintf-like functions, i.e., concat through formating


Suggested change

-    // sprintf-like functions, i.e., concat through formating
+    // sprintf-like functions, i.e., concat through formatting

geoffw0 · 2023-12-04T10:16:46Z

cpp/ql/lib/semmle/code/cpp/commons/StringConcatenation.qll

+  }
+
+  /**
+   * Gets the operands of this concatenation (one of the string operands being


Suggested change

-   * Gets the operands of this concatenation (one of the string operands being
+   * Gets an operand of this concatenation (one of the string operands being

(both are reasonable explanations of what this predicate does, but we've standardized on "Gets a" / "Gets an" when there are multiple return values)

geoffw0 · 2023-12-04T10:20:10Z

cpp/ql/lib/semmle/code/cpp/commons/StringConcatenation.qll

+    // The result is an argument of 'this' (a call)
+    result = this.getAnArgument() and
+    // addresses odd behavior with overloaded operators
+    // i.e., "call to operator+" appearing as an operand


Is this where you have more than one concatenation, e.g. "a" + "b" + "c"? I think in that case we would expect one of the concatenations to have the other (a "call to operator+") as an argument.

Yeah, I'm not sure what the "odd behavior" is here. If you have a call such as:

std::string s1, s2, s3;
std::string s = s1 + s2 + s3;

then this will be represented as:

string s = (s1.operator+(s2)).operator+(s3);

and hence the "call to operator+" is simply the qualifier. Should that call to operator+ not be an operand according to this library?
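To make the chained case concrete, here is a minimal C++ sketch (the function name is hypothetical and chosen for illustration). It only demonstrates how the C++ language resolves `s1 + s2 + s3`; it says nothing about how CodeQL models the calls. One detail worth noting: in the C++ standard library, `operator+` for `std::string` is a non-member function, so the member-call spelling above is schematic, and when written out explicitly the inner call appears as the first argument of the outer call.

```cpp
#include <cassert>
#include <string>

// Builds the same concatenation twice: once with infix syntax and once with
// the explicit operator+ calls that the infix form resolves to.
// For std::string, operator+ is a non-member function, so the inner call is
// the first argument of the outer call.
bool chained_concat_matches_nested_calls() {
  std::string s1 = "a", s2 = "b", s3 = "c";
  std::string infix = s1 + s2 + s3;                       // left-associative
  std::string nested = operator+(operator+(s1, s2), s3);  // same calls, spelled out
  return infix == nested && infix == "abc";
}
```

Under this reading, the inner concatenation is itself an operand of the outer one, which is the situation the question above is asking about.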

geoffw0 · 2023-12-04T10:28:25Z

cpp/ql/lib/semmle/code/cpp/commons/StringConcatenation.qll

+            this.(FormattingFunctionCall)
+                .getTarget()
+                .(FormattingFunction)
+                .getFirstFormatArgumentIndex()


We would ideally exclude strings passed into a formatting function call where the format specifier isn't a variation of %s. For example, I think you can output a char * string with %p, and that does not concatenate the contents of the string.
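A small C++ sketch of the difference, assuming a fixed `prefix:` label and two hypothetical helper names chosen for illustration. With `%s` the characters of the argument are copied into the output; with `%p` only an implementation-defined representation of the pointer value is printed.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Formats `text` with %s: the characters of the string are copied into the
// output, so this behaves like a concatenation.
std::string format_as_string(char* text) {
  char out[64];
  std::snprintf(out, sizeof out, "prefix:%s", text);
  return out;
}

// Formats the same argument with %p: only an implementation-defined
// representation of the pointer value is printed, not the characters.
std::string format_as_pointer(char* text) {
  char out[64];
  std::snprintf(out, sizeof out, "prefix:%p", static_cast<void*>(text));
  return out;
}
```

Only the first helper moves the contents of `text` into the result, which is why a `%p` argument arguably should not count as a concatenation operand.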

geoffw0 · 2023-12-04T10:33:36Z

cpp/ql/lib/semmle/code/cpp/commons/StringConcatenation.qll

+          [result.asExpr(), result.asIndirectExpr()] =
+            this.(FormattingFunctionCall).getOutputArgument(_)
+        else [result.asExpr(), result.asIndirectExpr()] = this.(Call)
+  }


I think this predicate could be written as a sequence of expressions joined by or, rather than if-then-else. It might be easier to read, and it also avoids some potential performance problems that can occur with nested if-then-else.

Yeah, I think there's a misunderstanding of the semantics of QL here. Every time you write something like:

if x instanceof Foo then result = x.(Foo).bar() else if x instanceof Baz then result = x.(Baz).qux() else ...

you can simplify this to be

result = x.(Foo).bar() or result = x.(Baz).qux() or ...

since x.(Foo) won't have any result when x isn't a Foo. And so result = x.(Foo).bar() will contribute 0 tuples to the final predicate when x isn't a Foo.

Just to be clear: there are no inherent performance problems with many nested if-then-elses in QL. What @geoffw0 is referring to is that QL desugars the formula if x then y else z to the formula:

x and y or not x and z

and, if you're not careful, the not x part can produce quite a lot of tuples. For instance, if you have two nested if-then-elses such as:

if x then (if y then z else t) else w

then this desugars to:

x and (if y then z else t) or not x and w

which desugars to

x and ( y and z or not y and t ) or not x and w

and you need to be sure that none of these terms generate a large number of tuples.

Thanks for explaining. To be honest I couldn't really remember why we avoid excessive if-then-else, only that it's usually preferable when you have a choice. 👍

MathiasVP

First round of comments. This kinda reminds me of https://github.com/github/codeql/blob/main/go/ql/lib/semmle/go/StringOps.qll#L462. Do you think it's worth taking inspiration from that Go library?

MathiasVP · 2023-12-04T12:32:34Z

cpp/ql/lib/semmle/code/cpp/commons/StringConcatenation.qll

+import cpp
+import semmle.code.cpp.models.implementations.Strcat
+import semmle.code.cpp.models.interfaces.FormattingFunction
+import semmle.code.cpp.dataflow.new.DataFlow


I don't think we should have people rely on dataflow implicitly being imported when they import this library. Ideally, the other imports should also be private, but if you're using a library for string concatenation I think it's fair that you also implicitly import those other libraries.

Suggested change

-import semmle.code.cpp.dataflow.new.DataFlow
+private import semmle.code.cpp.dataflow.new.DataFlow

MathiasVP · 2023-12-04T12:34:07Z

cpp/ql/lib/semmle/code/cpp/commons/StringConcatenation.qll

+class StringConcatenation extends Call {
+  StringConcatenation() {
+    // sprintf-like functions, i.e., concat through formating
+    exists(FormattingFunctionCall fc | this = fc)


As Code Scanning is suggesting.

Suggested change

-    exists(FormattingFunctionCall fc | this = fc)
+    this instanceof FormattingFunctionCall

MathiasVP · 2023-12-04T12:37:52Z

cpp/ql/lib/semmle/code/cpp/commons/StringConcatenation.qll

+    // The result is an argument of 'this' (a call)
+    result = this.getAnArgument() and
+    // addresses odd behavior with overloaded operators
+    // i.e., "call to operator+" appearing as an operand


Yeah, I'm not sure what the "odd behavior" is here. If you have a call such as:

std::string s1, s2, s3;
std::string s = s1 + s2 + s3;

then this will be represented as:

string s = (s1.operator+(s2)).operator+(s3);

and hence the "call to operator+" is simply the qualifier. Should that call to operator+ not be an operand according to this library?

MathiasVP · 2023-12-04T12:39:46Z

cpp/ql/lib/semmle/code/cpp/commons/StringConcatenation.qll

+    (
+      result.getUnderlyingType().stripType().getName() = "char"
+      or
+      result.getUnderlyingType().getName() = "string"


What string type is this supposed to represent? Since getUnderlyingType strips away typedefs this can't be std::string (which is typedef'd as a version of a basic_string that is also handled in the next disjunct).
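The C++ side of this point can be checked directly. The sketch below (with a hypothetical helper name) only confirms that `std::string` is an alias for a `basic_string` specialization; how `getUnderlyingType` reports that type is as described in the comment above, not something this snippet demonstrates.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// std::string is a typedef for a basic_string specialization, so once
// typedefs are stripped the type is basic_string<char, ...>, not a type
// that is itself named "string".
static_assert(std::is_same<std::string, std::basic_string<char>>::value,
              "std::string names std::basic_string<char>");

// Convenience wrapper so the same fact can be checked at run time.
constexpr bool string_is_basic_string() {
  return std::is_same<std::string, std::basic_string<char>>::value;
}
```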

MathiasVP · 2023-12-04T12:43:34Z

cpp/ql/lib/semmle/code/cpp/commons/StringConcatenation.qll

+        this.getArgument(this.getTarget().(StrcatFunction).getParamDest())
+      or
+      // Hardcoding it is also the return
+      [result.asExpr(), result.asIndirectExpr()] = this.(Call)


Why do you need both asExpr() and asIndirectExpr() here? If you want the pointer to the string to be the result, then it should be asExpr(), and if you want the actual char data to be the result then it should be asIndirectExpr().

I would imagine that you want to limit this to simply asExpr()?
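As a plain C++ analogy for the pointer-versus-data distinction (not a statement about CodeQL's dataflow nodes), the sketch below uses a hypothetical helper to show that `strcat` returns the destination pointer itself, while the concatenated characters live in the buffer that pointer refers to.

```cpp
#include <cassert>
#include <cstring>

// Appends `suffix` to `dest` with strcat and reports whether the returned
// pointer is the destination pointer itself.
// The caller must ensure `dest` has room for the combined string.
bool strcat_returns_destination(char* dest, const char* suffix) {
  char* returned = std::strcat(dest, suffix);
  return returned == dest;
}
```

The return value is the pointer; the characters it points to are a separate thing, which is the choice between the two node kinds being asked about.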

MathiasVP · 2023-12-04T12:44:59Z

cpp/ql/lib/semmle/code/cpp/commons/StringConcatenation.qll

+        [result.asExpr(), result.asIndirectExpr()] =
+          this.getArgument(this.getTarget().(StrlcatFunction).getParamDest())


Should this not be the output argument of calling strlcat? That is, I would expect this to be:

result.asDefiningArgument() = this.getArgument(this.getTarget().(StrlcatFunction).getParamDest())

similarly to how you did the StrcatFunction case?

MathiasVP · 2023-12-04T12:46:01Z

cpp/ql/lib/semmle/code/cpp/commons/StringConcatenation.qll

+          [result.asExpr(), result.asIndirectExpr()] =
+            this.(FormattingFunctionCall).getOutputArgument(_)


Same here: I think this should be the node representing the output argument. That is, this should probably be:

result.asDefiningArgument() = this.(FormattingFunctionCall).getOutputArgument(_)

right?
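For reference, a minimal C++ sketch of why the output argument is the natural place to look for the result of an sprintf-style concatenation. The helper name is hypothetical; the behavior shown is that of the standard `snprintf`, whose return value is only a character count.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Concatenates two strings through snprintf. The combined text ends up in
// `out` (the output argument); the return value is only the number of
// characters that the full result requires, excluding the terminator.
int concat_via_snprintf(char* out, std::size_t out_size,
                        const char* left, const char* right) {
  return std::snprintf(out, out_size, "%s%s", left, right);
}
```

A caller reads the concatenated string from the buffer it passed in, for example `char out[32]; concat_via_snprintf(out, sizeof out, "foo", "bar");`.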

MathiasVP · 2023-12-04T12:48:22Z

cpp/ql/lib/semmle/code/cpp/commons/StringConcatenation.qll

+          [result.asExpr(), result.asIndirectExpr()] =
+            this.(FormattingFunctionCall).getOutputArgument(_)
+        else [result.asExpr(), result.asIndirectExpr()] = this.(Call)
+  }


Yeah, I think there's a misunderstanding of the semantics of QL here. Every time you write something like:

if x instanceof Foo then result = x.(Foo).bar() else if x instanceof Baz then result = x.(Baz).qux() else ...

you can simplify this to be

result = x.(Foo).bar() or result = x.(Baz).qux() or ...

since x.(Foo) won't have any result when x isn't a Foo. And so result = x.(Foo).bar() will contribute 0 tuples to the final predicate when x isn't a Foo.

MathiasVP · 2023-12-04T12:53:25Z

cpp/ql/lib/semmle/code/cpp/commons/StringConcatenation.qll

+          [result.asExpr(), result.asIndirectExpr()] =
+            this.(FormattingFunctionCall).getOutputArgument(_)
+        else [result.asExpr(), result.asIndirectExpr()] = this.(Call)
+  }


Just to be clear: there are no inherent performance problems with many nested if-then-elses in QL. What @geoffw0 is referring to is that QL desugars the formula if x then y else z to the formula:

x and y or not x and z

and, if you're not careful, the not x part can produce quite a lot of tuples. For instance, if you have two nested if-then-elses such as:

if x then (if y then z else t) else w

then this desugars to:

x and (if y then z else t) or not x and w

which desugars to

x and ( y and z or not y and t ) or not x and w

and you need to be sure that none of these terms generate a large number of tuples.
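The desugaring steps above can be sanity-checked as Boolean identities. The C++ sketch below (all function names are hypothetical) enumerates every truth assignment and compares the if-then-else reading with the desugared formulas. It covers only the propositional equivalence; QL formulas denote sets of tuples, and the tuple-count concern for the not x terms is not modeled here.

```cpp
#include <cassert>

// Propositional reading of `if x then y else z`.
constexpr bool if_then_else(bool x, bool y, bool z) { return x ? y : z; }

// The desugared form quoted above: x and y or not x and z.
constexpr bool desugared(bool x, bool y, bool z) {
  return (x && y) || (!x && z);
}

// Nested form `if x then (if y then z else t) else w`, fully desugared.
constexpr bool nested_desugared(bool x, bool y, bool z, bool t, bool w) {
  return (x && ((y && z) || (!y && t))) || (!x && w);
}

// Checks both equivalences over every truth assignment of x, y, z, t, w.
bool desugaring_holds() {
  for (int bits = 0; bits < 32; ++bits) {
    bool x = bits & 1, y = bits & 2, z = bits & 4, t = bits & 8, w = bits & 16;
    if (if_then_else(x, y, z) != desugared(x, y, z)) return false;
    bool nested = if_then_else(x, if_then_else(y, z, t), w);
    if (nested != nested_desugared(x, y, z, t, w)) return false;
  }
  return true;
}
```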

bdrodes added 2 commits November 29, 2023 13:00

Added StringConcatenation.qll

4919c4a

Updated getResultExpr to getResultNode. Added strlcat. Added tests.

94a0420

bdrodes requested a review from a team as a code owner November 29, 2023 21:05

github-actions bot added the C++ label Nov 29, 2023

github-advanced-security bot found potential problems Nov 29, 2023

geoffw0 reviewed Dec 4, 2023

MathiasVP reviewed Dec 4, 2023
