-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Python: All dict constructor args are relevant #17566
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. 👍
This is handled by parameter-argument matching
I was not comfortable with the dataflow consistency violations. It turned out we had superflous code, leading to many arguments in the same position. That is fixed now, and I added a test showing that we do see taint through later arguments. |
exists(int index1, int index2 | ppos.isPositionalLowerBound(index1) and index2 >= index1 | | ||
result.getParameter() = func.getArg(index2 + this.positionalOffset()) | ||
) | ||
or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why you are removing this bit of code.
From my point of view, it looks like it does the right thing. If you ask for a parameter at position 1..
it will give you all the positional parameters except the first one. Why would we not want that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see a comment about dataflow inconsistencies. I would recommend to include them in the git history next time, so it's possible to know what they look like... We do currently use arguments/parameters in a slightly different way than most other dataflow libraries, which means we get inconsistency errors. (which is still something we should solve 😮💨)
Anyway, can you please show the inconsistency warnings, then we can discuss the right approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway, can you please show the inconsistency warnings, then we can discuss the right approach?
Sure, you can see some here: https://github.com/github/semmle-code/actions/runs/11020219625/job/30604524761#step:15:830
(Found via the red x on the commit before they were fixed. But that will expire at some point.)
Interesting idea to quote them in the commit message, I will remember that 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the same token, I should quote them here rather than link...
For this snippet:
def use_of_apply(func, args):
apply(func, args)
we get alerts
callable | pos | parameter | msg |
---|---|---|---|
Function use_of_apply | position 0.. | ControlFlowNode for func | Parameters with overlapping positions. |
Function use_of_apply | position 0.. | ControlFlowNode for args | Parameters with overlapping positions. |
stating that all parameters are now in conflict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting idea to quote them in the commit message, I will remember that 👍
You can commit the files in the CONSISTENCY
directory as part of your commit when the warnings are introduced, then you don't need to quote anything in the commit message. (and obviously remove them once you resolve the problems).
(this is all assuming you run the consistency checks locally, which you can do by setting the --consistency-queries
flag when using the CLI)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that is even more structured :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I would have liked is some details around why we need this. Here's the details I've gathered myself
- we only create these parameter positions when
Argument[<n>..]
is used in MaDcodeql/python/ql/lib/semmle/python/dataflow/new/internal/DataFlowDispatch.qll
Lines 60 to 62 in da5abc8
TPositionalParameterLowerBoundPosition(int pos) { FlowSummaryImpl::ParsePositions::isParsedArgumentLowerBoundPosition(_, pos) } or - Currently we only use that in a test-file (see this search)
- So the addition of
Argument[0..]
in this PR caused us to start having aTPositionalParameterPosition
available, which highlighted some problems.
We use DataFlowCallable.getParameter
in multiple places, for example in ParameterNodeImpl:
codeql/python/ql/lib/semmle/python/dataflow/new/internal/DataFlowDispatch.qll
Lines 1558 to 1560 in da5abc8
predicate isParameterOf(DataFlowCallable c, ParameterPosition ppos) { | |
this = c.getParameter(ppos) | |
} |
dataflow library generally expects parameters to only be found at one position. (which is why we're now getting the consistency warnings on all positional arguments)
The one place that I could find where removing isPositionalLowerBound
support from DataFlowFunction.getParameter
could end up having impact is in type-tracking implementation:
codeql/python/ql/lib/semmle/python/dataflow/new/internal/TypeTrackingImpl.qll
Lines 177 to 194 in a8fc100
/** | |
* Holds if `nodeFrom` steps to `nodeTo` by being passed as a parameter in a call. | |
* | |
* Flow into summarized library methods is not included, as that will lead to negative | |
* recursion (or, at best, terrible performance), since identifying calls to library | |
* methods is done using API graphs (which uses type tracking). | |
*/ | |
predicate callStep(Node nodeFrom, LocalSourceNode nodeTo) { | |
exists( | |
DataFlowPrivate::DataFlowCall call, DataFlowPrivate::DataFlowCallable callable, | |
DataFlowPrivate::ArgumentPosition apos, DataFlowPrivate::ParameterPosition ppos | |
| | |
nodeFrom = call.getArgument(apos) and | |
nodeTo = callable.getParameter(ppos) and | |
DataFlowPrivate::parameterMatch(ppos, apos) and | |
callable = call.getCallable() | |
) | |
} |
However, since the assumption is that isPositionalLowerBound
is ONLY used for "library callables", and those are not supported in callStep
anyway, the change doesn't have any implications. (and flow-summary-impl uses parameterMatch
internally, so no problem there)
That made me think, what about the "simple" library-callables defined with getACallSimple
(not using API graph)? I think I found the logic that handles parameter positions for those, and I don't see anything about isPositionalLowerBound
. I guess this issue hasn't come up since (a) we haven't used isPositionalLowerBound
for anything, and (b) we only use getACallSimple
in tests -- but the details are a bit foggy, so I might have gotten something wrong in my reasoning. _(I see some test failures in the PR that introduced isPositionalLowerBound
, although the details are not available any longer (and not part of the git history)... I assume that must have been for the handling of flow-summaries for "library callables" that uses the API graph, for which we have tests).
Overall I think it looks good to remove this code from getParameter
, but it wasn't really clear to me before spending almost an hour looking through these things. I think we might have a bug in our handling of getACallSimple
, which honestly doesn't seem super important.
However, I would strongly recommend a comment explaining why isPositionalLowerBound
shouldn't be handled in getParameter
(so it doesn't just look like an omission when someone checks if all parameter positions are handled in there)
Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
Pull Request checklist
All query authors
.qhelp
. See the documentation in this repository.Internal query authors only
.ql
,.qll
, or.qhelp
files. See the documentation (internal access required).