New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Python: Automated subclass models #15044
Conversation
Based on some DBs I had that contained dependencies
Also makes `empty.model.yml` empty once again
(makes future diffing much easier)
This is important to model mixins correctly, for example when they help handle incoming requests, and therefore need to know that `self.kwargs` contains data controlled by a user.
:thinkies: turns out that .getASubclass*() had to be applied everywhere...
This required making some of the relevant bits public, but they are marked as internal anyway.
Same trick as 'generate-code-scanning-query-list.py'
for module entry definitions from the dataflow graph.
mostly removing of nodes from the graph. One result lost: ``` check("submodule.submodule_attr", submodule.submodule_attr, "submodule_attr", globals()) #$ MISSING:prints=submodule_attr ```
Co-authored-by: Taus <tausbn@github.com>
Co-authored-by: Taus <tausbn@github.com>
these changes took performance for loading and writing all files locally 29.60s to 3.17s (that is, using `gather_from_existing`)
Verified by joining all files, splitting again, and observing no diff in git. (these operations only take a few seconds on my local machine, so shouldn't be too much of an issue)
Although it might be hidden by github UI by default, it could be interesting for a reviewer to notice the effect changes in the modeling query has to the results in this file.
In reality, we only want to model this as a `rest_framework.response.Response`, since our .qll modeling is more precise for rest-framework responses than if we also modeled it as a basic django http response. (specifically, that default mime-type handling is way different).
A little more explicit, so less prone to be overlooked when adding a new spec
In the final git history this only deletes one file, but when working locally I deleted ALL files.
(locally done with split + 5 x modeling runs + join, but squashed into one commit)
This reverts commit 0ed363bd79f9d3f9e9a905c1192adfe88f1faffb.
3220c9e
to
72687e0
Compare
I've had to fix the query to find new subclasses, since we did not account for However, since the rest_framework modeling of responses is a more specific subclass of django responses (see code below), we actually only want our .yml modeling to contain the row for codeql/python/ql/lib/semmle/python/frameworks/RestFramework.qll Lines 312 to 325 in 43fe9ca
I also realized that we had missed a few important repos in our automatic analysis. After fixing those two problems, I have regenerated all the modeling, and force pushed some updates to this branch rewriting history to remove the old modeling commits. I guess I am worried about adding multiple commits that each changes 80k lines, so everything has been squashed together instead. I can refer to individual SHAs or restore the actual history if a reviewer would find that helpful 😊 The Internal note: The results were achieved after 5 rounds of analysis of the list of repos (which is internal only). In the last round, only 1 additional model was added, so that's the amount of work to expect next time building all the models from scratch 😊 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I think this looks good to me. There's a few places where we still use classRef
or cls
instead of subclassRef
, but fixing this inconsistency is probably best left for a follow-up PR.
Overview
This PR adds automatically captured subclass information for a the majority of interesting PyPI packages. As an example of what this PR achieves:
flask-restplus
definesflask_restful.Resource
(src) which is a subclass offlask.views.MethodView
.flask.views.MethodView
flask_restful.Resource
is also aflask.views.MethodView
allows us to model the remote-flow-sources properly.We've traditionally had automatic dependency installation available on the codeql-action, but are moving to a solution without this (see https://github.blog/changelog/2023-07-12-code-scanning-with-codeql-no-longer-installs-python-dependencies-automatically-for-new-users/).
We've previously relied on analyzing installed dependencies to reach this conclusion (by following subclass relationship). This PR is the solution to still be able to reach the same conclusion when we stop installing dependencies.
We achieve this by using the extensible type-models to ahead-of-time record important subclass/aliasing information. For example, see the very first commit (2f17d2f)
Our internal testing shows that for all but a few cases, we end up with a solution comparable or better to what we had before, even when narrowing our focus to repos where dependency installation was successful before.
(Thanks to @tausbn for helping with the modeling ❤️)
Reviewing this PR
What a mess. Sorry. It's a mix of working on the tooling
process-mrva-results.py
/SubclassFinder.qll
and enabling subclasses/aliases to be found in the actual modeling. The latter bit required removing theprivate
annotations for much of our modeling. I think we'll just have to live with that.The only commits I've found that don't follow this pattern, and that could have been made into separate PRs, are:
Notes