Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial tensorflow stubs #8974

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

hmc-cs-mdrissi
Copy link
Contributor

@hmc-cs-mdrissi hmc-cs-mdrissi commented Oct 24, 2022

Summary

Initial tensorflow stubs to start with. These are very incomplete given that tensorflow's api is very large. I'm reviving this now as supporting non-types dependencies has made progress and is a requirement for these stubs as they strongly rely on numpy. Similar to opencv stubs this should not merge until non type dependencies work in CI.

These are all manually made based on tensorflow documentation, testing in repl, and usage in large codebase reliant on tensorflow. mypy stubgen last I tried, crashes on tensorflow's codebase and pyright works but produces a large number of false positives. Both mypy and pyright share a weakness in that tensorflow relies on a lot of dynamic module magic with imports. It uses a decorator(tf_export) to re-export class in a different module sometimes even though module may not exist by itself. A secondary issue is most tensorflow classes are defined in internal locations without any private prefixing, and re-exported elsewhere that's documented. And some core tensorflow classes rely on dynamic patching of methods. As an example tf.Tensor class one of the most commonly used types in tensorflow many of it's basic methods like __add__ and __mul__ are dynamically patched on.

I consider this ready for discussion as there are a couple questions I have before moving forward.

  1. What would be best approach for review? I currently have roughly ~3k lines of incomplete stubs for tensorflow and this pr adds some of the basic pieces. Would you prefer a large pr with all I currently have or chop it up into dozen smaller prs of a few hundred lines each? I'm using getattr(name: str) -> Incomplete pattern to minimize false positives users would experience. If I do make a pr with full stubs I have currently I'll need to be careful of that as my current local stubs often missed that because at work I followed a philosophy of false positives are better then missing an error and fix stub as needed. A full version of what I have would look roughly like this pr.
  2. Is it fine to leave default values for functions in stubs? My current stubs include default values and they are mostly int/str/None. I find them useful for IDE usage, but I'm ok dropping them if that's preferred for consistency with most other typeshed stubs. If including defaults is ok, where should I adjust mypy/pyright/etc settings to not produce an error of Default values in stub files should be specified as "..."
  3. Tensor related classes probably should be generic over data type and shape. I'm avoiding that as both would add a lot more complexity to stubs. Shapes especially as typevartuple support is in progress and a lot of shape types may require further peps. DTypes are mostly possible now just may add a lot more overloads/verbosity to track. I could make class generic today but mostly use Any for the type arguments? Not sure on recommended practice here.

The files/classes added here are a couple of the most commonly used ones. If small pr approach is good I'll likely continue by filling in more of tensorflow/__init__.pyi and tensorflow/math.pyi followed by adding in other commonly used files.

@github-actions

This comment has been minimized.

@hauntsaninja
Copy link
Collaborator

hauntsaninja commented Oct 24, 2022

You're missing a layer of nesting. It should be stubs/distribution_name/package_name. This explains many of the mypy_primer errors you're seeing.

For the questions you ask, my non-authoritative opinions are:

  1. I'm mostly agnostic. If there's stuff that you expect to be controversial, I would separate it from stuff that is non-controversial. Note that while typeshed's philosophy is that false negatives are better than false positives, it's not the biggest deal if we have some false positives as we work through your PRs. As long as we do this quickly, it won't hurt users, since there are currently zero users of types-tensorflow.
  2. I would prefer omitting default values from stubs.
  3. Let's stick with the simple non-generic stuff for now.

@github-actions

This comment has been minimized.

@github-actions
Copy link
Contributor

github-actions bot commented Oct 24, 2022

Diff from mypy_primer, showing the effect of this PR on open source code:

jax (https://github.com/google/jax)
- jax/collect_profile.py:25: error: Cannot find implementation or library stub for module named "tensorflow.python.profiler"  [import]
- jax/experimental/jax2tf/tests/model_harness.py:35: error: Cannot find implementation or library stub for module named "tensorflow"  [import]
- jax/experimental/jax2tf/examples/tf_js/quickdraw/quickdraw.py:30: error: Cannot find implementation or library stub for module named "tensorflow"  [import]
- jax/tools/jax_to_ir.py:86: error: Cannot find implementation or library stub for module named "tensorflow"  [import]
- jax/experimental/jax2tf/tests/converters.py:20: error: Cannot find implementation or library stub for module named "tensorflow"  [import]
+ jax/experimental/jax2tf/tests/call_tf_test.py:36: error: Incompatible types in assignment (expression has type "None", variable has type Module)  [assignment]

streamlit (https://github.com/streamlit/streamlit)
+ lib/tests/streamlit/runtime/caching/hashing_test.py:48: note: ... from here:
+ lib/tests/streamlit/runtime/caching/hashing_test.py:48: note: ... from here:
+ lib/tests/streamlit/runtime/caching/hashing_test.py:48: note: In module imported here:

@gramster
Copy link

gramster commented Oct 25, 2022

  1. I would prefer omitting default values from stubs.

@hauntsaninja, an advantage of including default values is that tools like pyright or pylance will use stubs (if available) for the signatures they show to users when hovering over a function call, and users want to see the default values. I'd like to understand your point of view though as to why you would prefer them removed?

@hmc-cs-mdrissi
Copy link
Contributor Author

hmc-cs-mdrissi commented Oct 25, 2022

I took a look at primer hits.

The jax hit mypy's redefinition rule and they have code like,

try:
  import tensorflow
except:
  tensorflow = None

I'm less sure about what streamlit hit means about note ... here. The code they have is,

try:
    import tensorflow as tf
except ImportError:
    pass

The remaining errors look to be expected from lack of numpy in environment.

@hauntsaninja
Copy link
Collaborator

hauntsaninja commented Oct 26, 2022

@gramster I'm personally okay in theory with default values in stubs if we restrict to simple literals, but it should be its own discussion — my comment here mostly just reflected the status quo, changing of which would need discussion. I also figured we're not really losing that much work on this PR: simple literal default values should be easy to automatically reintroduce to all non-overload stubs and we'd need to do so for all existing stubs anyway.

@hauntsaninja
Copy link
Collaborator

hauntsaninja commented Oct 26, 2022

Anyway, opened #8988 to discuss default values

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants