Move Lazy Shape Inference functions to pytorch core #72756
Conversation
CI Flow Status

| Workflows | Labels (bold enabled) |
|---|---|
| **Triggered Workflows** | |
| linux-binary-conda | ciflow/binaries, ciflow/binaries_conda, ciflow/default |
| linux-binary-libtorch-cxx11-abi | ciflow/binaries, ciflow/binaries_libtorch, ciflow/default |
| linux-binary-libtorch-pre-cxx11 | ciflow/binaries, ciflow/binaries_libtorch, ciflow/default |
| linux-binary-manywheel | ciflow/binaries, ciflow/binaries_wheel, ciflow/default |
| linux-bionic-py3.7-clang9 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/trunk, ciflow/xla |
| linux-bionic-rocm4.5-py3.7 | ciflow/all, ciflow/default, ciflow/linux, ciflow/rocm, ciflow/trunk |
| linux-docs | ciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux, ciflow/trunk |
| linux-vulkan-bionic-py3.7-clang9 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk, ciflow/vulkan |
| linux-xenial-cuda11.3-py3.7-gcc7 | ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux, ciflow/trunk |
| linux-xenial-cuda11.3-py3.7-gcc7-bazel-test | ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk |
| linux-xenial-py3-clang5-mobile-build | ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk |
| linux-xenial-py3-clang5-mobile-custom-build-static | ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk |
| linux-xenial-py3.7-clang7-asan | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers, ciflow/trunk |
| linux-xenial-py3.7-clang7-onnx | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx, ciflow/trunk |
| linux-xenial-py3.7-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk |
| linux-xenial-py3.7-gcc7 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk |
| linux-xenial-py3.7-gcc7-no-ops | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk |
| macos-arm64-binary-conda | ciflow/binaries, ciflow/binaries_conda, ciflow/default |
| macos-arm64-binary-wheel | ciflow/binaries, ciflow/binaries_wheel, ciflow/default |
| macos-binary-conda | ciflow/binaries, ciflow/binaries_conda, ciflow/default |
| macos-binary-libtorch-cxx11-abi | ciflow/binaries, ciflow/binaries_libtorch, ciflow/default |
| macos-binary-libtorch-pre-cxx11 | ciflow/binaries, ciflow/binaries_libtorch, ciflow/default |
| macos-binary-wheel | ciflow/binaries, ciflow/binaries_wheel, ciflow/default |
| pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single | ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk |
| pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit | ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk |
| win-vs2019-cpu-py3 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/trunk, ciflow/win |
| win-vs2019-cuda11.3-py3 | ciflow/all, ciflow/cuda, ciflow/default, ciflow/trunk, ciflow/win |
| windows-binary-libtorch-cxx11-abi | ciflow/binaries, ciflow/binaries_libtorch, ciflow/default |
| windows-binary-libtorch-pre-cxx11 | ciflow/binaries, ciflow/binaries_libtorch, ciflow/default |
| windows-binary-wheel | ciflow/binaries, ciflow/binaries_wheel, ciflow/default |
| **Skipped Workflows** | |
| caffe2-linux-xenial-py3.7-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk |
| docker-builds | ciflow/all, ciflow/trunk |
| ios-12-5-1-arm64 | ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk |
| ios-12-5-1-arm64-coreml | ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk |
| ios-12-5-1-arm64-custom-ops | ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk |
| ios-12-5-1-arm64-full-jit | ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk |
| ios-12-5-1-arm64-metal | ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk |
| ios-12-5-1-x86-64 | ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk |
| ios-12-5-1-x86-64-coreml | ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk |
| ios-12-5-1-x86-64-full-jit | ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk |
| libtorch-linux-xenial-cuda10.2-py3.7-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk |
| libtorch-linux-xenial-cuda11.3-py3.7-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk |
| linux-bionic-cuda10.2-py3.9-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow, ciflow/trunk |
| linux-docs-push | ciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled |
| linux-xenial-cuda11.3-py3.7-gcc7-no-ops | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/trunk |
| macos-10-15-py3-arm64 | ciflow/all, ciflow/macos, ciflow/trunk |
| macos-10-15-py3-lite-interpreter-x86-64 | ciflow/all, ciflow/macos, ciflow/trunk |
| macos-11-py3-x86-64 | ciflow/all, ciflow/macos, ciflow/trunk |
| parallelnative-linux-xenial-py3.7-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk |
| periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled |
| periodic-libtorch-linux-xenial-cuda11.1-py3.7-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled |
| periodic-linux-bionic-cuda11.5-py3.7-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled |
| periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck |
| periodic-linux-xenial-cuda11.1-py3.7-gcc7-debug | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled |
| periodic-win-vs2019-cuda11.1-py3 | ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win |
| periodic-win-vs2019-cuda11.5-py3 | ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win |
| pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build | ciflow/all, ciflow/android, ciflow/cpu, ciflow/linux, ciflow/trunk |
This pull request was exported from Phabricator. Differential Revision: D34187999
tools/codegen/api/lazy.py (outdated)

return typ.type == valueT
# I am regretting my naming conventions, but now we are wrapping at::scalar in
# lazy value, while preserving other 'scalar' types as scalars in the IR
return typ.type == valueT or typ.type == scalarT
I don't totally follow from the comment - can you give a code example?
It also doesn't seem like this really matches the description of the function: "This is equivalent to being Tensor-like", if we directly return true on scalar types.
Lazy tensor tracing promotes at::Scalars to become lazy tensors. This enables treating the scalars as inputs to the lazy trace instead of burning them into the program.

It could use a bit of fresh paint. I think I might want to refactor some of this stuff, but probably in a separate PR; I could use your suggestions for that too, more generally. But I don't think I want to make too many changes in this PR since it's aiming to just sync master up with what we are already using on staging. Better to develop the new version on staging where we can test it directly (until we finally finish merging enough to master that we can build/run entirely on master's CI).
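For readers following along, here is a minimal, self-contained sketch of what the wrapped-scalar check above is getting at. The helper name `is_value_type`, the `wrap_scalars` flag, and the string stand-ins for `valueT`/`scalarT` are illustrative assumptions, not the actual codegen types:

```python
# valueT/scalarT below stand in for the NamedCType targets used in
# tools/codegen/api/lazy.py; this is a sketch of the idea, not the real code.
valueT = "torch::lazy::Value"   # lazy IR value: a traced input to the graph
scalarT = "at::Scalar"          # plain scalar C++ type

def is_value_type(cpp_type: str, wrap_scalars: bool = True) -> bool:
    """Return True if an argument of this type becomes a lazy IR Value.

    With scalar wrapping enabled, at::Scalar arguments are promoted to lazy
    Values too, so a call like x.add(2.0) traces the 2.0 as a graph input
    instead of burning the constant into the program.
    """
    if wrap_scalars:
        return cpp_type in (valueT, scalarT)
    return cpp_type == valueT
```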
tools/codegen/api/lazy.py (outdated)

if curr_args is not None:
    if isinstance(curr_args, TensorOptionsArguments):
        curr_args = curr_args.all()
    keyword_arg_types.extend([NamedCType(arg.name, process_ir_type(arg.type)) for arg in curr_args])
Basic question - it looks like right now, `LazyIrSchema` has separate fields for positional and keyword arguments. Is that info used meaningfully anywhere?

If not, I think you could simplify a bunch of this logic into something like:

self.keyword_arg_types = [NamedCType(arg.name, process_ir_type(arg.type)) for arg in func.schema_order_arguments()]

That would help you avoid having to special-case `TensorOptionsArguments` and `SelfArgument`, and having to explicitly list out the `Arguments` field names (which I think would cause LTC codegen to break in pretty subtle ways if we changed the names of those fields in core codegen).
If you look at lazy_ts_lowering.py you'll see how we generate different code for keyword args vs. positional args. I couldn't think of a cleaner way at the time, but I agree it's a bit complex.
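To illustrate why the schema keeps the two groups apart, here is a hypothetical sketch (the helper and its output format are assumptions, not code from lazy_ts_lowering.py) of a lowering step that emits positional and keyword arguments differently in the generated call:

```python
from typing import List

def emit_ts_call(op: str, positional: List[str], keyword: List[str]) -> str:
    # Positional args are passed through as-is; keyword args get name=value form.
    pos = ", ".join(positional)
    kw = ", ".join(f"{name}={name}" for name in keyword)
    return f"{op}({', '.join(part for part in (pos, kw) if part)})"

# Example:
# emit_ts_call("aten::add", ["self", "other"], ["alpha"])
# -> "aten::add(self, other, alpha=alpha)"
```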
tools/codegen/api/lazy.py (outdated)

self.keyword_arg_types = tuple(keyword_arg_types)
self.name = func.name
self.returns = func.returns
self.wrapped_scalar_names = [arg.name for arg in func.schema_order_arguments() if isWrappedScalarType(arg.type)]
Ok, so the `LazyIrSchema` object knows "what are the C++ types and names of all arguments in my schema", and now it also knows "which of my arguments were originally scalars before processing", using this separate list.

I think the other "data-model-y" option you could take would be to represent that info per-argument using a new type (e.g. an `LTCArgument` that knows both its `NamedCType` and whether it was a scalar argument). The advantage there is that your processing functions (`node_ctor_arg_rvalue_string`) can run on each argument in isolation, instead of having to take in the whole schema. But if this list is used in a really minimal/targeted way then maybe that's overkill.
I think this approach sounds good. Generally I want to separate out more of the refactoring into a separate diff. Let's collect ideas for that on a GH issue; I'll make one and link it here: #72960
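A rough sketch of that data-model option, with `LTCArgument` and the helper below being hypothetical names used for illustration rather than existing classes:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LTCArgument:
    name: str
    cpp_type: str            # stands in for NamedCType in the real codegen
    is_wrapped_scalar: bool  # True if the schema type was originally at::Scalar

def node_ctor_rvalue(arg: LTCArgument) -> str:
    # Each argument can now be lowered in isolation, without the whole schema.
    if arg.is_wrapped_scalar:
        return f"lazy_{arg.name}"  # illustrative: refer to the wrapped lazy value
    return arg.name
```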
tools/codegen/dest/lazy_ir.py (outdated)

def aten_symbol(schema: LazyIrSchema) -> str:
    missing_interned_strings = {
        'sigmoid_backward',
side note: now that this is all in core, should we just add these symbols in core instead of having to handle them specially in codegen?
I actually added a handful of symbols to core, was never sure why I had to do that, and ultimately figured this was a cleaner escape hatch.

If the idea is to have all the symbols be constants in core, should we just make a diff that adds all the symbols at once? Otherwise, it's not super clear why we have some and not others...
I was trying to look into where `at::aten::{symbol}` is actually defined - do you know where it is?

Peter Bell landed some changes recently that I think are meant to codegen all of the aten symbols, so I'm wondering if this was probably already solved for you. (I can see `sigmoid_backward` listed in the codegen'd file build/aten/src/ATen/core/aten_interned_strings.h.)
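For context, a simplified sketch of the escape-hatch pattern under discussion; this is an assumption about the shape of the logic, not the actual `aten_symbol` implementation. Ops whose symbols are not interned in core fall back to building the symbol from its qualified string in the generated C++:

```python
# Hedged sketch: ops in this set get a fromQualString fallback in the
# generated code instead of a reference to at::aten::<op>.
MISSING_INTERNED_STRINGS = {"sigmoid_backward"}

def aten_symbol_expr(op_name: str) -> str:
    if op_name in MISSING_INTERNED_STRINGS:
        return f'c10::Symbol::fromQualString("aten::{op_name}")'
    return f"at::aten::{op_name}"

# aten_symbol_expr("sigmoid_backward")
# -> 'c10::Symbol::fromQualString("aten::sigmoid_backward")'
```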
tools/codegen/dest/lazy_ir.py (outdated)

node_ctor_input_str = node_ctor_inputs(schema)

# Only generate shape/dtype fn for non-structured kernels,
# since we just use the meta function for structured kernels
if not f.structured and f.structured_delegate is None:
    shape_sig = ComputeShapeSignature(f)
    shape_sig = ComputeShapeSignature(metadata.kernel, f)
This doesn't seem like a big deal - but any reason why we need to use `metadata.kernel` to choose how to name the shape functions? For example, we could do something like `f.func.name.unambiguous_name()` instead.
I can't answer off the top of my head. I probably didn't know about unambiguous_name(). I think I needed metadata.kernel to overcome some ambiguity, so it may be possible to substitute. I'll have to test it separately (back on staging) and then drop the update in if it works.
ok, so here is one case where they differ.
ipdb> p metadata.kernel
'arange_out'
ipdb> p f.func.name.unambiguous_name()
'arange_start_out'
I'm not sure what the right thing to do is.
Leaving it for a refactor sounds fine.

I think at a high level, what we probably want to have happen (based on what some of the other codegen in core does) is for the shape signatures to be part of some API (the "lazy API", or "lazy shape API") that knows how to generate its name. That way, both at the place where you define those functions in the codegen and at the place where you call them, you can use the same API call to generate the name (and you don't have to remember whether it was actually using `metadata.kernel` or `f.unambiguous_name()`).
Yeah, that makes sense. I'll reflect that in the other (refactor) issue.
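A hedged sketch of what such a "lazy shape API" could look like; the class and method names here are hypothetical, chosen only to show how one object can own the naming convention for both the definition site and the call site:

```python
class LazyShapeSignature:
    def __init__(self, kernel_name: str):
        # kernel_name could come from metadata.kernel or an unambiguous name;
        # callers no longer need to remember which one was used.
        self.kernel_name = kernel_name

    @property
    def name(self) -> str:
        return f"compute_shape_{self.kernel_name}"

    def decl(self, cpp_args: str) -> str:
        # Illustrative C++ declaration string for the generated header.
        return f"std::vector<torch::lazy::Shape> {self.name}({cpp_args});"

# LazyShapeSignature("arange_out").name -> "compute_shape_arange_out"
```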
Summary: Pull Request resolved: pytorch#72756
Test Plan: tested via lazy_tensor_staging branch (test_ptltc operator tests as well as torchbench inference)
Differential Revision: D34187999
fbshipit-source-id: 5138fcfbbb3567078f0253881beda3ff4b77fa84
Summary: Pull Request resolved: pytorch#72756
Test Plan: tested via lazy_tensor_staging branch (test_ptltc operator tests as well as torchbench inference)
Reviewed By: bdhirsh
Differential Revision: D34187999
fbshipit-source-id: a9d2c2cabe4eb2d921b04e4086181568116190d4
Summary: Pull Request resolved: pytorch#72756
Test Plan: tested via lazy_tensor_staging branch (test_ptltc operator tests as well as torchbench inference)
Reviewed By: bdhirsh
Differential Revision: D34187999
fbshipit-source-id: 0fb6856a111e76408f32de227f3e5dfa2d2eb28c
Summary: Pull Request resolved: #72756
Test Plan: tested via lazy_tensor_staging branch (test_ptltc operator tests as well as torchbench inference)
Reviewed By: bdhirsh
Differential Revision: D34187999
fbshipit-source-id: f40c5869d573de4a58233ee1e8710f8a0ea7f284
Hey @wconstab.
Test Plan: tested via lazy_tensor_staging branch (test_ptltc operator tests as well as torchbench inference)
Differential Revision: D34187999