ENH: Implement the DLPack Array API protocols for ndarray. #19083
Conversation
(force-pushed from d111d23 to 248a695)
Thanks Hameer, looks like a good start. Of the points @seberg brought up on the issue, adding a version attribute seems the most important, because checking for version will be needed before any backwards-incompatible change (such as adding an extra field) can be made. I asked about it on dmlc/dlpack#34, and the suggestion was to add it as an attribute on the …
(force-pushed from ac6c005 to 5aa28d8)
Sure -- I commented there instead of here, so more of the relevant parties can see it.
Hi Hameer, could you also add these two new methods to the ndarray stubs in numpy/__init__.pyi? Their signatures are fairly simple, so that's a plus:
# NOTE: Every single import below is already present in `__init__.pyi`
# and doesn't have to be repeated
from typing import Any, Tuple
from typing_extensions import Literal as L
from numpy import number
from numpy.typing import NDArray
# `builtins.PyCapsule` unfortunately lacks annotations as of the moment;
# use `Any` as a stopgap measure
_PyCapsule = Any
def __dlpack__(self: NDArray[number[Any]], *, stream: None = ...) -> _PyCapsule: ...
def __dlpack_device__(self) -> Tuple[L[1], L[0]]: ...
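For context, a quick sketch of what that `__dlpack_device__` annotation promises at runtime (assumes NumPy >= 1.22, where the protocol methods landed): an ndarray always reports `(1, 0)`, i.e. device type 1 (kDLCPU) on device 0, which is exactly what the `Tuple[L[1], L[0]]` return type encodes.

```python
import numpy as np

# NumPy arrays always live on the CPU, so ``__dlpack_device__`` returns
# the fixed tuple (device_type, device_id) == (1, 0): 1 is kDLCPU in the
# DLPack DLDeviceType enum, and 0 is the (only) CPU device id.
x = np.arange(3)
device = x.__dlpack_device__()
print(device)  # (1, 0)
```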
(force-pushed from 77e49cc to 830903a)
Had started looking at it, so a few small comments. It looks good; I have to think a bit more about the API, but that has nothing to do with the implementation.
I think if we put the header where you put it, it will be effectively public. I guess that is fine though? Otherwise I think placing it into common is good.
# `builtins.PyCapsule` unfortunately lacks annotations as of the moment;
# use `Any` as a stopgap measure
_PyCapsule = Any
Interestingly, if we could get builtins.PyCapsule in typeshed annotated as a parameterizable type, then it would in principle be possible for static type checkers to read, e.g., the underlying shape and dtype. The only thing that users or libraries would have to do here is declare the necessary annotations. Perhaps something to consider for the future?
Examples
from typing import TypeVar, Any, Generic, Tuple
import numpy as np
# Improvised `PyCapsule` annotation
_T = TypeVar("_T")
class PyCapsule(Generic[_T]): ...
# Construct a more compact `PyCapsule` alias; `Tuple` used herein to introduce 2 parameters
# (there may be more appropriate types that can fulfill this functionality)
_Shape = TypeVar("_Shape", bound=Any) # TODO: Wait for PEP 646's TypeVarTuple
_DType = TypeVar("_DType", bound=np.dtype[Any])
DLPackCapsule = PyCapsule[Tuple[_Shape, _DType]]
# A practical example
def from_dlpack(__x: DLPackCapsule[_Shape, _DType]) -> np.ndarray[_Shape, _DType]: ...
I'll consider this as out of scope of this PR for now, but will leave the conversation unresolved for visibility.
(force-pushed from a5d1adf to becdf4d)
There's a standardized name, see data-apis/array-api#186. I have modified the PR accordingly.
Unaligned is OK, see #19013 (comment). How does one check for native dtypes?
Can we verify this?
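On the "how does one check for native dtypes?" question above, one possible check is `dtype.isnative` or `dtype.byteorder`; a sketch, not necessarily what the PR ends up using:

```python
import numpy as np

native = np.dtype("float64")
swapped = native.newbyteorder()  # explicitly non-native byte order

# ``isnative`` is True when the dtype's byte order matches the platform;
# equivalently, ``byteorder`` is '=' (or '|' for single-byte types, or
# whichever of '<'/'>' is native on this machine).
print(native.isnative, swapped.isnative)  # True False
```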
(force-pushed from 835c638 to e6d2195)
Ignore, it does mean IEEE float. Does …
I guess if longdouble is 64-bit, you can be sure it's just double. That we could check for if we want. There are a couple of macros in … Technically, most current hardware uses something like 80-bit (IEEE?) stored as 96-bit or 128-bit. But DLPack can't describe it (it's an old thing that our …
You can use …
DLManagedTensor *managed =
        (DLManagedTensor *)PyCapsule_GetPointer(self, NPY_DLPACK_INTERNAL_CAPSULE_NAME);
if (managed == NULL) {
    return;
}
managed->deleter(managed);
As above, this is unsafe:
- `deleter` may be null.
- `PyCapsule_GetPointer` is unsafe to call if an exception is in flight, so `PyErr_Fetch` is needed first.
@hameerabbasi, I don't think this one is resolved.
I am working on the counterpart support for CuPy and I realize we should have this question settled first: … cc: @kmaehashi @emcastillo @asi1024 @rgommers @asmeurer @hameerabbasi @eric-wieser @seberg @mattip @BvB93
int64_t *managed_shape_strides = PyMem_Malloc(sizeof(int64_t) * ndim * 2);
if (managed_shape_strides == NULL) {
This incorrectly raises an error when `ndim == 0`. Coalescing the allocations would resolve that bug.
    return NULL;
}
const int ndim = managed->dl_tensor.ndim;
Add a check for `ndim < 0`.
npy_intp shape[NPY_MAXDIMS];
npy_intp strides[NPY_MAXDIMS];
for (int i = 0; i < ndim; ++i) {
Before this loop, add a check for `shape == NULL` if `ndim > 0`.
Thanks for the comments, but I would at least expect the exporter to ensure that this is not clearly malformed (since this is C-API and not Python input/API). Half of these are actually also checked during the array creation.
I just did my best to find potential problems. Please resolve my comments if you consider any of them a non-issue.
npy_intp strides[NPY_MAXDIMS];

for (int i = 0; i < ndim; ++i) {
    shape[i] = managed->dl_tensor.shape[i];
Add a check for `shape[i] < 0`.
    }
}

char *data = (char *)managed->dl_tensor.data +
Add a check for `managed->dl_tensor.data == NULL` if `product(shape) > 0`.
The question is a little ambiguous (use of …). DLPack is useful standalone, independent of NEP 47. This PR therefore adds DLPack support to NumPy in its main namespace (so only (1) above).
Ah OK, thanks @rgommers, it makes sense although I didn't anticipate this. We should add a clause to https://data-apis.org/array-api/latest/design_topics/data_interchange.html to clarify this. The way I read it in its current form, I would think …
It should do that! Assuming the namespace you call it in conforms to the array API :) In this way it's no different from …
Right, I was just trying to make sure we are allowed to add …
I personally would like to just create a new dlpack_impl.c and dlpack_impl.h in multiarray. Then export __dlpack__ and from_dlpack there, include it into methods.c and multiarray.c, and keep all of DLPack in one place. Especially multiarray.c is just too unnecessarily long, and it's easy to go from the method list/struct to the actual definition in the other file.
        device_type != kDLCUDAManaged) {
    PyErr_SetString(PyExc_RuntimeError,
                    "Unsupported device in DLTensor.");
    Py_XDECREF(capsule);
Suggested change:
-    Py_XDECREF(capsule);
+    Py_DECREF(capsule);
Not that it really matters.
if (managed->dl_tensor.dtype.lanes != 1) {
    PyErr_SetString(PyExc_RuntimeError,
                    "Unsupported lanes in DLTensor dtype.");
Might be nice to expand on "lanes" here (or include a link to where it's explained?). On the other hand, I guess the reader should know what lanes are if they use them?
}

PyArray_Descr *descr = PyArray_DescrFromType(typenum);
if (descr == NULL) {
This can't actually fail, but it's obscure enough that I think it's fine to keep. Just move this down directly before the PyArray_NewFromDescr call (you can even put it into the call, since it can't fail). That avoids reference-counting headaches.
    return NULL;
}

if (PyCapsule_SetName(capsule, NPY_DLPACK_USED_CAPSULE_NAME) < 0) {
This feels a bit tricky if it would fail, but I guess it's fine. Most importantly, at this time it cannot even fail, since failure is only possible if the capsule was not valid. (If it could fail, might we end up freeing twice here? Once for the new_capsule and once for the capsule itself?)
DLDevice device = array_get_dl_device(self);
if (PyErr_Occurred()) {
    return NULL;
}
char *data = array_get_dl_data(self);
if (data == NULL) {
    return NULL;
}
Do we really need these? All we would lose is a bit of information when ndarray is an intermediate exchange step, which seems OK? (If we do, it would be nice to add a test; I think calling from_dlpack twice on the same array will do the trick?)
PyErr_SetString(PyExc_TypeError, "DLPack only supports native "
                "byte swapping.");
return NULL;
}
It might be that we should refuse exporting readonly arrays here, at least if they are not backed by numpy-owned data (truly readonly), to prevent a pytorch importer from being able to accidentally crash the interpreter. I have opened an issue for discussion.
    managed_dtype.code = kDLInt;
} else if (PyDataType_ISUNSIGNED(dtype)) {
    managed_dtype.code = kDLUInt;
} else if (PyDataType_ISFLOAT(dtype)) {
NIT: we put the `else` on the next line.
PyErr_SetString(PyExc_TypeError, "DLPack only supports IEEE "
                "floating point types without padding.");

NIT, suggested change:
-    PyErr_SetString(PyExc_TypeError, "DLPack only supports IEEE "
-                    "floating point types without padding.");
+    PyErr_SetString(PyExc_TypeError,
+            "DLPack only supports IEEE floating point types without "
+            "padding.");
y2 = x[:, 0]
assert_array_equal(y2, np.from_dlpack(y2))

y3 = x[1, :]
Could parametrize, but it's fine. Also `::2` is nice, since then you get a 2-D non-contiguous one.
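A sketch of the `::2` suggestion, exercising a 2-D non-contiguous view through the exchange (assumes `np.from_dlpack` as the public importer, NumPy >= 1.22):

```python
import numpy as np

x = np.arange(10.0).reshape(2, 5)
y = x[:, ::2]  # 2-D, non-contiguous: strided along the last axis

# DLPack carries explicit strides, so a non-contiguous view must
# round-trip without copying or reordering.
z = np.from_dlpack(y)
print(y.flags["C_CONTIGUOUS"], np.array_equal(y, z))  # False True
```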
@@ -3654,3 +3661,8 @@ class broadcast:
     def __next__(self) -> Tuple[Any, ...]: ...
     def __iter__(self: _T) -> _T: ...
     def reset(self) -> None: ...

 class _SupportsDLPack(Protocol[_T_contra]):
     def __dlpack__(self, *, stream: Optional[int] = ...) -> _PyCapsule: ...
Suggested change:
-    def __dlpack__(self, *, stream: Optional[int] = ...) -> _PyCapsule: ...
+    def __dlpack__(self, *, stream: None | _T_contra = ...) -> _PyCapsule: ...
The `|` operator is just a more compact shortcut for `Union` or `Optional`, but more importantly: the typevar is missing from the argument type.
Fixes #19013
cc @rgommers @mattip @seberg
TODO:
- from_dlpack function
- __dlpack__