Join GitHub today
GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.
Sign upImplement on-disk chunked array #2096
Conversation
} | ||
|
||
template <class T1, class T2, class S1, class S2> | ||
void remap(T1** file_array, T2& array, S1& in_stream, S2& out_stream) |
JohanMabille
Jul 22, 2020
Member
Pointers and pointers to pointers should be avoided if not in very low-level code. What prevents from using references?
Besides this function seems to be called in access methods, having a lot of memory deletion and allocation will be a performance issue.
davidbrochart
Jul 22, 2020
Author
Member
Yes, I can use a reference.
You're right, the current implementation is really not production-ready, it is more like an attempt to have something working and to discuss choices. I will explain some of them.
return m_shape; | ||
} | ||
|
||
xfiles_array(): m_file_array(NULL) |
JohanMabille
Jul 22, 2020
Member
Why is m_file_array
a pointer instead of a plain object? If you keep the pointer, you need to carefully implement the copy and move semantics, otherwise you'll have a lot of crashes when using the xfiles_array:
xfiles_array<xdisk_array>> a1, a2;
// ....
a1 = a2;
// => a1.m_file_array is never destroyed before beeing assigned a2.m_array.
// => a2.m_array will be destroyed twice.
Guys this is really cool! I'm following the code only on the surface but I understand that file-backed caching will soon be supported? If this is interesting I can also take a look at HDF5 based caching in the future, because the file format supports basically everything that an N-dimensional data structure could need. On the other hand, xtensor already supports HDF5 as a backend so maybe its overkill to add dedicated support here (again)? |
This PR gets us closer to Zarr, I'm not sure about HDF5. |
There are two orthogonal things here:
EDIT: replace "disk array" with "file array" according to the comment below. |
@JohanMabille I think |
Alright, so maybe the disk array will be replaced by an in-memory container + a serializer passed directly to the file array class. Let's see what's required in the disk array while you complete itsimplementation, and if the class is really required or not. |
The new test fails with the following error:
which happens as soon as a file has to be written. It works fine outside the test environment. Thoughts? |
|
||
static std::ifstream* m_in_file; | ||
static std::ofstream* m_out_file; | ||
const static xarray<T>* m_array; |
davidbrochart
Jul 22, 2020
Author
Member
I am really not sure about having these static members. The xdisk_reference
class is here to catch assignments to elements of m_array
through its definition of the operator=()
, by making xdisk_reference
the type of xdisk_array
's reference
. But it needs to access m_array
and the output file in order to dump to disk, and they cannot be passed through e.g. the constructor since (as I understand it) the copy constructor is used.
JohanMabille
Jul 22, 2020
Member
You can explicitly invoke the constructor of xdisk_reference
and pass it whatever you want. For instance:
template <class It>
inline reference element(It first, It last)
{
return reference(m_array->element(first, last), &m_array, &m_in_file, &m_out_file);
}
The issue I see with that is that |
Another possibility is to flag the xdisk_array as "dirty" when an element is assigned, and save to disk only upon destruction / unload. |
Not sure about the failing test:
|
Just thinking, currently a |
Now that |
What about the following:
Notice that |
Thanks for the suggestions. I think it is a better abstraction than what we have right now, where we have the file loading/dumping in
|
For the first point, that could be handled by a io_type template parameter (or io_handler), which implements open / close methods. I agree that this will avoid to reinvent the wheel when we want to support distant files. Therefore, the declaration of template <class T, class io_handler, class serializer>
class xfile_array; Totally agree on the second point, |
@JohanMabille This last commit renames template <class EC, class SIN, class SOUT>
class xfile_array; I will look at your suggestion later. |
@davidbrochart awesome! I think further improvements (such as making I would also gather SIN and SOUT in a single io_handler tempate paremeter that handles both reading and writing. |
d44d8ae
into
xtensor-stack:master
Checklist
Description