Xet Upload with byte array #3035
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks for adding support for byte arrays @bpronan ! I've left a few comments mostly related to Python syntax. All good otherwise
def test_download_backward_compatibitily(self, tmp_path):
    """Test that xet download works with the old pointer file protocol.

    Until the next major version of hf-xet is released, we need to support the old
When do you plan to make the next major release of hf_xet? (To get an idea of how long this test should exist.)
v2 is not on our roadmap at the moment, so it's hard to say conclusively. I would guesstimate that we'll need to make a breaking change within the next 4 months which will force a v2 even if we don't have a v2 feature set by then.
Once that happens, we'll need to update setup.py. This test will fail during that CI run and will need to be removed. In other words, it won't outlive the hf_xet major version update.
Co-authored-by: Lucain <lucain@huggingface.co>
With a new feature in xet-core, we now support specifying a byte array as upload data for a xet file upload. We are leveraging that to provide support for specifying an array of bytes in the path_or_fileobj parameter to the file upload methods.
The xet-core change comes with some updates to the hf_xet interface. Notably, the notion of a "pointer file" has been removed from the library entirely. During the next major version release of hf_xet, we will be removing the PyPointerFile entirely. This Python library PR includes moving on to the new data structures, but we've added a test here to ensure backwards compatibility remains until then.
Note: the xet tests here are run against all PRs in the xet-core library.
This should allow us to address the dataset viewer issue here (cc: @lhoestq).
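For reviewers who want to try the bytes path described above, here is a minimal sketch of passing in-memory bytes through path_or_fileobj. The helper name upload_bytes is hypothetical and not part of this PR. It only assumes that the api object exposes upload_file(path_or_fileobj=..., path_in_repo=..., repo_id=...), as huggingface_hub's HfApi does. Whether the transfer actually goes through Xet is also an assumption: it depends on hf_xet being installed and the target repo being Xet-enabled.

```python
from typing import Any


def upload_bytes(api: Any, data: bytes, path_in_repo: str, repo_id: str) -> Any:
    """Upload in-memory bytes via `api.upload_file` (hypothetical helper).

    `api` is expected to behave like `huggingface_hub.HfApi`; it is passed in
    so the helper does not need network access or credentials by itself.
    """
    if not isinstance(data, (bytes, bytearray)):
        raise TypeError(f"expected bytes-like data, got {type(data).__name__}")
    # Normalize bytearray to immutable bytes before handing it to the client.
    return api.upload_file(
        path_or_fileobj=bytes(data),
        path_in_repo=path_in_repo,
        repo_id=repo_id,
    )


# Example usage (requires huggingface_hub, a valid token, and write access;
# the repo id below is a placeholder):
#
#   from huggingface_hub import HfApi
#   upload_bytes(HfApi(), b"hello xet", "hello.txt", "username/my-model")
```

Taking the client as a parameter keeps the sketch testable with a stub and leaves authentication and repo configuration to the caller.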