Xet Upload with byte array #3035

Open
wants to merge 21 commits into
base: main

Conversation

bpronan
Collaborator

@bpronan bpronan commented Apr 28, 2025

With a new feature in xet-core, we now support specifying a byte array as upload data for a xet file upload. We are leveraging that to provide support for specifying an array of bytes in the path_or_fileobj parameter to the file upload methods.
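As a rough illustration of the usage described above, here is a minimal sketch of passing in-memory bytes through `path_or_fileobj`. The helper names `to_upload_bytes` and `upload_in_memory` are hypothetical and not part of this PR; the only library call used is `HfApi.upload_file`. Whether a given upload actually goes through Xet depends on the repository and the installed `hf_xet` version, which this sketch does not check.

```python
from typing import Optional, Union

# Hypothetical alias for the in-memory inputs this sketch accepts.
BytesLike = Union[bytes, bytearray, memoryview]


def to_upload_bytes(data: BytesLike) -> bytes:
    """Normalize bytes-like input to an immutable bytes object."""
    if isinstance(data, bytes):
        return data
    if isinstance(data, (bytearray, memoryview)):
        return bytes(data)
    raise TypeError(f"expected a bytes-like object, got {type(data).__name__}")


def upload_in_memory(
    data: BytesLike,
    path_in_repo: str,
    repo_id: str,
    token: Optional[str] = None,
):
    """Upload in-memory data without writing a temporary file first."""
    # Imported lazily so the helper above can be used without the dependency.
    from huggingface_hub import HfApi

    api = HfApi(token=token)
    return api.upload_file(
        path_or_fileobj=to_upload_bytes(data),  # bytes instead of a file path
        path_in_repo=path_in_repo,
        repo_id=repo_id,
    )
```

A call such as `upload_in_memory(b"hello", "hello.txt", "user/repo")` would then upload the bytes directly; the repo id here is a placeholder.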

The xet-core change comes with some updates to the hf_xet interface. Notably, the notion of a "pointer file" has been removed from the library entirely. In the next major version release of hf_xet, we will remove PyPointerFile as well. This PR moves the Python library to the new data structures, and we've added a test here to ensure backwards compatibility remains until then.

Note: the xet tests here are run against all PRs in the xet-core library.

This should allow us to address the dataset viewer issue here (cc: @lhoestq).

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Contributor

@hanouticelina hanouticelina left a comment

Thanks @bpronan for the PR! We should set the minimum version of hf_xet to 1.1.0 here and here so that users won't end up with an incompatible huggingface_hub <> hf_xet pair in their environment.
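To illustrate the kind of incompatibility a minimum version pin guards against, here is a small sketch of a runtime check for the installed `hf_xet` version. It is not the change requested in this review (which concerns the pinned requirement); the function names and the simplified version parsing are assumptions made for the example.

```python
from importlib.metadata import PackageNotFoundError, version

MIN_HF_XET = "1.1.0"  # minimum version suggested in the review comment


def parse_release(v: str) -> tuple:
    """Parse the leading numeric components of a version string.

    Simplification: pre-release suffixes such as "rc1" are ignored,
    so "1.1.0rc1" is treated the same as "1.1.0".
    """
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad to three components so "1.1" compares equal to "1.1.0".
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MIN_HF_XET) -> bool:
    """Return True if `installed` is at least `minimum`."""
    return parse_release(installed) >= parse_release(minimum)


def hf_xet_is_compatible() -> bool:
    """Check the installed hf_xet distribution, if there is one."""
    try:
        return meets_minimum(version("hf-xet"))
    except PackageNotFoundError:
        return False
```

Comparing parsed tuples rather than raw strings matters here: as strings, "1.10.0" would sort before "1.2.0".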

Contributor

@Wauplin Wauplin left a comment

Thanks for adding support for byte arrays, @bpronan! I've left a few comments, mostly related to Python syntax. All good otherwise.

def test_download_backward_compatibitily(self, tmp_path):
    """Test that xet download works with the old pointer file protocol.

    Until the next major version of hf-xet is released, we need to support the old
Contributor

When do you plan to make the next major release of hf_xet? (To get an idea of how long this test should exist.)

Collaborator Author

v2 is not on our roadmap at the moment, so it's hard to say conclusively. I would guesstimate that we'll need to make a breaking change within the next 4 months which will force a v2 even if we don't have a v2 feature set by then.

Once that happens, we'll need to update setup.py. This test will fail during that CI run and will need to be removed. In other words, it won't outlive the hf_xet major version update.

bpronan and others added 9 commits May 1, 2025 10:17
Co-authored-by: Lucain <lucain@huggingface.co>
4 participants