@Xuanwo
Collaborator

@Xuanwo Xuanwo commented Nov 26, 2025

Close #5346

This PR adds native support for Hugging Face in Lance.


This PR was primarily authored with Codex using GPT-5-Codex and then hand-reviewed by me. I AM responsible for every change made in this PR. I aimed to keep it aligned with our goals, though I may have missed minor issues. Please flag anything that feels off and I'll fix it quickly.

@github-actions github-actions bot added the enhancement New feature or request label Nov 26, 2025

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Signed-off-by: Xuanwo <[email protected]>
@pavanramkumar

thanks for this! looks like some tests have been queued for the past 3 hours or so, is that expected?

Contributor

@jackye1995 jackye1995 left a comment

looks good to me!

@jackye1995
Contributor

> thanks for this! looks like some tests have been queued for the past 3 hours or so, is that expected?

Yes, there are some issues with GitHub runners right now; we are working on switching to third-party runners.

@pavanramkumar

were you already able to test with this public hf dataset @Xuanwo?

>>> import lance
>>> hf_path = "hf://datasets/pavan-ramkumar/test-slaf/tree/main/synthetic_50k_processed_v21.slaf/expression.lance"
>>> ds = lance.dataset(hf_path)

@jackye1995
Contributor

@Xuanwo can you rebase onto main? The CI should work now.

@Xuanwo
Collaborator Author

Xuanwo commented Nov 27, 2025

Hi, @pavanramkumar, yes, it works!

image

However, we don't support tree/main in the URI, since we have dedicated support for revision, which should be passed through storage options. Is that expected?
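
For anyone following along, here is a minimal sketch of how a caller might move a tree/<revision> segment out of an hf:// URI and into storage options before opening the dataset. The helper name split_hf_revision is hypothetical and not part of Lance. The URI layout (repo type, owner, repo, then tree/<revision>) is assumed from the example path in this thread, and the revision key is the one mentioned above. Which revision values the provider accepts is not spelled out here, so treat the returned value as something to verify.

```python
from typing import Dict, Tuple


def split_hf_revision(uri: str) -> Tuple[str, Dict[str, str]]:
    """Hypothetical helper (not part of Lance).

    Splits `tree/<revision>` out of an hf:// URI and returns the cleaned
    URI plus a storage-options dict carrying the revision.
    """
    prefix = "hf://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an hf:// URI: {uri!r}")

    parts = uri[len(prefix):].split("/")
    # Assumed layout: <repo_type>/<owner>/<repo>/tree/<revision>/<path...>
    # Revisions that themselves contain "/" are not handled by this sketch.
    if len(parts) >= 5 and parts[3] == "tree":
        revision = parts[4]
        cleaned = prefix + "/".join(parts[:3] + parts[5:])
        return cleaned, {"revision": revision}

    # No tree/<revision> segment: leave the URI alone, pass no revision.
    return uri, {}


uri, storage_options = split_hf_revision(
    "hf://datasets/pavan-ramkumar/test-slaf/tree/main/"
    "synthetic_50k_processed_v21.slaf/expression.lance"
)
# The result could then be passed on, e.g.:
#   ds = lance.dataset(uri, storage_options=storage_options)
```

Keeping the revision out of the path means the same dataset URI can be reused across branches or commits by changing only the storage options.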

Signed-off-by: Xuanwo <[email protected]>
@Xuanwo Xuanwo merged commit 0204e7e into main Nov 27, 2025
24 of 25 checks passed
@Xuanwo Xuanwo deleted the Xuanwo/hf-fragment-control branch November 27, 2025 12:20
@codecov

codecov bot commented Nov 27, 2025

Codecov Report

❌ Patch coverage is 84.78261% with 28 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...lance-io/src/object_store/providers/huggingface.rs | 82.64% | 14 Missing and 7 partials ⚠️ |
| rust/lance-io/src/object_store.rs | 84.37% | 4 Missing and 1 partial ⚠️ |
| rust/lance-io/src/object_store/providers.rs | 83.33% | 0 Missing and 2 partials ⚠️ |

@pavanramkumar

Fantastic! I just tested this now and it works. Thank you!

> However, we don't support tree/main in the URI, since we have dedicated support for revision, which should be passed through storage options. Is that expected?

Specifying a revision also works nicely!

>>> ds = lance.dataset(hf_path, storage_options={'revision': 'tree/main'})
>>> ds.sample(1)
pyarrow.Table
cell_integer_id: int32
gene_integer_id: int32
value: float
----
cell_integer_id: [[16445]]
gene_integer_id: [[20436]]
value: [[2.2937665]]

It's not urgent, but I don't see revision as a potential argument to storage_options in the docs. cc @prrao87 @Xuanwo

Development

Successfully merging this pull request may close these issues.

Expose huggingface hfFileSystem protocol via OpenDAL in pylance

4 participants