feat(hub): list collections #1568

Merged
merged 13 commits into huggingface:main on Jul 4, 2025

Conversation

quanghuynt14
Contributor

@quanghuynt14 quanghuynt14 commented Jun 29, 2025

Issue: #271

This PR only includes the call to list collections. Why?

  • It's one of my early contributions to the project, so I think I should split @hackpk's PR into several small PRs.

  • I was really hoping to be able to make a collection of our best models and datasets, so that they can be displayed on our landing page.

  • I think listing collections solves the problem of displaying collections in real time.

I’m having a bit of trouble with the type, and I need to search through the project to understand it better. Do we have a public API schema?

Thanks for reviewing it.

@quanghuynt14 quanghuynt14 requested a review from coyotte508 as a code owner June 29, 2025 20:17

const search = new URLSearchParams([
...Object.entries({
limit: String(Math.min(totalToFetch, 100)),
Contributor Author

According to the docs, the maximum number of collections per page is 100.

Member

Technically it can be 10000 if both of these hold:

  • owner is set
  • expand is explicitly set to false

but I'm not sure we want to bother.
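
A minimal sketch of how the page size could be chosen under the two conditions above. The helper name `pageLimit` and the `ListParams` shape are hypothetical and not part of this PR; only the 100 and 10000 caps come from this thread.

```typescript
// Hypothetical parameter shape, used only for this illustration.
interface ListParams {
  owner?: string[];
  expand?: boolean;
}

// Hypothetical helper: picks the per-page limit for a collections request.
function pageLimit(totalToFetch: number, params: ListParams = {}): number {
  // The larger cap applies only when an owner is set AND expand is explicitly false.
  const hasOwner = (params.owner?.length ?? 0) > 0;
  const cap = hasOwner && params.expand === false ? 10000 : 100;
  return Math.min(totalToFetch, cap);
}
```

With this sketch, leaving `expand` undefined keeps the default cap of 100, which matches the "explicitly" wording above.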

Member

@coyotte508 coyotte508 left a comment

Thanks for the PR!

I will look into providing a schema for the existing collection APIs

@quanghuynt14
Contributor Author

quanghuynt14 commented Jun 30, 2025

> I will look into providing a schema for the existing collection APIs

If we’re using json-schema, we can use json-schema-to-typescript to generate the types.

Without the schema, it's very difficult to guess the response type. By reverse-engineering 10,000 collections, I inferred the type, but I'm not 100% sure it's correct.

@quanghuynt14 quanghuynt14 requested a review from coyotte508 July 1, 2025 01:20
@coyotte508
Member

Hey @quanghuynt14 , here's an OpenAPI JSON spec for some of the hub endpoints, including collection ones: https://gist.github.com/coyotte508/ec7d12713d94a4fe4988afa3fffc23f1

It includes json schema for the various responses & requests, and you can also load the openapi schema on https://editor-next.swagger.io/ for example

do you think it's helpful for your PRs?

@quanghuynt14
Contributor Author

Yes, it's very complete and helpful. I will check if the type in this PR is correct.

@quanghuynt14
Contributor Author

quanghuynt14 commented Jul 3, 2025

I checked the schema and found a mismatch for "/api/collections/{namespace}/{slug}-{id}".

The schema declares these as required props of the collection object:

  • "position": { "type": "number" }
  • "shareUrl": { "type": "string" }

However, the collection with "slug": "google/gemma-3n-685065323f5984ef315c93f4" does not include either property:

{
    "slug": "google/gemma-3n-685065323f5984ef315c93f4",
    "title": "Gemma 3n",
    "description": "",
    "gating": false,
    "lastUpdated": "2025-06-26T15:55:44.512Z",
    "owner": {
      ...
    },
    "items": [
      ...
    ],
    "theme": "purple",
    "private": false,
    "upvotes": 146,
    "isUpvotedByUser": false
  },

These 2 props (position and shareUrl) are present only in items of type "collection". For example:

{
    "slug": "kristenq/reasoning-684330e0ce0c4e30fe59456a",
    "title": "Reasoning",
    "description": "Advanced reasoning models. ",
    "gating": false,
    "lastUpdated": "2025-06-06T18:18:26.733Z",
    "owner": {
      ...
    },
    "items": [
      {
        "_id": "684330f27ce524322498baa7",
        "position": 0,
        "type": "collection",
        "id": "67ee7145ec3d31f7c7a75cab",
        "slug": "Tesslate/synthia-s1-reasoning-model-67ee7145ec3d31f7c7a75cab",
        "title": "Synthia-S1 REASONING MODEL",
        "description": "Creative, Scientific, and Coding",
        "lastUpdated": "2025-04-03T11:31:05.262Z",
        "numberItems": 3,
        "owner": {
          ...
        },
        "theme": "blue",
        "shareUrl": "https://huggingface.co/collections/Tesslate/synthia-s1-reasoning-model-67ee7145ec3d31f7c7a75cab", 
        "upvotes": 3,
        "isUpvotedByUser": false
      }
    ],
    "theme": "green",
    "private": false,
    "upvotes": 0,
    "isUpvotedByUser": false
  },

Here is the corrected schema: https://gist.github.com/quanghuynt14/1c55f44978248bc12889b1bde2359c0c

  • I removed the position prop from the collection object.
  • I moved the shareUrl prop into items of type collection.

@coyotte508
Member

Hmm, not sure. At https://huggingface.co/api/collections/google/gemma-3n-685065323f5984ef315c93f4 I do see position and shareUrl:
[image]

@quanghuynt14
Contributor Author

quanghuynt14 commented Jul 3, 2025

Ah, sorry. I assumed that the collection type returned by GET /api/collections/{namespace}/{slug}-{id} was the same as the one returned by GET /api/collections, but it's not :/

The collection object within the collection list does not contain position or shareUrl.

However, the shareUrl of an item of type collection is missing from the schema, even though I see it in "open-sci/open-sci-ref-releases-684210f647554eb6e0bb9cf3".
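
To make the distinction concrete, here is a rough TypeScript sketch based only on the fields visible in the JSON excerpts earlier in this thread. All type names are hypothetical, `owner` is typed as `unknown` because it is elided in those excerpts, and treating every listed field as required is an assumption rather than something the schema confirms.

```typescript
// Hypothetical names; fields are copied from the JSON excerpts in this thread.

// An item of type "collection": this is where position and shareUrl were observed.
interface NestedCollectionItem {
  _id: string;
  position: number;
  type: "collection";
  id: string;
  slug: string;
  title: string;
  description: string;
  lastUpdated: string;
  numberItems: number;
  owner: unknown; // elided ("...") in the excerpts
  theme: string;
  shareUrl: string;
  upvotes: number;
  isUpvotedByUser: boolean;
}

// Other item types are not shown in this thread, so they stay loosely typed.
interface OtherItem {
  type: string;
  [key: string]: unknown;
}

type CollectionItem = NestedCollectionItem | OtherItem;

// An entry of GET /api/collections: no top-level position or shareUrl.
interface CollectionListEntry {
  slug: string;
  title: string;
  description: string;
  gating: boolean;
  lastUpdated: string;
  owner: unknown; // elided ("...") in the excerpts
  items: CollectionItem[];
  theme: string;
  private: boolean;
  upvotes: number;
  isUpvotedByUser: boolean;
}

// Type guard that narrows an item to the nested-collection shape.
function isNestedCollection(item: CollectionItem): item is NestedCollectionItem {
  return item.type === "collection";
}
```

A caller could then write `entry.items.filter(isNestedCollection)` and read `shareUrl` on the result without a cast.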

@coyotte508
Member

Yes indeed, we'll fix the schema on our side. Thanks for pointing it out.

@quanghuynt14
Contributor Author

You can review the PR. I think the type is now complete.

Member

@coyotte508 coyotte508 left a comment

Thanks a lot!

I think it needs just one change to how the owner and item params are handled, and then it'll be ready to merge (I'll still re-review).


Comment on lines 44 to 51
const search = new URLSearchParams([
  ...Object.entries({
    limit: String(Math.min(totalToFetch, 100)),
    ...(params?.search?.owner ? { owner: Array.isArray(params.search.owner) ? params.search.owner.join(",") : params.search.owner } : undefined),
    ...(params?.search?.item ? { item: Array.isArray(params.search.item) ? params.search.item.join(",") : params.search.item } : undefined),
    ...(params?.search?.q ? { q: params.search.q } : undefined),
  }),
]).toString();
Member

When params.search.owner and params.search.item are arrays, we should just repeat the parameter, e.g.:

for (const owner of params.search.owner) {
  search.append("owner", owner);
}

E.g. https://huggingface.co/api/collections?owner=coyotte508&owner=julien-c if owner=['coyotte508','julien-c']

In contrast, https://huggingface.co/api/collections?owner=coyotte508,julien-c seems to return all collections... which is a bug that's getting fixed now that it's been found!


By the way, for the JS lib we can force the user to put arrays for both owner and item if we want to.
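
For illustration, here is a small self-contained sketch of the repeated-parameter approach, assuming owner and item are always arrays as suggested above. The function name `buildCollectionQuery` and the `CollectionSearch` interface are hypothetical; only the query parameter names `limit`, `owner`, `item` and `q` come from this thread.

```typescript
// Hypothetical search shape: owner and item are forced to be arrays.
interface CollectionSearch {
  owner?: string[];
  item?: string[];
  q?: string;
}

// Hypothetical helper: builds the query string with one key=value pair per array entry.
function buildCollectionQuery(limit: number, search: CollectionSearch = {}): string {
  const params = new URLSearchParams({ limit: String(limit) });
  // Repeat the key for every value instead of joining values with a comma.
  for (const owner of search.owner ?? []) {
    params.append("owner", owner);
  }
  for (const item of search.item ?? []) {
    params.append("item", item);
  }
  if (search.q) {
    params.append("q", search.q);
  }
  return params.toString();
}

console.log(buildCollectionQuery(100, { owner: ["coyotte508", "julien-c"] }));
// prints "limit=100&owner=coyotte508&owner=julien-c"
```

Forcing arrays keeps the helper free of Array.isArray branches, at the cost of a slightly more verbose call site for the single-owner case.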

Contributor Author

yes, I fixed it ✅

@quanghuynt14 quanghuynt14 requested a review from coyotte508 July 4, 2025 11:52
Member

@coyotte508 coyotte508 left a comment

Thanks, looks good!

Comment on lines 22 to 46
  | {
      avatarUrl: string;
      fullname: string;
      name: string;
      isHf: boolean;
      isHfAdmin: boolean;
      isMod: boolean;
      followerCount?: number;
      type: "org";
      isEnterprise: boolean;
      isUserFollowing?: boolean;
    }
  | {
      avatarUrl: string;
      fullname: string;
      name: string;
      isHf: boolean;
      isHfAdmin: boolean;
      isMod: boolean;
      followerCount?: number;
      type: "user";
      isPro: boolean;
      _id: string;
      isUserFollowing?: boolean;
    };
Member

Can you extract this to ApiAuthor?

Contributor Author

Yes ad702e7
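
As a rough idea of what such an extraction could look like, the sketch below factors the shared fields of the two variants into a base interface and keeps `type` as the discriminant. The name `ApiAuthorBase` and the helper `describeAuthor` are hypothetical, and the actual ApiAuthor type in commit ad702e7 may be laid out differently.

```typescript
// Fields shared by both variants in the inline union quoted above.
interface ApiAuthorBase {
  avatarUrl: string;
  fullname: string;
  name: string;
  isHf: boolean;
  isHfAdmin: boolean;
  isMod: boolean;
  followerCount?: number;
  isUserFollowing?: boolean;
}

// Hypothetical layout of the extracted type, discriminated on `type`.
type ApiAuthor =
  | (ApiAuthorBase & { type: "org"; isEnterprise: boolean })
  | (ApiAuthorBase & { type: "user"; isPro: boolean; _id: string });

// Hypothetical helper showing how narrowing on `type` exposes variant-specific fields.
function describeAuthor(author: ApiAuthor): string {
  return author.type === "org"
    ? `${author.name} (org, enterprise: ${author.isEnterprise})`
    : `${author.name} (user, pro: ${author.isPro})`;
}
```

Checking `author.type` is enough for the compiler to know whether `isEnterprise` or `isPro` is available, so no casts are needed at the call site.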

@quanghuynt14 quanghuynt14 requested a review from coyotte508 July 4, 2025 15:22
@coyotte508 coyotte508 merged commit 2782465 into huggingface:main Jul 4, 2025
3 of 4 checks passed