Investigate overhead when listing spaces #2017

Open
Wauplin opened this issue Feb 6, 2024 · 0 comments

Wauplin commented Feb 6, 2024

Two things:

  1. It looks like huggingface_hub adds significant overhead on top of requests, especially when listing Spaces.
  2. It looks like huggingface_hub 0.19.4 takes significantly less time than 0.20.2 (cc @hysts, who discovered and reproduced it in an AWS Lambda).

See related Slack thread (private).

It would be worth taking a look to check that we are not doing something too stupid 😁. The first conversation was about listing Spaces, but the issue is most likely not specific to this endpoint.

(For the record, listing with token=False is also significantly faster than with auth, since the server doesn't have to handle authentication.)

import collections
import cProfile
import time

import requests

from huggingface_hub import HfApi


def count_sdks_hf_hub(limit=None):
    num_spaces = collections.defaultdict(int)
    api = HfApi(user_agent={"is_ci": True}, token=False)
    for space in api.list_spaces(limit=limit):
        if not space.private:
            num_spaces[space.sdk] += 1
    return dict(num_spaces)

def count_sdks_requests(limit=None):
    session = requests.Session()
    session.headers.update({"user-agent": "is_ci/true"})
    url = "https://huggingface.co/api/spaces"

    num_spaces = collections.defaultdict(int)
    n = 0
    while url is not None:
        response = session.get(url)
        response.raise_for_status()
        for space in response.json():
            # Stop as soon as the requested number of spaces has been seen.
            if limit is not None and n >= limit:
                return dict(num_spaces)
            n += 1
            if not space.get("private"):
                num_spaces[space.get("sdk")] += 1
        # Follow the "next" link from the Link header; None means last page.
        url = response.links.get("next", {}).get("url")
    return dict(num_spaces)

for fn in (count_sdks_hf_hub, count_sdks_requests):
    start_t = time.perf_counter()
    res = fn()
    elapsed = time.perf_counter() - start_t
    print(fn.__name__, elapsed, res)

# or
# cProfile.run('count_sdks_hf_hub()')

Output of two runs (function name, elapsed seconds, SDK counts):

count_sdks_hf_hub 37.29020519400001 {'streamlit': 29277, 'gradio': 110611, 'static': 5360, None: 867, 'docker': 37227}
count_sdks_requests 29.098300531999485 {'streamlit': 29277, 'gradio': 110611, 'static': 5360, None: 867, 'docker': 37227}

count_sdks_hf_hub 33.34559893700134 {'streamlit': 29274, 'gradio': 110618, 'static': 5360, None: 867, 'docker': 37230}
count_sdks_requests 30.013438875999782 {'streamlit': 29274, 'gradio': 110617, 'static': 5360, None: 867, 'docker': 37230}