Solver pod failing because of expired SSL certificate for https://tensorflow.pypi.thoth-station.ninja/index/manylinux2010/AVX2/simple/ #5195

Closed
mayaCostantini opened this issue Jul 21, 2022 · 5 comments
Assignees
Labels
kind/bug: Categorizes issue or PR as related to a bug.
priority/critical-urgent: Highest priority. Must be actively worked on as someone's top priority right now.
sig/stack-guidance: Categorizes an issue or PR as relevant to SIG Stack Guidance.
triage/accepted: Indicates an issue or PR is ready to be actively worked on.

Comments

@mayaCostantini
Contributor

mayaCostantini commented Jul 21, 2022

Describe the bug

When trying to solve the torchvision package in the stage environment, some of the solver pods fail because of an expired SSL certificate for our tensorflow package index https://tensorflow.pypi.thoth-station.ninja/index/manylinux2010/AVX2/simple/.
The expired certificate triggers too many retries against the index URL, so the pod exceeds the time it is allowed to run on its node.

As tensorflow builds are no longer maintained by Thoth, we should either delete this package index or renew the certificate to allow fetching the necessary dependencies from there during solver runs.
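
To confirm whether the certificate has actually expired (or how long a renewed one remains valid), a check along these lines could be run against the index host. This is only a minimal sketch using the Python standard library; the function names `days_until_expiry` and `check_index_certificate` are made up for illustration and are not part of any Thoth component.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Return the days left before a certificate's notAfter time (negative if past)."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / 86400


def check_index_certificate(host: str, port: int = 443, timeout: float = 5.0) -> Optional[float]:
    """Return the days left on the host's certificate, or None if verification fails."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
    except ssl.SSLCertVerificationError:
        # Raised during the handshake when the certificate is expired or otherwise invalid.
        return None
    return days_until_expiry(cert["notAfter"])
```

Calling `check_index_certificate("tensorflow.pypi.thoth-station.ninja")` would return `None` while the certificate fails verification, and a positive number of days once a valid certificate is served.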

To Reproduce
Schedule torchvision to be solved on the stage user API.

Expected behavior
Solver can fetch all necessary dependencies.

Additional context

From one of the solver pods in the stage environment:

Traceback (most recent call last):
  File "/opt/app-root/lib64/python3.9/site-packages/thoth/solver/python/python.py", line 195, in _resolve_versions
    resolved_versions = solver.solve([package_name + (version_spec or "")])
  File "/opt/app-root/lib64/python3.9/site-packages/thoth/solver/python/base.py", line 75, in solve
    name, releases = self.releases_fetcher.fetch_releases(dep.name)
  File "/opt/app-root/lib64/python3.9/site-packages/thoth/solver/python/python_solver.py", line 49, in fetch_releases
    releases = self.source.get_package_versions(package_name)
  File "/opt/app-root/lib64/python3.9/site-packages/thoth/python/source.py", line 254, in get_package_versions
    return self._simple_repository_list_versions(package_name)
  File "/opt/app-root/lib64/python3.9/site-packages/thoth/python/source.py", line 243, in _simple_repository_list_versions
    for artifact_name, _ in self._simple_repository_list_artifacts(package_name):
  File "/opt/app-root/lib64/python3.9/site-packages/thoth/python/source.py", line 303, in _simple_repository_list_artifacts
    response = requests.get(url, verify=self.verify_ssl)
  File "/opt/app-root/lib64/python3.9/site-packages/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "/opt/app-root/lib64/python3.9/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/opt/app-root/lib64/python3.9/site-packages/requests/sessions.py", line 529, in request
    resp = self.send(prep, **send_kwargs)
  File "/opt/app-root/lib64/python3.9/site-packages/requests/sessions.py", line 645, in send
    r = adapter.send(request, **kwargs)
  File "/opt/app-root/lib64/python3.9/site-packages/requests/adapters.py", line 517, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='tensorflow.pypi.thoth-station.ninja', port=443): Max retries exceeded with url: /index/manylinux2010/AVX2/simple/torch (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1123)')))
2022-07-21 07:31:46,076  13 DEBUG    thoth.solver.python.python:380: Resolved versions for package 'torch' with range specifier '==1.8.1': []
2022-07-21 07:31:46,078  13 INFO     thoth.solver.python.python:369: Resolving dependency versions for 'pillow' with range '>=4.1.1' from 'https://pulp.operate-first.cloud/pypi/test/simple'
2022-07-21 07:31:46,080  13 DEBUG    thoth.solver.python.base:73: Fetching releases for: <Requirement('pillow>=4.1.1')>
2022-07-21 07:31:46,417  13 INFO     thoth.solver.python.python:197: No versions were resolved for 'pillow' with version specification '>=4.1.1' for package index 'https://pulp.operate-first.cloud/pypi/test/simple'
2022-07-21 07:31:46,418  13 DEBUG    thoth.solver.python.python:380: Resolved versions for package 'pillow' with range specifier '>=4.1.1': []
2022-07-21 07:31:46,418  13 INFO     thoth.solver.python.python:369: Resolving dependency versions for 'pillow' with range '>=4.1.1' from 'https://download.pytorch.org/whl/cu111'
2022-07-21 07:31:46,419  13 DEBUG    thoth.solver.python.base:73: Fetching releases for: <Requirement('pillow>=4.1.1')>
2022-07-21 07:31:46,641  13 ERROR    thoth.solver.python.python:205: Failed to resolve versions for 'pillow' with version spec '>=4.1.1'
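
The traceback above shows the request to the index failing at certificate verification. One possible mitigation on the solver side, independent of renewing the certificate, would be to stop querying an index after its first certificate failure so the pod does not spend its time budget on requests that cannot succeed. The following is a rough sketch of that idea only; `IndexGuard` and the `fetch` callable are hypothetical and do not exist in thoth-solver. With `requests`, the exception to pass as `cert_errors` would be `requests.exceptions.SSLError`, as seen in the traceback; the default here uses the standard library's `ssl.SSLError` to keep the example self-contained.

```python
import ssl
from typing import Callable, List, Set, Tuple, Type


class IndexGuard:
    """Stop querying a package index after its first certificate failure."""

    def __init__(
        self,
        fetch: Callable[[str, str], List[str]],
        cert_errors: Tuple[Type[BaseException], ...] = (ssl.SSLError,),
    ) -> None:
        self._fetch = fetch  # hypothetical callable: (index_url, package) -> versions
        self._cert_errors = cert_errors
        self.broken: Set[str] = set()  # index URLs that failed certificate verification

    def versions(self, index_url: str, package: str) -> List[str]:
        if index_url in self.broken:
            return []  # skip the index without touching the network
        try:
            return self._fetch(index_url, package)
        except self._cert_errors:
            # Remember the failure so later packages do not retry this index.
            self.broken.add(index_url)
            return []
```

With this in place, only the first package resolved against the affected index pays the cost of the failed TLS handshake; later lookups return an empty list immediately.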

@mayaCostantini added the kind/bug label Jul 21, 2022
@mayaCostantini
Contributor Author

/sig stack-guidance
/priority critical-urgent

@sesheta added the sig/stack-guidance and priority/critical-urgent labels Jul 21, 2022
@mayaCostantini
Contributor Author

Related to thoth-station/support#245

@harshad16 self-assigned this Jul 25, 2022
@harshad16 added the triage/accepted label Jul 25, 2022
@goern
Member

goern commented Jul 26, 2022

@harshad16 is this stack-guidance or more of a devsecops problem?

@harshad16
Member

More of a DevSecOps problem.
This is fixed now.
Related-to: thoth-station/thoth-application#2622

@mayaCostantini
Contributor Author

It seems like pods are no longer failing thanks to the fix. Closing this issue.
