[Docs] Provide documentation on where blob storage is accessed and which keys are being used #4881

ddl-ebrown · 2024-02-12T15:11:28Z

Description

There are several different places where various Flyte services or tools require blob storage access, and for folks new to the project, it would be nice to have a few notes on the internals of how API calls happen. The initial question I raised in Slack at https://flyte-org.slack.com/archives/CP2HDHKE1/p1707508706335559 was

Is there a good graph of the sequence of API calls between `pyflyte run --remote` / flyteadmin and blob storage endpoints? I'm mostly trying to get a good picture of when `CreateUploadLocation` gets called / used, etc.

Some of this is documented in https://docs.flyte.org/en/latest/concepts/data_management.html, but it would be nice to extend that to include:

- `pyflyte run --remote` calls FlyteAdmin, which calls blob storage to get a signed URL and returns it to the client so that the client can upload
- when inputs / outputs are downloaded / uploaded, and using what creds, for native Flyte tasks
- when inputs / outputs are downloaded / uploaded for raw container tasks (i.e. by copilot)
- default config vs. other config mechanisms
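
For readers unfamiliar with the signed-URL pattern in the first item above, here is a minimal sketch of the idea. It is not FlyteAdmin's implementation: `FakeAdmin`, `create_upload_location`, `verify_signed_put`, the URL layout, and the HMAC scheme are all hypothetical stand-ins chosen for illustration. The point it shows is that the storage credential stays on the server side, and the client only ever receives a short-lived URL scoped to one object key.

```python
import hashlib
import hmac
import time
from dataclasses import dataclass
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse


@dataclass
class FakeAdmin:
    """Hypothetical control plane; it alone holds the storage credential."""
    signing_key: bytes
    bucket: str = "example-bucket"

    def create_upload_location(self, project: str, domain: str, filename: str,
                               content_md5: str, ttl_seconds: int = 300,
                               now: Optional[float] = None) -> dict:
        """Return a time-limited signed URL for exactly one object key."""
        issued = time.time() if now is None else now
        expires = int(issued + ttl_seconds)
        key = f"{project}/{domain}/{content_md5}/{filename}"
        payload = f"PUT\n{self.bucket}\n{key}\n{expires}".encode()
        signature = hmac.new(self.signing_key, payload, hashlib.sha256).hexdigest()
        query = urlencode({"expires": expires, "signature": signature})
        return {
            "signed_url": f"https://{self.bucket}.storage.example/{key}?{query}",
            "native_url": f"s3://{self.bucket}/{key}",
        }


def verify_signed_put(signing_key: bytes, url: str, now: float) -> bool:
    """What a (hypothetical) blob store checks before accepting the PUT."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    expires = int(params["expires"][0])
    if now > expires:
        return False  # the URL has expired
    bucket = parsed.hostname.split(".")[0]
    key = parsed.path.lstrip("/")
    payload = f"PUT\n{bucket}\n{key}\n{expires}".encode()
    expected = hmac.new(signing_key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, params["signature"][0])


def client_upload(admin: FakeAdmin, data: bytes, filename: str) -> dict:
    """Client side: hash the payload, then ask the server where to upload."""
    digest = hashlib.md5(data).hexdigest()
    location = admin.create_upload_location("proj", "development", filename, digest)
    # A real client would now HTTP PUT `data` to location["signed_url"].
    return location
```

In this sketch the client never sees `signing_key`, which is why the credentials used by the client and by the server can differ.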

While Flyte has a good default approach of using IRSA in AWS, that might not always be the best fit -- so it's helpful to know some of the specifics around blob storage access patterns so that the right solution can be employed.

Are you sure this issue hasn't been raised already?

Yes

Have you read the Code of Conduct?

Yes

welcome · 2024-02-12T15:11:31Z

Thank you for opening your first issue here! 🛠

dosubot · 2024-02-12T15:11:47Z

🤖

Hey @ddl-ebrown! Great to see you diving deeper into the workings of Flyte. Your curiosity really adds value to the community!

Thank you for your detailed request. I'll try to provide some insights based on the information available in the Flyte codebase and documentation.

- `pyflyte run --remote` calls FlyteAdmin, calls blob storage to get a signed url and returns it to the client so that it can upload: The `pyflyte run --remote` command interacts with FlyteAdmin and blob storage to get a signed URL and return it to the client for upload. This is done through the `Get` function in the `AWSRemoteURL` struct, which is responsible for generating the signed URL. You can find more details in the `flyteadmin/pkg/data/implementations/aws_remote_url.go` file.
- When inputs / outputs are downloaded / uploaded and using what creds for native Flyte tasks: In Flyte, the process of downloading and uploading inputs and outputs for native tasks is managed by the `CoreIOStrategy` class. This class has two main attributes: `download_mode` and `upload_mode`. These attributes are instances of the `IOStrategyDownloadMode` and `IOStrategyUploadMode` classes respectively. The credentials used in this process would typically be managed by the underlying data storage or transfer service being used, and would need to be configured separately. You can find more details in the `flyteidl/gen/pb_python/flyteidl/service/flyteadmin/flyteadmin/models/core_io_strategy.py` file.
- When inputs / outputs are downloaded / uploaded for raw container tasks (i.e. by copilot): The context does not provide information on this. It would be helpful if you could provide more details on what you mean by "raw container tasks" and "copilot".
- Default config vs. other config mechanisms: Flyte services or tools use various blob storage access points, which are defined in the `Config` struct in the `flytestdlib/storage/config.go` file. The `Type` field in the `Config` struct sets the type of storage to configure. The `Connection` field defines the connection configurations, and the `Stow` field defines the configurations for the Stow backend.
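
To make the relationship between the three fields named in the last item easier to picture, here is a rough Python sketch. It does not mirror the Go struct exactly: the class names, the sub-fields (`endpoint`, `auth_type`, `region`, `kind`, `config`), the example type values, and the dispatch rule in `active_section` are assumptions for illustration and should be checked against `flytestdlib/storage/config.go`.

```python
from dataclasses import dataclass, field


@dataclass
class ConnectionSketch:
    # Assumed sub-fields; the real struct may differ.
    endpoint: str = ""
    auth_type: str = "iam"  # assumption: "iam" vs. static access keys
    region: str = ""


@dataclass
class StowSketch:
    # Assumed sub-fields; the real struct may differ.
    kind: str = ""
    config: dict = field(default_factory=dict)


@dataclass
class StorageConfigSketch:
    """Illustrative stand-in for the Type / Connection / Stow fields."""
    type: str = "s3"
    connection: ConnectionSketch = field(default_factory=ConnectionSketch)
    stow: StowSketch = field(default_factory=StowSketch)


def active_section(cfg: StorageConfigSketch) -> str:
    """Return which section would supply settings, under the assumed rule."""
    if cfg.type == "stow":
        return "stow"
    if cfg.type in ("s3", "minio"):
        return "connection"
    return "none"  # e.g. local or in-memory storage needs no remote settings
```

For example, `active_section(StorageConfigSketch(type="stow"))` returns `"stow"`, meaning that under this assumed rule the `Connection` settings would be ignored for that backend.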

I hope this information is helpful. If you have any further questions or need more details, please let me know.

About Dosu

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

Ritesh-Udgata · 2024-10-28T16:04:33Z

please assign me to this issue

davidmirror-ops · 2024-10-28T17:10:52Z

@Ritesh-Udgata let us know any questions you may have soon, as Hacktoberfest is closing on Thursday. Thank you for volunteering!

ddl-ebrown added the documentation and untriaged labels Feb 12, 2024

eapolinario added the backlogged label and removed the untriaged label Feb 15, 2024

davidmirror-ops added the hacktoberfest label Sep 27, 2024

davidmirror-ops mentioned this issue Sep 28, 2024 in "Flyte Hacktoberfest 2024: issues and guidelines" #5783 (closed)

davidmirror-ops assigned Ritesh-Udgata Oct 28, 2024

Ritesh-Udgata linked a pull request Oct 28, 2024 that will close this issue: "blob-storage-usage doc update" #5931 (open)
