Make XTable as a community managed Airflow provider #495

Open
2 tasks done
gyli opened this issue Jul 25, 2024 · 8 comments
Labels
enhancement New feature or request

Comments

@gyli

gyli commented Jul 25, 2024

Feature Request / Improvement

Hi XTable maintainers,

I am planning to create an Airflow operator for XTable and to get it accepted as a community-managed Airflow provider.

By Airflow operator, I mean something similar to what AWS presents in this blog: a wrapper around XTable's Java command that lets users trigger it as an Airflow task, with the config defined in Python code. I believe integrating XTable with Airflow would help make it more popular and bring it closer to being an industry standard.
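A minimal sketch of what such a wrapper could look like, written in plain Python without Airflow imports so it stays self-contained. The function names (`build_xtable_command`, `write_dataset_config`, `run_xtable_sync`) are hypothetical and not part of XTable or Airflow. It assumes the bundled utilities jar is invoked as `java -jar <jar> --datasetConfig <file>`, and that the YAML config reader accepts JSON (JSON is valid YAML 1.2 flow syntax, but this has not been verified against XTable's parser).

```python
import json
import subprocess
import tempfile
from pathlib import Path


def build_xtable_command(jar_path, config_path, java_bin="java"):
    """Return the argv list for one XTable sync run.

    Assumption: the bundled jar takes the config file via --datasetConfig.
    """
    return [java_bin, "-jar", str(jar_path), "--datasetConfig", str(config_path)]


def write_dataset_config(config, directory):
    """Serialize a Python dict to a config file and return its path.

    JSON is written to avoid a PyYAML dependency; it is assumed (not verified)
    that XTable's YAML reader accepts it.
    """
    path = Path(directory) / "xtable_dataset_config.yaml"
    path.write_text(json.dumps(config, indent=2))
    return path


def run_xtable_sync(jar_path, config, java_bin="java"):
    """Write the config to a temporary file and run the XTable command.

    Raises subprocess.CalledProcessError if the Java process exits non-zero,
    which an orchestrator would surface as a failed task.
    """
    with tempfile.TemporaryDirectory() as tmp:
        config_path = write_dataset_config(config, tmp)
        cmd = build_xtable_command(jar_path, config_path, java_bin)
        subprocess.run(cmd, check=True)
```

In an actual operator, the `execute()` method of a `BaseOperator` subclass would presumably do little more than call something like `run_xtable_sync` with the jar path and config dict passed to the task.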

I've tried proposing this with Airflow directly, while it requires votes to add it as a provider. More importantly, they are also looking for support from XTable (or maybe even Onehouse?) directly, since they prefer a "mixed governance" approach. As an example, here is the discussion on the Airflow devlist about adding a new provider. Hence, I am requesting your support to bring this discussion to the table on both sides, provide more background and evidence on why XTable is helpful for data engineers (who are very likely Airflow users as well), and support such a vote on the Airflow devlist.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@the-other-tim-brown
Contributor

@gyli I like this idea, and it seems like a natural way for Airflow users to sync their tables after some other step has run in their Airflow pipelines. Onehouse does not own XTable, since it is an Apache Incubating project, so I don't know how that would work with the proposed mixed governance. I also lack the Airflow experience to know what building and maintaining an operator would involve.

@gyli
Author

gyli commented Jul 30, 2024

To provide more details and examples of Airflow providers:

  1. Here is an Airflow provider for Spark, which I think is a good example since it's essentially also a wrapper around Spark commands.
  2. AWS has a demo of the XTable operator at https://github.com/aws-samples/apache-xtable-on-aws-samples, although it works as an MWAA plugin. The core operator logic should be very similar.
  3. Here is Airflow's doc about the process for adding a new provider.

@gyli
Author

gyli commented Jul 31, 2024

Also, I can take on the implementation of the operator, but I would like to put it on hold until there is some progress in the discussion with the Airflow team. As a first step, can we bring more attention to this and discuss it among the XTable maintainers?

@vinothchandar
Member

I've tried proposing this with Airflow directly, while it requires votes to add it as a provider.

Thanks for bringing this up, @gyli. It's a great idea to have the conversion run at the end of Airflow DAGs.

Happy to help this make progress. Do you have a dev list thread or a GH issue on Airflow where you have brought this up with the Airflow maintainers? If so, the easiest path would be to chime in there and understand what needs to be done and the overall process.

@vinothchandar
Member

@gyli
Author

gyli commented Aug 1, 2024

The above doc is the correct process to add a new provider.

I have started a discussion here, but they need an official proposal and a vote on the Airflow devlist.

@vinothchandar
Member

@gyli I was off. Will get on this next week. Thanks for your patience.

@gyli
Author

gyli commented Aug 16, 2024

Awesome. I was about to send the email to their devlist, but it would be much better if you could send it. Thanks.

@vinishjail97 vinishjail97 added the enhancement New feature or request label Sep 9, 2024
4 participants