Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ION extra to Kestra Python Package #16

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

japerry911
Copy link
Contributor

@japerry911 japerry911 commented Feb 7, 2025

What changes are being made and why?

  • added Kestra ION package as extra to Kestra Python Package

  • kept the repo code 99% the same, just changed around file names so that from kestra_ion import read_ion worked as before - which makes it backward compatible

    • I am happy to change file structure, let me know thoughts
  • added to documentation and tests to cover the new ION extra update

  • we will probably want to update other docs that reference Kestra-ION repo, and not shut down that repo for time being, in case users do not see this new extra (I also won't probably have time to update to use new extra for a week or two)

  • closes Add kestra-ion as subpackage to kestra #11

  • closes Kestra Flow Triggering Attempts to Poll Execution before it's created #17


How the changes have been QAed?

pip install 'kestra[ion] @ git+https://github.com/japerry911/libs.git@japerry911/imp/add-kestra-ion-extra#subdirectory=python'
  • can confirm pip install kestra does not install the Kestra ION package

  • main.py file -

import pandas as pd
import requests
from kestra_ion import read_ion

file_path = "employees.ion"
url = "https://huggingface.co/datasets/kestra/datasets/resolve/main/ion/employees.ion"
response = requests.get(url)
if response.status_code == 200:
    with open(file_path, "wb") as file:
        file.write(response.content)
else:
    print(f"Failed to download the file. Status code: {response.status_code}")


data = read_ion(file_path)
df = pd.DataFrame(data)
print(df.info())
python main.py
(.venv) jackperry@FDMACDE-JPER test_kestra_ion % python main.py    
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 11 columns):
 #   Column          Non-Null Count  Dtype         
---  ------          --------------  -----         
 0   employee_id     100 non-null    float64       
 1   first_name      100 non-null    object        
 2   last_name       100 non-null    object        
 3   email           100 non-null    object        
 4   phone_number    100 non-null    object        
 5   hire_date       100 non-null    datetime64[ns]
 6   job_id          100 non-null    object        
 7   salary          100 non-null    float64       
 8   commission_pct  35 non-null     float64       
 9   manager_id      99 non-null     float64       
 10  department_id   99 non-null     float64       
dtypes: datetime64[ns](1), float64(5), object(5)
memory usage: 8.7+ KB
None

@tchiotludo
Copy link
Member

Nice one, really appreciate that you handle java time nativelly !

Some quick idea:

  • with not some Kestra.read(path) to remove the open? in the context of Kestra, will inject the file and all users will need to open + read_ion, will be even more easier to use.
  • with not some Kestra.write(data) that magically read a python object and send it as output for Kestra?

with both, I think we will remove all the complexity to understand how kestra is working.

WDYT?

@japerry911
Copy link
Contributor Author

Nice one, really appreciate that you handle java time nativelly !

Some quick idea:

  • with not some Kestra.read(path) to remove the open? in the context of Kestra, will inject the file and all users will need to open + read_ion, will be even more easier to use.
  • with not some Kestra.write(data) that magically read a python object and send it as output for Kestra?

with both, I think we will remove all the complexity to understand how kestra is working.

WDYT?

Hello @tchiotludo !

Kestra "read" is not a function currently, are you proposing to add that to Python Kestra? open is native Python function, so that made it super easy to do that.

For write, are you referencing that it reads the ION and sets as a Kestra output? I think current set up works well if we are just using the ION read file in the Python code.

Could you elaborate on your propositions, I feel like I lack understanding 😅 ?

Thank you 😸 !

@tchiotludo
Copy link
Member

For the read, I would just change the read_ion to be Kestra.read to have a simple common static method for all helpers inside python, it will make this method more visible for newcomers I think.

For the write, I would love to see Kestra.write(vars) with vars that could be any dict in python and that would write as ion and automatically capture by Kestra as outputFiles, but It will need a change on the core as I see, so let's postpone for now

@japerry911
Copy link
Contributor Author

@tchiotludo , ohhhhhhhhhhhhhhh I am following now. Sorry, I copied this package over from Anna's repo, and was trying to make it backwards compatible in this manner

from kestra_ion import read_ion

Is it ok if I don't have it import like that? Because if so, I can play around with your thoughts - they make more sense now 😄

--

For now, I will move read_ion to Kestra.read and can investigate write

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: To review
Development

Successfully merging this pull request may close these issues.

Kestra Flow Triggering Attempts to Poll Execution before it's created Add kestra-ion as subpackage to kestra
2 participants