Some python and notebook versions of examples have diverged #357

Open
eordentlich opened this issue Feb 2, 2024 · 2 comments

@eordentlich
Collaborator

Describe the bug
I'm not sure this holds for every example, but the mortgage ETL + XGBoost example has some non-trivial discrepancies. For example, the Python script uses UDFs: https://github.com/NVIDIA/spark-rapids-examples/blob/main/examples/XGBoost-Examples/mortgage/python/com/nvidia/spark/examples/mortgage/etl.py#L22-L23
while the notebook(s) implement the same logic directly in Spark SQL:
https://github.com/NVIDIA/spark-rapids-examples/blob/main/examples/XGBoost-Examples/mortgage/notebooks/python/MortgageETL.ipynb?short_path=2af22cf#L454-L478
There are other differences as well. It looks like the scripts may be lagging behind the notebooks.

Steps/Code to reproduce bug
N/A

Expected behavior
Notebook and Python script versions should ideally be aligned (or the reasons for any differences should at least be documented).
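
One possible way to spot this kind of drift is to pull the code cells out of the notebook and diff them against the script. The following is only a rough sketch using the Python standard library; the helper names (`notebook_code`, `normalize`, `divergence`) are hypothetical and not part of this repository. It assumes the notebook is standard `.ipynb` JSON with a top-level `cells` list. A plain text diff will also flag purely structural differences, so treat the output as a hint rather than a verdict.

```python
import difflib
import json


def notebook_code(nb_json: str) -> list[str]:
    """Return the source lines of every code cell in a notebook JSON string."""
    nb = json.loads(nb_json)
    lines = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        src = cell.get("source", [])
        # The notebook format allows "source" to be a string or a list of strings.
        text = src if isinstance(src, str) else "".join(src)
        lines.extend(text.splitlines())
    return lines


def normalize(lines: list[str]) -> list[str]:
    """Drop blank lines and comment-only lines, and strip trailing whitespace."""
    out = []
    for line in lines:
        stripped = line.rstrip()
        if not stripped or stripped.lstrip().startswith("#"):
            continue
        out.append(stripped)
    return out


def divergence(script_text: str, nb_json: str) -> list[str]:
    """Return a unified diff between the script and the notebook's code cells."""
    script_lines = normalize(script_text.splitlines())
    notebook_lines = normalize(notebook_code(nb_json))
    return list(
        difflib.unified_diff(
            script_lines,
            notebook_lines,
            fromfile="script",
            tofile="notebook",
            lineterm="",
        )
    )
```

An empty result means the normalized code matches line for line; anything else lists the lines that differ.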

Environment details (please complete the following information)
N/A

@GaryShen2008
Collaborator

@nvliyuan Do you remember who wrote these examples? I can't recall the reason for the difference, but there should be one.

@nvliyuan
Collaborator

Yes, different implementations of the same example should keep the same logic. I will draft a PR to fix it.
