update the main branch for 2502 release #504

Merged (15 commits) on Feb 26, 2025
14 changes: 6 additions & 8 deletions .github/workflows/add-to-project.yml
@@ -1,4 +1,4 @@
-# Copyright (c) 2024, NVIDIA CORPORATION.
+# Copyright (c) 2024-2025, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -23,13 +23,11 @@ on:
      - opened

jobs:
-  add-to-project:
-    if: github.repository == 'NVIDIA/spark-rapids-examples'
-    name: Add new issues and pull requests to project
+  Add-to-project:
+    if: github.repository_owner == 'NVIDIA' # avoid adding issues from forks
    runs-on: ubuntu-latest
    steps:
-      - uses: actions/[email protected]
+      - name: add-to-project
+        uses: NVIDIA/spark-rapids-common/add-to-project@main
        with:
-          project-url: https://github.com/orgs/NVIDIA/projects/4
-          github-token: ${{ secrets.PROJECT_TOKEN }}
-
+          token: ${{ secrets.PROJECT_TOKEN }}
54 changes: 54 additions & 0 deletions .github/workflows/license-header-check.yml
@@ -0,0 +1,54 @@
# Copyright (c) 2024, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# A workflow to check copyright/license header
name: license header check

on:
  pull_request:
    types: [opened, synchronize, reopened]

jobs:
  license-header-check:
    runs-on: ubuntu-latest
    if: "!contains(github.event.pull_request.title, '[bot]')"
    steps:
      - name: Get checkout depth
        run: |
          echo "PR_FETCH_DEPTH=$(( ${{ github.event.pull_request.commits }} + 10 ))" >> $GITHUB_ENV

      - name: Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: ${{ env.PR_FETCH_DEPTH }}

      - name: license-header-check
        uses: NVIDIA/spark-rapids-common/license-header-check@main
        with:
          included_file_patterns: |
            *.sh,
            *.java,
            *.py,
            *.pbtxt,
            *Dockerfile*,
            *Jenkinsfile*,
            *.yml,
            *.yaml,
            *.cpp,
            *.hpp,
            *.txt,
            *.cu,
            *.scala,
            *.ini,
            *.xml
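
The "Get checkout depth" step above sizes the clone so that every commit in the pull request is available when the header check diffs the branch. A minimal sketch of the same idea outside of GitHub Actions — the commit count and repository state are illustrative, and the `+ 10` margin simply mirrors the workflow:

```bash
# Fetch just enough history to cover all commits in the PR, plus a safety margin.
PR_COMMITS=15                            # e.g. read from the pull request metadata
FETCH_DEPTH=$(( PR_COMMITS + 10 ))
git fetch --depth="${FETCH_DEPTH}" origin pull/504/head   # 504 is this PR, used for illustration
```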
14 changes: 14 additions & 0 deletions dockerfile/gpu_executor_template.yaml
@@ -1,3 +1,17 @@
# Copyright (c) 2024, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: v1
kind: Pod
spec:
@@ -21,7 +21,7 @@ Navigate to your home directory in the UI and select **Create** > **File** from
create an `init.sh` script with contents:
```bash
#!/bin/bash
-sudo wget -O /databricks/jars/rapids-4-spark_2.12-24.12.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.12.0/rapids-4-spark_2.12-24.12.0.jar
+sudo wget -O /databricks/jars/rapids-4-spark_2.12-25.02.1.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/25.02.1/rapids-4-spark_2.12-25.02.1.jar
```
1. Select the Databricks Runtime Version from one of the supported runtimes specified in the
Prerequisites section.
@@ -68,7 +68,7 @@ create an `init.sh` script with contents:
```bash
spark.rapids.sql.python.gpu.enabled true
spark.python.daemon.module rapids.daemon_databricks
-spark.executorEnv.PYTHONPATH /databricks/jars/rapids-4-spark_2.12-24.12.0.jar:/databricks/spark/python
+spark.executorEnv.PYTHONPATH /databricks/jars/rapids-4-spark_2.12-25.02.1.jar:/databricks/spark/python
```
Note that the Python memory pool requires the cudf library, so you need to install cudf on
each worker node (`pip install cudf-cu11 --extra-index-url=https://pypi.nvidia.com`) or disable the Python memory pool
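
If installing cudf on every worker is not practical, the Python memory pool can instead be disabled from the cluster Spark config. A minimal sketch — the `spark.rapids.python.memory.gpu.pooling.enabled` key is assumed from the RAPIDS Accelerator Python configuration docs and should be verified against the release being deployed:

```bash
# Cluster Spark config: disable the Python memory pool so cudf is not required on the workers.
spark.rapids.python.memory.gpu.pooling.enabled false
```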
16 changes: 15 additions & 1 deletion docs/get-started/xgboost-examples/csp/databricks/init.sh
@@ -1,7 +1,21 @@
# Copyright (c) 2024, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

sudo rm -f /databricks/jars/spark--maven-trees--ml--10.x--xgboost-gpu--ml.dmlc--xgboost4j-gpu_2.12--ml.dmlc__xgboost4j-gpu_2.12__1.5.2.jar
sudo rm -f /databricks/jars/spark--maven-trees--ml--10.x--xgboost-gpu--ml.dmlc--xgboost4j-spark-gpu_2.12--ml.dmlc__xgboost4j-spark-gpu_2.12__1.5.2.jar

-sudo wget -O /databricks/jars/rapids-4-spark_2.12-24.12.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.12.0/rapids-4-spark_2.12-24.12.0.jar
+sudo wget -O /databricks/jars/rapids-4-spark_2.12-25.02.1.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/25.02.1/rapids-4-spark_2.12-25.02.1.jar
sudo wget -O /databricks/jars/xgboost4j-gpu_2.12-1.7.1.jar https://repo1.maven.org/maven2/ml/dmlc/xgboost4j-gpu_2.12/1.7.1/xgboost4j-gpu_2.12-1.7.1.jar
sudo wget -O /databricks/jars/xgboost4j-spark-gpu_2.12-1.7.1.jar https://repo1.maven.org/maven2/ml/dmlc/xgboost4j-spark-gpu_2.12/1.7.1/xgboost4j-spark-gpu_2.12-1.7.1.jar
ls -ltr
@@ -40,7 +40,7 @@ export SPARK_DOCKER_IMAGE=<gpu spark docker image repo and name>
export SPARK_DOCKER_TAG=<spark docker image tag>

pushd ${SPARK_HOME}
-wget https://github.com/NVIDIA/spark-rapids-examples/raw/branch-24.12/dockerfile/Dockerfile
+wget https://github.com/NVIDIA/spark-rapids-examples/raw/branch-25.02/dockerfile/Dockerfile

# Optionally install additional jars into ${SPARK_HOME}/jars/
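
For orientation (not part of this diff), the downloaded Dockerfile is typically built and pushed using the variables exported above. A minimal sketch, with build arguments omitted because they depend on the repository's Dockerfile:

```bash
# Build the GPU-enabled Spark image from the downloaded Dockerfile and push it to the
# registry referenced by SPARK_DOCKER_IMAGE (illustrative commands only).
docker build -t ${SPARK_DOCKER_IMAGE}:${SPARK_DOCKER_TAG} -f Dockerfile .
docker push ${SPARK_DOCKER_IMAGE}:${SPARK_DOCKER_TAG}
```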

@@ -5,7 +5,7 @@ For simplicity export the location to these jars. All examples assume the packag
### Download the jars

Download the RAPIDS Accelerator for Apache Spark plugin jar
-* [RAPIDS Spark Package](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.12.0/rapids-4-spark_2.12-24.12.0.jar)
+* [RAPIDS Spark Package](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/25.02.1/rapids-4-spark_2.12-25.02.1.jar)
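
A minimal sketch of the "export the location to these jars" step referenced above — the variable name and path are illustrative, not taken from this diff:

```bash
# Hypothetical variable pointing at the downloaded RAPIDS plugin jar; adjust the path
# to wherever the jar was saved.
export SPARK_RAPIDS_PLUGIN_JAR=${HOME}/rapids-4-spark_2.12-25.02.1.jar
```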

### Build XGBoost Python Examples

@@ -5,7 +5,7 @@ For simplicity export the location to these jars. All examples assume the packag
### Download the jars

1. Download the RAPIDS Accelerator for Apache Spark plugin jar
-* [RAPIDS Spark Package](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.12.0/rapids-4-spark_2.12-24.12.0.jar)
+* [RAPIDS Spark Package](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/25.02.1/rapids-4-spark_2.12-25.02.1.jar)

### Build XGBoost Scala Examples

4 changes: 2 additions & 2 deletions examples/ML+DL-Examples/Optuna-Spark/README.md
@@ -147,8 +147,8 @@ We use [RAPIDS](https://docs.rapids.ai/install/#get-rapids) for GPU-accelerated
``` shell
sudo apt install libmysqlclient-dev

-conda create -n rapids-24.12 -c rapidsai -c conda-forge -c nvidia \
-cudf=24.12 cuml=24.12 python=3.10 'cuda-version>=12.0,<=12.5'
+conda create -n rapids-25.02 -c rapidsai -c conda-forge -c nvidia \
+cudf=25.02 cuml=25.02 python=3.10 'cuda-version>=12.0,<=12.5'
conda activate optuna-spark
pip install mysqlclient
pip install optuna joblib joblibspark ipywidgets
@@ -41,7 +41,7 @@ fi


# rapids import
-SPARK_RAPIDS_VERSION=24.12.0
+SPARK_RAPIDS_VERSION=25.02.1
curl -L https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/${SPARK_RAPIDS_VERSION}/rapids-4-spark_2.12-${SPARK_RAPIDS_VERSION}.jar -o \
/databricks/jars/rapids-4-spark_2.12-${SPARK_RAPIDS_VERSION}.jar

@@ -54,7 +54,7 @@ ln -s /usr/local/cuda-11.8 /usr/local/cuda

sudo /databricks/python3/bin/pip3 install \
--extra-index-url=https://pypi.nvidia.com \
"cudf-cu11==24.12.*" "cuml-cu11==24.12.*"
"cudf-cu11==25.02.*" "cuml-cu11==25.02.*"

# setup python environment
sudo apt clean && sudo apt update --fix-missing -y
@@ -12,7 +12,7 @@ json_config=$(cat <<EOF
"spark_version": "13.3.x-gpu-ml-scala2.12",
"spark_conf": {
"spark.task.resource.gpu.amount": "1",
"spark.executorEnv.PYTHONPATH": "/databricks/jars/rapids-4-spark_2.12-24.12.0.jar:/databricks/spark/python:/databricks/python3",
"spark.executorEnv.PYTHONPATH": "/databricks/jars/rapids-4-spark_2.12-25.02.1.jar:/databricks/spark/python:/databricks/python3",
"spark.executor.cores": "8",
"spark.rapids.memory.gpu.minAllocFraction": "0.0001",
"spark.plugins": "com.nvidia.spark.SQLPlugin",
@@ -444,14 +444,14 @@
"24/12/11 23:47:52 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable\n",
"Setting default log level to \"WARN\".\n",
"To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).\n",
"24/12/11 23:47:52 WARN RapidsPluginUtils: RAPIDS Accelerator 24.12.0 using cudf 24.12.0, private revision bd4e99e18e20234ee0c54f95f4b0bfce18a6255e\n",
"24/12/11 23:47:52 WARN RapidsPluginUtils: RAPIDS Accelerator 25.02.1 using cudf 25.02.1, private revision bd4e99e18e20234ee0c54f95f4b0bfce18a6255e\n",
"24/12/11 23:47:52 WARN RapidsPluginUtils: RAPIDS Accelerator is enabled, to disable GPU support set `spark.rapids.sql.enabled` to false.\n"
]
}
],
"source": [
"def get_rapids_jar():\n",
" SPARK_RAPIDS_VERSION = \"24.12.0\"\n",
" SPARK_RAPIDS_VERSION = \"25.02.1\"\n",
" rapids_jar = f\"rapids-4-spark_2.12-{SPARK_RAPIDS_VERSION}.jar\"\n",
" if not os.path.exists(rapids_jar):\n",
" print(\"Downloading Spark Rapids jar\")\n",