8 changes: 4 additions & 4 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
@@ -33,18 +33,18 @@ The signoff means you certify the below (from [developercertificate.org](https:/

```
Developer Certificate of Origin
Version 1.1
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1
Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

6 changes: 3 additions & 3 deletions LICENSE
@@ -4,10 +4,10 @@

TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

1. Definitions.
1. Definitions.

"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
and distribution as defined by Sections 1 through 9 of this document.

"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
@@ -186,7 +186,7 @@
same "printed page" as the copyright notice for easier
identification within third-party archives.

Copyright 2018 NVIDIA Corporation
Copyright 2018 NVIDIA Corporation

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
12 changes: 6 additions & 6 deletions README.md
@@ -6,7 +6,7 @@ You can download the latest version of RAPIDS Accelerator [here](https://nvidia.
This repo contains examples and applications that showcase the performance and benefits of using
RAPIDS Accelerator in data processing and machine learning pipelines.
There are broadly five categories of examples in this repo:
1. [SQL/Dataframe](./examples/SQL+DF-Examples)
1. [SQL/Dataframe](./examples/SQL+DF-Examples)
2. [Spark XGBoost](./examples/XGBoost-Examples)
3. [Machine Learning/Deep Learning](./examples/ML+DL-Examples)
4. [RAPIDS UDF](./examples/UDF-Examples)
@@ -18,22 +18,22 @@ Here is the list of notebooks in this repo:

| | Category | Notebook Name | Description
| ------------- | ------------- | ------------- | -------------
| 1 | SQL/DF | Microbenchmark | Spark SQL operations such as expand, hash aggregate, windowing, and cross joins with up to 20x performance benefits
| 1 | SQL/DF | Microbenchmark | Spark SQL operations such as expand, hash aggregate, windowing, and cross joins with up to 20x performance benefits
| 2 | SQL/DF | Customer Churn | Data federation for modeling customer Churn with a sample telco customer data
| 3 | XGBoost | Agaricus (Scala) | Uses XGBoost classifier function to create model that can accurately differentiate between edible and poisonous mushrooms with the [agaricus dataset](https://archive.ics.uci.edu/ml/datasets/mushroom)
| 4 | XGBoost | Mortgage (Scala) | End-to-end ETL + XGBoost example to predict mortgage default with [Fannie Mae Single-Family Loan Performance Data](https://capitalmarkets.fanniemae.com/credit-risk-transfer/single-family-credit-risk-transfer/fannie-mae-single-family-loan-performance-data)
| 5 | XGBoost | Taxi (Scala) | End-to-end ETL + XGBoost example to predict taxi trip fare amount with [NYC taxi trips data set](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page)
| 5 | XGBoost | Taxi (Scala) | End-to-end ETL + XGBoost example to predict taxi trip fare amount with [NYC taxi trips data set](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page)
| 6 | ML/DL | PCA | [Spark-Rapids-ML](https://github.com/NVIDIA/spark-rapids-ml) based PCA example to train and transform with a synthetic dataset
| 7 | ML/DL | DL Inference | 11 notebooks demonstrating distributed model inference on Spark using the `predict_batch_udf` across various frameworks: PyTorch, HuggingFace, and TensorFlow
| 7 | ML/DL | DL Inference | 11 notebooks demonstrating distributed model inference on Spark using the `predict_batch_udf` across various frameworks: PyTorch, HuggingFace, and TensorFlow

Here is the list of Apache Spark applications (Scala and PySpark) in this repo that
can be built to run on GPUs with RAPIDS Accelerator:

| | Category | Application Name | Description
| ------------- | ------------- | ------------- | -------------
| 1 | XGBoost | Agaricus (Scala) | Uses XGBoost classifier function to create model that can accurately differentiate between edible and poisonous mushrooms with the [agaricus dataset](https://archive.ics.uci.edu/ml/datasets/mushroom)
| 1 | XGBoost | Agaricus (Scala) | Uses XGBoost classifier function to create model that can accurately differentiate between edible and poisonous mushrooms with the [agaricus dataset](https://archive.ics.uci.edu/ml/datasets/mushroom)
| 2 | XGBoost | Mortgage (Scala) | End-to-end ETL + XGBoost example to predict mortgage default with [Fannie Mae Single-Family Loan Performance Data](https://capitalmarkets.fanniemae.com/credit-risk-transfer/single-family-credit-risk-transfer/fannie-mae-single-family-loan-performance-data)
| 3 | XGBoost | Taxi (Scala) | End-to-end ETL + XGBoost example to predict taxi trip fare amount with [NYC taxi trips data set](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page)
| 3 | XGBoost | Taxi (Scala) | End-to-end ETL + XGBoost example to predict taxi trip fare amount with [NYC taxi trips data set](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page)
| 4 | ML/DL | PCA | [Spark-Rapids-ML](https://github.com/NVIDIA/spark-rapids-ml) based PCA example to train and transform with a synthetic dataset
| 5 | UDF | URL Decode | Decodes URL-encoded strings using the [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy/)
| 6 | UDF | URL Encode | URL-encodes strings using the [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy/)
Binary file modified datasets/agaricus-small.tar.gz
Binary file not shown.
Binary file modified datasets/criteo-small.tar.gz
Binary file not shown.
Binary file modified datasets/cuspatial_data.tar.gz
Binary file not shown.
Binary file modified datasets/customer-churn.tar.gz
Binary file not shown.
Binary file modified datasets/taxi-small.tar.gz
Binary file not shown.
Binary file modified datasets/tpcds-small.tar.gz
Binary file not shown.
14 changes: 7 additions & 7 deletions dockerfile/Dockerfile
@@ -1,4 +1,4 @@
# Copyright (c) 2019-2023, NVIDIA CORPORATION. All rights reserved.
# Copyright (c) 2019-2023, NVIDIA CORPORATION. All rights reserved.
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
@@ -15,13 +15,13 @@
# limitations under the License.
#

FROM nvidia/cuda:11.8.0-devel-ubuntu18.04
ARG spark_uid=185
FROM nvidia/cuda:11.8.0-devel-ubuntu18.04
ARG spark_uid=185

# Install java dependencies
RUN apt-get update && apt-get install -y --no-install-recommends openjdk-8-jdk openjdk-8-jre
ENV JAVA_HOME /usr/lib/jvm/java-1.8.0-openjdk-amd64
ENV PATH $PATH:/usr/lib/jvm/java-1.8.0-openjdk-amd64/jre/bin:/usr/lib/jvm/java-1.8.0-openjdk-amd64/bin
ENV JAVA_HOME /usr/lib/jvm/java-1.8.0-openjdk-amd64
ENV PATH $PATH:/usr/lib/jvm/java-1.8.0-openjdk-amd64/jre/bin:/usr/lib/jvm/java-1.8.0-openjdk-amd64/bin

# Before building the docker image, first build and make a Spark distribution following
# the instructions in http://spark.apache.org/docs/latest/building-spark.html.
@@ -43,7 +43,7 @@ RUN set -ex && \

ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends apt-utils \
&& apt-get install -y --no-install-recommends python libgomp1 \
&& apt-get install -y --no-install-recommends python libgomp1 \
&& rm -rf /var/lib/apt/lists/*

COPY jars /opt/spark/jars
@@ -59,7 +59,7 @@ ENV SPARK_HOME /opt/spark
WORKDIR /opt/spark/work-dir
RUN chmod g+w /opt/spark/work-dir

ENV TINI_VERSION v0.18.0
ENV TINI_VERSION v0.18.0
ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini /sbin/tini
RUN chmod +rx /sbin/tini

6 changes: 3 additions & 3 deletions dockerfile/gpu_executor_template.yaml
@@ -1,4 +1,4 @@
# Copyright (c) 2024, NVIDIA CORPORATION.
# Copyright (c) 2024-2025, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -12,12 +12,12 @@
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: v1
apiVersion: v1
kind: Pod
spec:
containers:
- name: executor
resources:
limits:
nvidia.com/gpu: 1
nvidia.com/gpu: 1
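For context, Spark on Kubernetes picks up an executor pod template like the one above through a submit-time property. A hedged sketch follows; the property name is standard in Spark 3.x, but the file path and the rest of the command line are placeholders, not taken from this repo:

```shell
# Hypothetical submit fragment; point the template path at this repo's
# dockerfile/gpu_executor_template.yaml on the submitting machine.
echo 'spark-submit \
  --master k8s://https://<k8s-apiserver>:6443 \
  --conf spark.kubernetes.executor.podTemplateFile=/path/to/gpu_executor_template.yaml \
  ...'
```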

54 changes: 27 additions & 27 deletions docs/get-started/xgboost-examples/csp/aws/ec2.md
@@ -8,29 +8,29 @@ For more details of AWS EC2 and get started, please check the [AWS document](htt

Go to AWS Management Console select a region, e.g. Oregon, and click EC2 service.

### Step 1: Launch New Instance
### Step 1: Launch New Instance

Click "Launch instance" at the EC2 Management Console, and select "Launch instance".

![Step 1: Launch New Instance](pics/ec2_step1.png)
![Step 1: Launch New Instance](pics/ec2_step1.png)

### Step 2: Configure Instance

#### Step 2.1: Choose an Amazon Machine Image(AMI)
#### Step 2.1: Choose an Amazon Machine Image(AMI)

Search for "deep learning base ami", choose "Deep Learning Base AMI (Ubuntu 18.04)". Click "Select".
Search for "deep learning base ami", choose "Deep Learning Base AMI (Ubuntu 18.04)". Click "Select".

![Step 2.1: Choose an Amazon Machine Image(AMI)](pics/ec2_step2-1.png)
![Step 2.1: Choose an Amazon Machine Image(AMI)](pics/ec2_step2-1.png)

#### Step 2.2: Choose an Instance Type

Choose the type "p3.2xlarge". Click "Next: Configure Instance Details" at the bottom right.

![Step 2.1: Choose an Instance Type](pics/ec2_step2-2.png)
![Step 2.1: Choose an Instance Type](pics/ec2_step2-2.png)

#### Step 2.3: Configure Instance Details

You do not need to change anything here; make sure "Number of instances" is 1. Click "Next: Add Storage" at the bottom right.

![Step 2.3: Configure Instance Details](pics/ec2_step2-3.png)

@@ -66,7 +66,7 @@ Return to "Instances | EC2 Management Console", where you can find your instance running.

## Launch EC2 and Configure Spark 3.2+

### Step 1: Launch EC2
### Step 1: Launch EC2

Copy the "Public DNS (IPv4)" of your instance, then use ssh with your private key to log in to the EC2 machine as user "ubuntu".
@@ -81,9 +81,9 @@ Download the Spark package and set environment variables.

``` bash
# download the spark
wget https://dlcdn.apache.org/spark/spark-3.2.1/spark-3.2.1-bin-hadoop3.2.tgz
tar zxf spark-3.2.1-bin-hadoop3.2.tgz
export SPARK_HOME=/your/spark/spark-3.2.1-bin-hadoop3.2
wget https://dlcdn.apache.org/spark/spark-3.2.1/spark-3.2.1-bin-hadoop3.2.tgz
tar zxf spark-3.2.1-bin-hadoop3.2.tgz
export SPARK_HOME=/your/spark/spark-3.2.1-bin-hadoop3.2
```

### Step 3: Download jars for S3A (optional)
@@ -93,25 +93,25 @@ The jars should be under $SPARK_HOME/jars

``` bash
cd $SPARK_HOME/jars
wget https://github.com/JodaOrg/joda-time/releases/download/v2.10.5/joda-time-2.10.5.jar
wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.2.0/hadoop-aws-3.2.0.jar
wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.11.687/aws-java-sdk-1.11.687.jar
wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-core/1.11.687/aws-java-sdk-core-1.11.687.jar
wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-dynamodb/1.11.687/aws-java-sdk-dynamodb-1.11.687.jar
wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-s3/1.11.687/aws-java-sdk-s3-1.11.687.jar
wget https://github.com/JodaOrg/joda-time/releases/download/v2.10.5/joda-time-2.10.5.jar
wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.2.0/hadoop-aws-3.2.0.jar
wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.11.687/aws-java-sdk-1.11.687.jar
wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-core/1.11.687/aws-java-sdk-core-1.11.687.jar
wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-dynamodb/1.11.687/aws-java-sdk-dynamodb-1.11.687.jar
wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-s3/1.11.687/aws-java-sdk-s3-1.11.687.jar
```

### Step 4: Start Spark Standalone

#### Step 4.1: Edit spark-default.conf
#### Step 4.1: Edit spark-defaults.conf

cd $SPARK_HOME/conf and edit spark-defaults.conf

By default, there is only spark-defaults.conf.template in $SPARK_HOME/conf; edit it and rename it to spark-defaults.conf.
You can find getGpusResources.sh in $SPARK_HOME/examples/src/main/scripts/getGpusResources.sh

``` bash
spark.worker.resource.gpu.amount 1
spark.worker.resource.gpu.amount 1
spark.worker.resource.gpu.discoveryScript /path/to/getGpusResources.sh
```
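The discovery script that this config points at must print a single line of JSON describing the resource addresses. A minimal stub of the expected shape is sketched below; the real getGpusResources.sh shipped with Spark enumerates GPUs (via nvidia-smi) instead of hard-coding one:

```shell
# Stub GPU discovery script: emits the JSON object Spark's resource
# discovery expects. A real script would enumerate actual GPU indices.
echo '{"name": "gpu", "addresses": ["0"]}'
```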

@@ -128,7 +128,7 @@ $SPARK_HOME/sbin/start-slave.sh <master-spark-URL>

## Launch XGBoost-Spark examples on Spark 3.2+

### Step 1: Download Jars
### Step 1: Download Jars

Make sure you have prepared the necessary packages and dataset by following this [guide](/docs/get-started/xgboost-examples/prepare-package-data/preparation-scala.md)

@@ -144,12 +144,12 @@ Create a run.sh script with the content below; make sure to change the paths in it

``` bash
#!/bin/bash
export SPARK_HOME=/your/path/to/spark-3.2.1-bin-hadoop3.2
export SPARK_HOME=/your/path/to/spark-3.2.1-bin-hadoop3.2

export PATH=$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH

export TOTAL_CORES=8
export NUM_EXECUTORS=1
export NUM_EXECUTORS=1
export NUM_EXECUTOR_CORES=$((${TOTAL_CORES}/${NUM_EXECUTORS}))

export S3A_CREDS_USR=your_aws_key
@@ -158,7 +158,7 @@ export S3A_CREDS_PSW=your_aws_secret

spark-submit --master spark://$HOSTNAME:7077 \
--deploy-mode client \
--driver-memory 10G \
--driver-memory 10G \
--executor-memory 22G \
--conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
--conf spark.hadoop.fs.s3a.access.key=$S3A_CREDS_USR \
@@ -168,18 +168,18 @@ spark-submit --master spark://$HOSTNAME:7077 \
--conf spark.executor.cores=$NUM_EXECUTOR_CORES \
--conf spark.task.cpus=$NUM_EXECUTOR_CORES \
--conf spark.sql.files.maxPartitionBytes=4294967296 \
--conf spark.yarn.maxAppAttempts=1 \
--conf spark.yarn.maxAppAttempts=1 \
--conf spark.plugins=com.nvidia.spark.SQLPlugin \
--conf spark.rapids.memory.gpu.pooling.enabled=false \
--conf spark.executor.resource.gpu.amount=1 \
--conf spark.task.resource.gpu.amount=1 \
--conf spark.executor.resource.gpu.amount=1 \
--conf spark.task.resource.gpu.amount=1 \
--class com.nvidia.spark.examples.mortgage.GPUMain \
${SAMPLE_JAR} \
-num_workers=${NUM_EXECUTORS} \
-format=csv \
-dataPath="train::your-train-data-path" \
-dataPath="trans::your-eval-data-path" \
-numRound=100 -max_depth=8 -nthread=$NUM_EXECUTOR_CORES -showFeatures=0 \
-numRound=100 -max_depth=8 -nthread=$NUM_EXECUTOR_CORES -showFeatures=0 \
-tree_method=gpu_hist
```
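The core-allocation arithmetic in the script above can be sanity-checked with a short sketch (the values mirror the exports in run.sh):

```python
# Mirrors NUM_EXECUTOR_CORES=$((TOTAL_CORES/NUM_EXECUTORS)) from run.sh.
total_cores = 8
num_executors = 1
num_executor_cores = total_cores // num_executors

# Because spark.task.cpus is set to the full executor core count, each
# executor runs one task at a time, which lines up with giving each task
# a whole GPU (spark.task.resource.gpu.amount=1).
print(num_executor_cores)  # -> 8
```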

Binary file modified docs/get-started/xgboost-examples/csp/aws/pics/ec2_step1.png
Binary file modified docs/get-started/xgboost-examples/csp/aws/pics/ec2_step2-1.png
Binary file modified docs/get-started/xgboost-examples/csp/aws/pics/ec2_step2-2.png
Binary file modified docs/get-started/xgboost-examples/csp/aws/pics/ec2_step2-3.png
Binary file modified docs/get-started/xgboost-examples/csp/aws/pics/ec2_step2-4.png
Binary file modified docs/get-started/xgboost-examples/csp/aws/pics/ec2_step2-6.png
Binary file modified docs/get-started/xgboost-examples/csp/aws/pics/ec2_step2-7.png