diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 6b6e8194e..e6fc93810 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -33,18 +33,18 @@ The signoff means you certify the below (from [developercertificate.org](https:/
 
 ```
 Developer Certificate of Origin
-Version 1.1
+Version 1.1
 
 Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
-1 Letterman Drive
+1 Letterman Drive
 Suite D4700
-San Francisco, CA, 94129
+San Francisco, CA, 94129
 
 Everyone is permitted to copy and distribute verbatim copies of this
 license document, but changing it is not allowed.
 
-Developer's Certificate of Origin 1.1
+Developer's Certificate of Origin 1.1
 
 By making a contribution to this project, I certify that:
diff --git a/LICENSE b/LICENSE
index 18bcb4316..50146d3e5 100644
--- a/LICENSE
+++ b/LICENSE
@@ -4,10 +4,10 @@
 
    TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
 
-   1. Definitions.
+   1. Definitions.
 
    "License" shall mean the terms and conditions for use, reproduction,
-   and distribution as defined by Sections 1 through 9 of this document.
+   and distribution as defined by Sections 1 through 9 of this document.
 
    "Licensor" shall mean the copyright owner or entity authorized by
    the copyright owner that is granting the License.
@@ -186,7 +186,7 @@ same "printed page" as the copyright notice for easier
    identification within third-party archives.
 
-   Copyright 2018 NVIDIA Corporation
+   Copyright 2018 NVIDIA Corporation
 
    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.
diff --git a/README.md b/README.md
index 8e439ae5b..8771b6f44 100644
--- a/README.md
+++ b/README.md
@@ -6,7 +6,7 @@ You can download the latest version of RAPIDS Accelerator [here](https://nvidia.
 This repo contains examples and applications that showcases the performance and benefits of using
 RAPIDS Accelerator in data processing and machine learning pipelines. There are broadly five
 categories of examples in this repo:
-1. [SQL/Dataframe](./examples/SQL+DF-Examples)
+1. [SQL/Dataframe](./examples/SQL+DF-Examples)
 2. [Spark XGBoost](./examples/XGBoost-Examples)
 3. [Machine Learning/Deep Learning](./examples/ML+DL-Examples)
 4. [RAPIDS UDF](./examples/UDF-Examples)
@@ -18,22 +18,22 @@ Here is the list of notebooks in this repo:
 
 | | Category | Notebook Name | Description
 | ------------- | ------------- | ------------- | -------------
-| 1 | SQL/DF | Microbenchmark | Spark SQL operations such as expand, hash aggregate, windowing, and cross joins with up to 20x performance benefits
+| 1 | SQL/DF | Microbenchmark | Spark SQL operations such as expand, hash aggregate, windowing, and cross joins with up to 20x performance benefits
 | 2 | SQL/DF | Customer Churn | Data federation for modeling customer Churn with a sample telco customer data
 | 3 | XGBoost | Agaricus (Scala) | Uses XGBoost classifier function to create model that can accurately differentiate between edible and poisonous mushrooms with the [agaricus dataset](https://archive.ics.uci.edu/ml/datasets/mushroom)
 | 4 | XGBoost | Mortgage (Scala) | End-to-end ETL + XGBoost example to predict mortgage default with [Fannie Mae Single-Family Loan Performance Data](https://capitalmarkets.fanniemae.com/credit-risk-transfer/single-family-credit-risk-transfer/fannie-mae-single-family-loan-performance-data)
-| 5 | XGBoost | Taxi (Scala) | End-to-end ETL + XGBoost example to predict taxi trip fare amount with [NYC taxi trips data set](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page)
+| 5 | XGBoost | Taxi (Scala) | End-to-end ETL + XGBoost example to predict taxi trip fare amount with [NYC taxi trips data set](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page)
 | 6 | ML/DL | PCA | [Spark-Rapids-ML](https://github.com/NVIDIA/spark-rapids-ml) based PCA example to train and transform with a synthetic dataset
-| 7 | ML/DL | DL Inference | 11 notebooks demonstrating distributed model inference on Spark using the `predict_batch_udf` across various frameworks: PyTorch, HuggingFace, and TensorFlow
+| 7 | ML/DL | DL Inference | 11 notebooks demonstrating distributed model inference on Spark using the `predict_batch_udf` across various frameworks: PyTorch, HuggingFace, and TensorFlow
 
 Here is the list of Apache Spark applications (Scala and PySpark) that
 can be built for running on GPU with RAPIDS Accelerator in this repo:
 
 | | Category | Notebook Name | Description
 | ------------- | ------------- | ------------- | -------------
-| 1 | XGBoost | Agaricus (Scala) | Uses XGBoost classifier function to create model that can accurately differentiate between edible and poisonous mushrooms with the [agaricus dataset](https://archive.ics.uci.edu/ml/datasets/mushroom)
+| 1 | XGBoost | Agaricus (Scala) | Uses XGBoost classifier function to create model that can accurately differentiate between edible and poisonous mushrooms with the [agaricus dataset](https://archive.ics.uci.edu/ml/datasets/mushroom)
 | 2 | XGBoost | Mortgage (Scala) | End-to-end ETL + XGBoost example to predict mortgage default with [Fannie Mae Single-Family Loan Performance Data](https://capitalmarkets.fanniemae.com/credit-risk-transfer/single-family-credit-risk-transfer/fannie-mae-single-family-loan-performance-data)
-| 3 | XGBoost | Taxi (Scala) | End-to-end ETL + XGBoost example to predict taxi trip fare amount with [NYC taxi trips data set](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page)
+| 3 | XGBoost | Taxi (Scala) | End-to-end ETL + XGBoost example to predict taxi trip fare amount with [NYC taxi trips data set](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page)
 | 4 | ML/DL | PCA | [Spark-Rapids-ML](https://github.com/NVIDIA/spark-rapids-ml) based PCA example to train and transform with a synthetic dataset
 | 5 | UDF | URL Decode | Decodes URL-encoded strings using the [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy/)
 | 6 | UDF | URL Encode | URL-encodes strings using the [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy/)
diff --git a/datasets/agaricus-small.tar.gz b/datasets/agaricus-small.tar.gz
index 3e022280a..6f4ae7418 100644
Binary files a/datasets/agaricus-small.tar.gz and b/datasets/agaricus-small.tar.gz differ
diff --git a/datasets/criteo-small.tar.gz b/datasets/criteo-small.tar.gz
index 045aaaeb0..66af674e3 100644
Binary files a/datasets/criteo-small.tar.gz and b/datasets/criteo-small.tar.gz differ
diff --git a/datasets/cuspatial_data.tar.gz b/datasets/cuspatial_data.tar.gz
index 900af3e03..b1445a0cb 100644
Binary files a/datasets/cuspatial_data.tar.gz and b/datasets/cuspatial_data.tar.gz differ
diff --git a/datasets/customer-churn.tar.gz b/datasets/customer-churn.tar.gz
index 52f5ed77e..2d9111dce 100644
Binary files a/datasets/customer-churn.tar.gz and b/datasets/customer-churn.tar.gz differ
diff --git a/datasets/taxi-small.tar.gz b/datasets/taxi-small.tar.gz
index 4053a169b..8998d64f9 100644
Binary files a/datasets/taxi-small.tar.gz and b/datasets/taxi-small.tar.gz differ
diff --git a/datasets/tpcds-small.tar.gz b/datasets/tpcds-small.tar.gz
index c91120562..58d02f8c8 100644
Binary files a/datasets/tpcds-small.tar.gz and b/datasets/tpcds-small.tar.gz differ
diff --git a/dockerfile/Dockerfile b/dockerfile/Dockerfile
index 384788d7d..cc394e63c 100644
--- a/dockerfile/Dockerfile
+++ b/dockerfile/Dockerfile
@@ -1,4 +1,4 @@
-# Copyright (c) 2019-2023, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2019-2023, NVIDIA CORPORATION. All rights reserved.
 # Licensed to the Apache Software Foundation (ASF) under one or more
 # contributor license agreements. See the NOTICE file distributed with
 # this work for additional information regarding copyright ownership.
@@ -15,13 +15,13 @@
 # limitations under the License.
 #
 
-FROM nvidia/cuda:11.8.0-devel-ubuntu18.04
-ARG spark_uid=185
+FROM nvidia/cuda:11.8.0-devel-ubuntu18.04
+ARG spark_uid=185
 
 # Install java dependencies
 RUN apt-get update && apt-get install -y --no-install-recommends openjdk-8-jdk openjdk-8-jre
-ENV JAVA_HOME /usr/lib/jvm/java-1.8.0-openjdk-amd64
-ENV PATH $PATH:/usr/lib/jvm/java-1.8.0-openjdk-amd64/jre/bin:/usr/lib/jvm/java-1.8.0-openjdk-amd64/bin
+ENV JAVA_HOME /usr/lib/jvm/java-1.8.0-openjdk-amd64
+ENV PATH $PATH:/usr/lib/jvm/java-1.8.0-openjdk-amd64/jre/bin:/usr/lib/jvm/java-1.8.0-openjdk-amd64/bin
 
 # Before building the docker image, first build and make a Spark distribution following
 # the instructions in http://spark.apache.org/docs/latest/building-spark.html.
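That comment points at Spark's build documentation; as a minimal sketch of what it describes (the `--name` value and Maven profile are illustrative assumptions, not prescribed by this Dockerfile — see the linked building-spark.html page for the options that match your environment):

``` bash
# From a Spark source checkout: produce a runnable distribution tarball,
# then unpack it so the Docker build can pick up its jars and scripts.
./dev/make-distribution.sh --name custom-spark --tgz -Pkubernetes
tar -xzf spark-*-bin-custom-spark.tgz
```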
@@ -43,7 +43,7 @@ RUN set -ex && \
 ENV DEBIAN_FRONTEND noninteractive
 RUN apt-get update && apt-get install -y --no-install-recommends apt-utils \
-    && apt-get install -y --no-install-recommends python libgomp1 \
+    && apt-get install -y --no-install-recommends python libgomp1 \
     && rm -rf /var/lib/apt/lists/*
 
 COPY jars /opt/spark/jars
@@ -59,7 +59,7 @@ ENV SPARK_HOME /opt/spark
 WORKDIR /opt/spark/work-dir
 RUN chmod g+w /opt/spark/work-dir
 
-ENV TINI_VERSION v0.18.0
+ENV TINI_VERSION v0.18.0
 ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini /sbin/tini
 RUN chmod +rx /sbin/tini
diff --git a/dockerfile/gpu_executor_template.yaml b/dockerfile/gpu_executor_template.yaml
index 6784e590e..3d2665d19 100644
--- a/dockerfile/gpu_executor_template.yaml
+++ b/dockerfile/gpu_executor_template.yaml
@@ -1,4 +1,4 @@
-# Copyright (c) 2024, NVIDIA CORPORATION.
+# Copyright (c) 2024-2025, NVIDIA CORPORATION.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -12,12 +12,12 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-apiVersion: v1
+apiVersion: v1
 kind: Pod
 spec:
   containers:
     - name: executor
       resources:
        limits:
-         nvidia.com/gpu: 1
+         nvidia.com/gpu: 1
diff --git a/docs/get-started/xgboost-examples/csp/aws/ec2.md b/docs/get-started/xgboost-examples/csp/aws/ec2.md
index 0565ce601..ec177a24d 100644
--- a/docs/get-started/xgboost-examples/csp/aws/ec2.md
+++ b/docs/get-started/xgboost-examples/csp/aws/ec2.md
@@ -8,29 +8,29 @@ For more details of AWS EC2 and get started, please check the [AWS document](htt
 
 Go to AWS Management Console select a region, e.g. Oregon, and click EC2 service.
 
-### Step 1: Launch New Instance
+### Step 1: Launch New Instance
 
 Click "Launch instance" at the EC2 Management Console, and select "Launch instance".
 
-![Step 1: Launch New Instance](pics/ec2_step1.png)
+![Step 1: Launch New Instance](pics/ec2_step1.png)
 
 ### Step 2: Configure Instance
 
-#### Step 2.1: Choose an Amazon Machine Image(AMI)
+#### Step 2.1: Choose an Amazon Machine Image (AMI)
 
-Search for "deep learning base ami", choose "Deep Learning Base AMI (Ubuntu 18.04)". Click "Select".
+Search for "deep learning base ami", choose "Deep Learning Base AMI (Ubuntu 18.04)". Click "Select".
 
-![Step 2.1: Choose an Amazon Machine Image(AMI)](pics/ec2_step2-1.png)
+![Step 2.1: Choose an Amazon Machine Image (AMI)](pics/ec2_step2-1.png)
 
 #### Step 2.2: Choose an Instance Type
 
 Choose type "p3.2xlarge". Click "Next: Configure Instance Details" at right buttom.
 
-![Step 2.1: Choose an Instance Type](pics/ec2_step2-2.png)
+![Step 2.2: Choose an Instance Type](pics/ec2_step2-2.png)
 
 #### Step 2.3: Configure Instance Detials
 
-Do not need to change anything here, make sure "Number of instances" is 1. Click "Next: Add Storage" at right buttom.
+You do not need to change anything here; make sure "Number of instances" is 1. Click "Next: Add Storage" at the bottom right.
 
 ![Step 2.3: Configure Instance Detials](pics/ec2_step2-3.png)
 
@@ -66,7 +66,7 @@ Return "instances | EC2 Managemnt Console", you can find your instance running.
 
 ## Launch EC2 and Configure Spark 3.2+
 
-### Step 1: Launch EC2
+### Step 1: Launch EC2
 
 Copy "Public DNS (IPv4)" of your instance
 Use ssh with your private key to launch the EC2 machine as user "ubuntu"
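Concretely, that connection step looks like this (a sketch — the key file path and DNS name are placeholders for your own values):

``` bash
# Connect as the "ubuntu" user with the key pair chosen at launch time;
# <public-dns> is the "Public DNS (IPv4)" value copied above.
ssh -i /path/to/your-key.pem ubuntu@<public-dns>
```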
@@ -81,9 +81,9 @@ Download spark package and set environment variable.
 
 ``` bash
 # download the spark
-wget https://dlcdn.apache.org/spark/spark-3.2.1/spark-3.2.1-bin-hadoop3.2.tgz
-tar zxf spark-3.2.1-bin-hadoop3.2.tgz
-export SPARK_HOME=/your/spark/spark-3.2.1-bin-hadoop3.2
+wget https://dlcdn.apache.org/spark/spark-3.2.1/spark-3.2.1-bin-hadoop3.2.tgz
+tar zxf spark-3.2.1-bin-hadoop3.2.tgz
+export SPARK_HOME=/your/spark/spark-3.2.1-bin-hadoop3.2
 ```
 
 ### Step 3: Download jars for S3A (optional)
@@ -93,17 +93,17 @@ The jars should under $SPARK_HOME/jars
 
 ``` bash
 cd $SPARK_HOME/jars
-wget https://github.com/JodaOrg/joda-time/releases/download/v2.10.5/joda-time-2.10.5.jar
-wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.2.0/hadoop-aws-3.2.0.jar
-wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.11.687/aws-java-sdk-1.11.687.jar
-wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-core/1.11.687/aws-java-sdk-core-1.11.687.jar
-wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-dynamodb/1.11.687/aws-java-sdk-dynamodb-1.11.687.jar
-wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-s3/1.11.687/aws-java-sdk-s3-1.11.687.jar
+wget https://github.com/JodaOrg/joda-time/releases/download/v2.10.5/joda-time-2.10.5.jar
+wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.2.0/hadoop-aws-3.2.0.jar
+wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.11.687/aws-java-sdk-1.11.687.jar
+wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-core/1.11.687/aws-java-sdk-core-1.11.687.jar
+wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-dynamodb/1.11.687/aws-java-sdk-dynamodb-1.11.687.jar
+wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-s3/1.11.687/aws-java-sdk-s3-1.11.687.jar
 ```
 
 ### Step 4: Start Spark Standalone
 
-#### Step 4.1: Edit spark-default.conf
+#### Step 4.1: Edit spark-defaults.conf
 
 cd $SPARK_HOME/conf and edit spark-defaults.conf
 
@@ -111,7 +111,7 @@ By default, thers is only spark-defaults.conf.template in $SPARK_HOME/conf, you
 You can find getGpusResources.sh in $SPARK_HOME/examples/src/main/scripts/getGpusResources.sh
 
 ``` bash
-spark.worker.resource.gpu.amount 1
+spark.worker.resource.gpu.amount 1
 spark.worker.resource.gpu.discoveryScript /path/to/getGpusResources.sh
 ```
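Starting the standalone daemons themselves falls between these hunks; the usual commands are (a sketch — `start-slave.sh` is the worker launcher used by this Spark version, and newer releases also ship it as `start-worker.sh`):

``` bash
# On the master host: bring up the standalone master (listens on 7077).
$SPARK_HOME/sbin/start-master.sh

# On each worker host: attach a worker to that master.
$SPARK_HOME/sbin/start-slave.sh spark://$HOSTNAME:7077
```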
@@ -128,7 +128,7 @@ $SPARK_HOME/sbin/start-slave.sh
 
 ## Launch XGBoost-Spark examples on Spark 3.2+
 
-### Step 1: Download Jars
+### Step 1: Download Jars
 
 Make sure you have prepared the necessary packages and dataset
 by following this [guide](/docs/get-started/xgboost-examples/prepare-package-data/preparation-scala.md)
@@ -144,12 +144,12 @@ Create running run.sh script with below content, make sure change the paths in i
 
 ``` bash
 #!/bin/bash
-export SPARK_HOME=/your/path/to/spark-3.2.1-bin-hadoop3.2
+export SPARK_HOME=/your/path/to/spark-3.2.1-bin-hadoop3.2
 export PATH=$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH
 
 export TOTAL_CORES=8
-export NUM_EXECUTORS=1
+export NUM_EXECUTORS=1
 export NUM_EXECUTOR_CORES=$((${TOTAL_CORES}/${NUM_EXECUTORS}))
 
 export S3A_CREDS_USR=your_aws_key
@@ -158,7 +158,7 @@ export S3A_CREDS_PSW=your_aws_secret
 
 spark-submit --master spark://$HOSTNAME:7077 \
   --deploy-mode client \
-  --driver-memory 10G \
+  --driver-memory 10G \
   --executor-memory 22G \
   --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
   --conf spark.hadoop.fs.s3a.access.key=$S3A_CREDS_USR \
@@ -168,18 +168,18 @@ spark-submit --master spark://$HOSTNAME:7077 \
   --conf spark.executor.cores=$NUM_EXECUTOR_CORES \
   --conf spark.task.cpus=$NUM_EXECUTOR_CORES \
   --conf spark.sql.files.maxPartitionBytes=4294967296 \
-  --conf spark.yarn.maxAppAttempts=1 \
+  --conf spark.yarn.maxAppAttempts=1 \
   --conf spark.plugins=com.nvidia.spark.SQLPlugin \
   --conf spark.rapids.memory.gpu.pooling.enabled=false \
-  --conf spark.executor.resource.gpu.amount=1 \
-  --conf spark.task.resource.gpu.amount=1 \
+  --conf spark.executor.resource.gpu.amount=1 \
+  --conf spark.task.resource.gpu.amount=1 \
   --class com.nvidia.spark.examples.mortgage.GPUMain \
   ${SAMPLE_JAR} \
   -num_workers=${NUM_EXECUTORS} \
   -format=csv \
   -dataPath="train::your-train-data-path" \
   -dataPath="trans::your-eval-data-path" \
-  -numRound=100 -max_depth=8 -nthread=$NUM_EXECUTOR_CORES -showFeatures=0 \
+  -numRound=100 -max_depth=8 -nthread=$NUM_EXECUTOR_CORES -showFeatures=0 \
   -tree_method=gpu_hist
 ```
diff --git a/docs/get-started/xgboost-examples/csp/aws/pics/ec2_step1.png b/docs/get-started/xgboost-examples/csp/aws/pics/ec2_step1.png
index 350e0385f..a5c7557d9 100644
Binary files a/docs/get-started/xgboost-examples/csp/aws/pics/ec2_step1.png and b/docs/get-started/xgboost-examples/csp/aws/pics/ec2_step1.png differ
diff --git a/docs/get-started/xgboost-examples/csp/aws/pics/ec2_step2-1.png b/docs/get-started/xgboost-examples/csp/aws/pics/ec2_step2-1.png
index f5dce9194..6a33ce5e3 100644
Binary files a/docs/get-started/xgboost-examples/csp/aws/pics/ec2_step2-1.png and b/docs/get-started/xgboost-examples/csp/aws/pics/ec2_step2-1.png differ
diff --git a/docs/get-started/xgboost-examples/csp/aws/pics/ec2_step2-2.png b/docs/get-started/xgboost-examples/csp/aws/pics/ec2_step2-2.png
index 6ffd96624..f19d1ef4c 100644
Binary files a/docs/get-started/xgboost-examples/csp/aws/pics/ec2_step2-2.png and b/docs/get-started/xgboost-examples/csp/aws/pics/ec2_step2-2.png differ
diff --git a/docs/get-started/xgboost-examples/csp/aws/pics/ec2_step2-3.png b/docs/get-started/xgboost-examples/csp/aws/pics/ec2_step2-3.png
index 6f6d52552..30d8eb2f5 100644
Binary files a/docs/get-started/xgboost-examples/csp/aws/pics/ec2_step2-3.png and b/docs/get-started/xgboost-examples/csp/aws/pics/ec2_step2-3.png differ
diff --git a/docs/get-started/xgboost-examples/csp/aws/pics/ec2_step2-4.png b/docs/get-started/xgboost-examples/csp/aws/pics/ec2_step2-4.png
index f2f2dff89..12b01b27e 100644
Binary files a/docs/get-started/xgboost-examples/csp/aws/pics/ec2_step2-4.png and b/docs/get-started/xgboost-examples/csp/aws/pics/ec2_step2-4.png differ
diff --git a/docs/get-started/xgboost-examples/csp/aws/pics/ec2_step2-6.png b/docs/get-started/xgboost-examples/csp/aws/pics/ec2_step2-6.png
index b80e181b8..1b2c64ea0 100644
Binary files a/docs/get-started/xgboost-examples/csp/aws/pics/ec2_step2-6.png and b/docs/get-started/xgboost-examples/csp/aws/pics/ec2_step2-6.png differ
diff --git a/docs/get-started/xgboost-examples/csp/aws/pics/ec2_step2-7.png b/docs/get-started/xgboost-examples/csp/aws/pics/ec2_step2-7.png
index 81562c2ca..a701cb2ea 100644
Binary files a/docs/get-started/xgboost-examples/csp/aws/pics/ec2_step2-7.png and b/docs/get-started/xgboost-examples/csp/aws/pics/ec2_step2-7.png differ
diff --git a/docs/get-started/xgboost-examples/csp/databricks/databricks.md b/docs/get-started/xgboost-examples/csp/databricks/databricks.md
index 7a34cdb56..69e71920c 100644
--- a/docs/get-started/xgboost-examples/csp/databricks/databricks.md
+++ b/docs/get-started/xgboost-examples/csp/databricks/databricks.md
@@ -6,11 +6,11 @@ This is a getting started guide to XGBoost4J-Spark on Databricks. At the end of
 
 Prerequisites
 -------------
 
-  * Apache Spark 3.x running in Databricks Runtime 10.4 ML or 11.3 ML with GPU
-  * AWS: 10.4 LTS ML (GPU, Scala 2.12, Spark 3.2.1) or 11.3 LTS ML (GPU, Scala 2.12, Spark 3.3.0)
-  * Azure: 10.4 LTS ML (GPU, Scala 2.12, Spark 3.2.1) or 11.3 LTS ML (GPU, Scala 2.12, Spark 3.3.0)
+  * Apache Spark 3.x running in Databricks Runtime 10.4 ML or 11.3 ML with GPU
+  * AWS: 10.4 LTS ML (GPU, Scala 2.12, Spark 3.2.1) or 11.3 LTS ML (GPU, Scala 2.12, Spark 3.3.0)
+  * Azure: 10.4 LTS ML (GPU, Scala 2.12, Spark 3.2.1) or 11.3 LTS ML (GPU, Scala 2.12, Spark 3.3.0)
 
-The number of GPUs per node dictates the number of Spark executors that can run in that node. Each executor should only be allowed to run 1 task at any given time.
+The number of GPUs per node dictates the number of Spark executors that can run on that node. Each executor should only be allowed to run 1 task at any given time.
 
 Start A Databricks Cluster
 --------------------------
@@ -21,13 +21,13 @@ Navigate to your home directory in the UI and select **Create** > **File** from
 create an `init.sh` scripts with contents:
   ```bash
   #!/bin/bash
-  sudo wget -O /databricks/jars/rapids-4-spark_2.12-25.06.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/25.06.0/rapids-4-spark_2.12-25.06.0.jar
+  sudo wget -O /databricks/jars/rapids-4-spark_2.12-25.06.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/25.06.0/rapids-4-spark_2.12-25.06.0.jar
   ```
-1. Select the Databricks Runtime Version from one of the supported runtimes specified in the
+1. Select the Databricks Runtime Version from one of the supported runtimes specified in the
   Prerequisites section.
 2. Choose the number of workers that matches the number of GPUs you want to use.
-3. Select a worker type. On AWS, use nodes with 1 GPU each such as `p3.2xlarge` or `g4dn.xlarge`.
-   For Azure, choose GPU nodes such as Standard_NC6s_v3. For GCP, choose N1 or A2 instance types with GPUs.
+3. Select a worker type. On AWS, use nodes with 1 GPU each such as `p3.2xlarge` or `g4dn.xlarge`.
+   For Azure, choose GPU nodes such as Standard_NC6s_v3. For GCP, choose N1 or A2 instance types with GPUs.
 4. Select the driver type. Generally this can be set to be the same as the worker.
 5. Click the “Edit” button, then navigate down to the “Advanced Options” section. Select the “Init Scripts” tab in
   the advanced options section, and paste the workspace path to the initialization script:`/Users/user@domain/init.sh`, then click “Add”.
@@ -39,16 +39,16 @@ create an `init.sh` scripts with contents:
 
   The [`spark.task.resource.gpu.amount`](https://spark.apache.org/docs/latest/configuration.html#scheduling)
-  configuration is defaulted to 1 by Databricks. That means that only 1 task can run on an
-  executor with 1 GPU, which is limiting, especially on the reads and writes from Parquet. Set
-  this to 1/(number of cores per executor) which will allow multiple tasks to run in parallel just
+  configuration is defaulted to 1 by Databricks. That means that only 1 task can run on an
+  executor with 1 GPU, which is limiting, especially on the reads and writes from Parquet. Set
+  this to 1/(number of cores per executor) which will allow multiple tasks to run in parallel just
   like the CPU side. Having the value smaller is fine as well.
   Note: Please remove the `spark.task.resource.gpu.amount` config for a single-node Databricks
   cluster because Spark local mode does not support GPU scheduling.
 
   ```bash
   spark.plugins com.nvidia.spark.SQLPlugin
-  spark.task.resource.gpu.amount 0.1
+  spark.task.resource.gpu.amount 0.1
   spark.rapids.memory.pinnedPool.size 2G
   spark.rapids.sql.concurrentGpuTasks 2
   ```
@@ -68,10 +68,10 @@ create an `init.sh` scripts with contents:
   ```bash
   spark.rapids.sql.python.gpu.enabled true
   spark.python.daemon.module rapids.daemon_databricks
-  spark.executorEnv.PYTHONPATH /databricks/jars/rapids-4-spark_2.12-25.06.0.jar:/databricks/spark/python
+  spark.executorEnv.PYTHONPATH /databricks/jars/rapids-4-spark_2.12-25.06.0.jar:/databricks/spark/python
   ```
   Note that since python memory pool require installing the cudf library, so you need to install cudf library in
-  each worker nodes `pip install cudf-cu11 --extra-index-url=https://pypi.nvidia.com` or disable python memory pool
+  each worker node (`pip install cudf-cu11 --extra-index-url=https://pypi.nvidia.com`) or disable the python memory pool
   `spark.rapids.python.memory.gpu.pooling.enabled=false`.
 7. Click `Create Cluster`, it is now enabled for GPU-accelerated Spark.
 
@@ -79,7 +79,7 @@ create an `init.sh` scripts with contents:
 
 Install the xgboost4j_spark jar in the cluster
 ---------------------------
-1. See [Libraries](https://docs.databricks.com/user-guide/libraries.html) for how to install jars from DBFS
+1. See [Libraries](https://docs.databricks.com/user-guide/libraries.html) for how to install jars from DBFS
 2. Go to "Libraries" tab under your cluster and install dbfs:/FileStore/jars/${XGBOOST4J_SPARK_JAR} in your cluster by selecting the "DBFS" option for installing jars
 
 These steps will ensure you are able to import xgboost libraries in python notebooks.
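One quick way to verify the install landed is to list the jars the init script and library install placed on the cluster (a sketch; run it from a notebook `%sh` cell or over SSH on the driver — the `/databricks/jars` path comes from the `init.sh` above):

``` bash
# The rapids-4-spark and xgboost4j jars should both show up here.
ls /databricks/jars | grep -iE 'rapids|xgboost'
```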
@@ -87,7 +87,7 @@ These steps will ensure you are able to import xgboost libraries in python noteb
 
 Import the GPU Mortgage Example Notebook
 ---------------------------
-1. See [Managing Notebooks](https://docs.databricks.com/user-guide/notebooks/notebook-manage.html) on how to import a notebook.
+1. See [Managing Notebooks](https://docs.databricks.com/user-guide/notebooks/notebook-manage.html) on how to import a notebook.
 2. Import the example notebook: [XGBoost4j-Spark mortgage notebook](../../../../../examples/XGBoost-Examples/mortgage/notebooks/scala/mortgage-gpu.ipynb)
 3. Inside the mortgage example notebook, update the data paths from
   "/data/datasets/mortgage-small/train" to "dbfs:/FileStore/tables/mortgage/csv/train/mortgage_train_merged.csv"
@@ -98,20 +98,20 @@ See supported configuration options here: [xgboost parameters](../../../../../ex
 
 ``` bash
 params = {
-  'eta': 0.1,
-  'gamma': 0.1,
+  'eta': 0.1,
+  'gamma': 0.1,
   'missing': 0.0,
   'treeMethod': 'gpu_hist',
-  'maxDepth': 10,
+  'maxDepth': 10,
   'maxLeaves': 256,
   'growPolicy': 'depthwise',
   'minChildWeight': 30.0,
-  'lambda_': 1.0,
+  'lambda_': 1.0,
   'scalePosWeight': 2.0,
-  'subsample': 1.0,
-  'nthread': 1,
-  'numRound': 100,
-  'numWorkers': 1,
+  'subsample': 1.0,
+  'nthread': 1,
+  'numRound': 100,
+  'numWorkers': 1,
 }
 ```
@@ -139,13 +139,13 @@ Accuracy is 0.9980699597729774
 
 Limitations
 -------------
-1. When selecting GPU nodes, Databricks UI requires the driver node to be a GPU node. However you
+1. When selecting GPU nodes, Databricks UI requires the driver node to be a GPU node. However you
   can use Databricks API to create a cluster with CPU driver node.
   Outside of Databricks the plugin can operate with the driver as a CPU node and workers as GPU nodes.
 
 2. Cannot spin off multiple executors on a multi-GPU node.
 
-   Even though it is possible to set `spark.executor.resource.gpu.amount=1` in the in Spark
+   Even though it is possible to set `spark.executor.resource.gpu.amount=1` in the Spark
    Configuration tab, Databricks overrides this to `spark.executor.resource.gpu.amount=N`
    (where N is the number of GPUs per node). This will result in failed executors when starting
    the cluster.
@@ -170,7 +170,7 @@ Limitations
    regular integration tests on the Databricks environment to catch these issues and fix them
   once detected.
 
-5. In Databricks 11.3, an incorrect result is returned for window frames defined by a range in case
+5. In Databricks 11.3, an incorrect result is returned for window frames defined by a range in case
   of DecimalTypes with precision greater than 38. There is a bug filed in Apache Spark for it
-   [here](https://issues.apache.org/jira/browse/SPARK-41793), whereas when using the plugin the
+   [here](https://issues.apache.org/jira/browse/SPARK-41793), whereas when using the plugin the
   correct result will be returned.
\ No newline at end of file
diff --git a/docs/get-started/xgboost-examples/csp/databricks/init.sh b/docs/get-started/xgboost-examples/csp/databricks/init.sh
index 23b54ea62..cb4daaafd 100644
--- a/docs/get-started/xgboost-examples/csp/databricks/init.sh
+++ b/docs/get-started/xgboost-examples/csp/databricks/init.sh
@@ -12,12 +12,12 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-sudo rm -f /databricks/jars/spark--maven-trees--ml--10.x--xgboost-gpu--ml.dmlc--xgboost4j-gpu_2.12--ml.dmlc__xgboost4j-gpu_2.12__1.5.2.jar
-sudo rm -f /databricks/jars/spark--maven-trees--ml--10.x--xgboost-gpu--ml.dmlc--xgboost4j-spark-gpu_2.12--ml.dmlc__xgboost4j-spark-gpu_2.12__1.5.2.jar
+sudo rm -f /databricks/jars/spark--maven-trees--ml--10.x--xgboost-gpu--ml.dmlc--xgboost4j-gpu_2.12--ml.dmlc__xgboost4j-gpu_2.12__1.5.2.jar
+sudo rm -f /databricks/jars/spark--maven-trees--ml--10.x--xgboost-gpu--ml.dmlc--xgboost4j-spark-gpu_2.12--ml.dmlc__xgboost4j-spark-gpu_2.12__1.5.2.jar
 
-sudo wget -O /databricks/jars/rapids-4-spark_2.12-25.06.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/25.06.0/rapids-4-spark_2.12-25.06.0.jar
-sudo wget -O /databricks/jars/xgboost4j-gpu_2.12-1.7.1.jar https://repo1.maven.org/maven2/ml/dmlc/xgboost4j-gpu_2.12/1.7.1/xgboost4j-gpu_2.12-1.7.1.jar
-sudo wget -O /databricks/jars/xgboost4j-spark-gpu_2.12-1.7.1.jar https://repo1.maven.org/maven2/ml/dmlc/xgboost4j-spark-gpu_2.12/1.7.1/xgboost4j-spark-gpu_2.12-1.7.1.jar
+sudo wget -O /databricks/jars/rapids-4-spark_2.12-25.06.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/25.06.0/rapids-4-spark_2.12-25.06.0.jar
+sudo wget -O /databricks/jars/xgboost4j-gpu_2.12-1.7.1.jar https://repo1.maven.org/maven2/ml/dmlc/xgboost4j-gpu_2.12/1.7.1/xgboost4j-gpu_2.12-1.7.1.jar
+sudo wget -O /databricks/jars/xgboost4j-spark-gpu_2.12-1.7.1.jar https://repo1.maven.org/maven2/ml/dmlc/xgboost4j-spark-gpu_2.12/1.7.1/xgboost4j-spark-gpu_2.12-1.7.1.jar
 
 ls -ltr
 mkdir -p /dbfs/FileStore/tables/
diff --git a/docs/get-started/xgboost-examples/csp/dataproc/gcp.md b/docs/get-started/xgboost-examples/csp/dataproc/gcp.md
index 52539aba4..4252e5f3f 100644
--- a/docs/get-started/xgboost-examples/csp/dataproc/gcp.md
+++ b/docs/get-started/xgboost-examples/csp/dataproc/gcp.md
@@ -4,7 +4,7 @@ this [guide](https://cloud.google.com/sdk/docs/install) before getting started.
 
 ## Create a Dataproc Cluster using T4's
 
-* One 16-core master node and 2 32-core worker nodes
+* One 16-core master node and 2 32-core worker nodes
 * Two NVIDIA T4 for each worker node
 
 ```bash
@@ -16,11 +16,11 @@ gcloud dataproc clusters create $CLUSTER_NAME \
     --region=$REGION \
-    --image-version=2.0-ubuntu18 \
-    --master-machine-type=n2-standard-16 \
+    --image-version=2.0-ubuntu18 \
+    --master-machine-type=n2-standard-16 \
     --num-workers=$NUM_WORKERS \
     --worker-accelerator=type=nvidia-tesla-t4,count=$NUM_GPUS \
-    --worker-machine-type=n1-highmem-32\
+    --worker-machine-type=n1-highmem-32 \
     --num-worker-local-ssds=4 \
     --initialization-actions=gs://goog-dataproc-initialization-actions-${REGION}/spark-rapids/spark-rapids.sh \
     --optional-components=JUPYTER,ZEPPELIN \
@@ -34,7 +34,7 @@ Explanation of parameters:
 * NUM_GPUS = number of GPUs to attach to each worker node in the cluster
 * NUM_WORKERS = number of Spark worker nodes in the cluster
 
-This takes around 10-15 minutes to complete. You can navigate to the Dataproc clusters tab in the
+This takes around 10-15 minutes to complete. You can navigate to the Dataproc clusters tab in the
 Google Cloud Console to see the progress.
 
 ![Dataproc Cluster](../../../../img/GCP/dataproc-cluster.png)
@@ -61,10 +61,10 @@ Then create a directory in HDFS, and run below commands,
 ```
 
 ## Preparing libraries
-Please make sure to install the XGBoost, cudf-cu11, numpy libraries on all nodes before running XGBoost application.
+Please make sure to install the XGBoost, cudf-cu11, numpy libraries on all nodes before running XGBoost application.
 ``` bash
 pip install xgboost
-pip install cudf-cu11 --extra-index-url=https://pypi.nvidia.com
+pip install cudf-cu11 --extra-index-url=https://pypi.nvidia.com
 pip install numpy
 pip install scikit-learn
 ```
@@ -76,7 +76,7 @@ by leveraging the --archives option or spark.archives configuration.
 python -m venv pyspark_venv
 source pyspark_venv/bin/activate
 pip install xgboost
-pip install cudf-cu11 --extra-index-url=https://pypi.nvidia.com
+pip install cudf-cu11 --extra-index-url=https://pypi.nvidia.com
 pip install numpy
 pip install scikit-learn
 pip install venv-pack
@@ -91,11 +91,11 @@ spark-submit --archives pyspark_venv.tar.gz#environment app.py
 
 Bash into the master node and start up the notebook.
 
 ```
-jupyter notebook --ip=0.0.0.0 --port=8124 --no-browser
+jupyter notebook --ip=0.0.0.0 --port=8124 --no-browser
 ```
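For temporary access from a local browser, an SSH tunnel can substitute for the reserved static address described next (a hedged sketch: Dataproc names the master node `${CLUSTER_NAME}-m`, and arguments after `--` are passed straight to ssh):

``` bash
# Forward local port 8124 to the notebook on the master node,
# then browse to http://localhost:8124.
gcloud compute ssh ${CLUSTER_NAME}-m --zone=${ZONE} -- -L 8124:localhost:8124
```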
 If you want to remote access the notebook from local, please reserve an external static IP address first:
-1. Access the IP addresses page through the navigation menu: `VPC network` -> `IP addresses`
+1. Access the IP addresses page through the navigation menu: `VPC network` -> `IP addresses`
 ![dataproc img2](../../../../img/GCP/dataproc-img2.png)
 2. Click the `RESERVE EXTERNAL STATIC ADDRESS` button
 ![dataproc img3](../../../../img/GCP/dataproc-img3.png)
@@ -135,10 +135,10 @@ cd custom-images
 
 export CUSTOMIZATION_SCRIPT=/path/to/spark-rapids.sh
 export ZONE=[Your Preferred GCP Zone]
 export GCS_BUCKET=[Your GCS Bucket]
-export IMAGE_NAME=sample-20-ubuntu18-gpu-t4
-export DATAPROC_VERSION=2.0-ubuntu18
+export IMAGE_NAME=sample-20-ubuntu18-gpu-t4
+export DATAPROC_VERSION=2.0-ubuntu18
 export GPU_NAME=nvidia-tesla-t4
-export GPU_COUNT=1
+export GPU_COUNT=1
 
 python generate_custom_image.py \
     --image-name $IMAGE_NAME \
@@ -147,7 +147,7 @@ python generate_custom_image.py \
     --no-smoke-test \
     --zone $ZONE \
     --gcs-bucket $GCS_BUCKET \
-    --machine-type n1-standard-4 \
+    --machine-type n1-standard-4 \
     --accelerator type=$GPU_NAME,count=$GPU_COUNT \
     --disk-size 200 \
     --subnet default
@@ -158,7 +158,7 @@ details on `generate_custom_image.py` script arguments and
 [here](https://cloud.google.com/dataproc/docs/concepts/versioning/dataproc-versions) for dataproc version description.
 
-The image `sample-20-ubuntu18-gpu-t4` is now ready and can be viewed in the GCP console under
+The image `sample-20-ubuntu18-gpu-t4` is now ready and can be viewed in the GCP console under
 `Compute Engine > Storage > Images`. The next step is to launch the cluster using this new image and new initialization actions
 (that do not install NVIDIA drivers since we are already past that step).
@@ -169,17 +169,17 @@ Move this to your own bucket. Let's launch the cluster:
 
 export REGION=[Your Preferred GCP Region]
 export GCS_BUCKET=[Your GCS Bucket]
 export CLUSTER_NAME=[Your Cluster Name]
-export NUM_GPUS=1
+export NUM_GPUS=1
 export NUM_WORKERS=2
 
 gcloud dataproc clusters create $CLUSTER_NAME \
     --region=$REGION \
-    --image=sample-20-ubuntu18-gpu-t4 \
-    --master-machine-type=n1-standard-4 \
+    --image=sample-20-ubuntu18-gpu-t4 \
+    --master-machine-type=n1-standard-4 \
     --num-workers=$NUM_WORKERS \
     --worker-accelerator=type=nvidia-tesla-t4,count=$NUM_GPUS \
-    --worker-machine-type=n1-standard-4 \
-    --num-worker-local-ssds=1 \
+    --worker-machine-type=n1-standard-4 \
+    --num-worker-local-ssds=1 \
     --optional-components=JUPYTER,ZEPPELIN \
     --metadata=rapids-runtime=SPARK \
     --bucket=$GCS_BUCKET \
diff --git a/docs/get-started/xgboost-examples/dataset/mortgage.md b/docs/get-started/xgboost-examples/dataset/mortgage.md
index 1c36155fa..73c7efd8b 100644
--- a/docs/get-started/xgboost-examples/dataset/mortgage.md
+++ b/docs/get-started/xgboost-examples/dataset/mortgage.md
@@ -4,18 +4,18 @@
 
 ## Steps to download the data
 
-1. Go to the [Fannie Mae](https://capitalmarkets.fanniemae.com/credit-risk-transfer/single-family-credit-risk-transfer/fannie-mae-single-family-loan-performance-data) website
+1. Go to the [Fannie Mae](https://capitalmarkets.fanniemae.com/credit-risk-transfer/single-family-credit-risk-transfer/fannie-mae-single-family-loan-performance-data) website
-2. Click on [Single-Family Loan Performance Data](https://datadynamics.fanniemae.com/data-dynamics/?&_ga=2.181456292.2043790680.1657122341-289272350.1655822609#/reportMenu;category=HP)
+2. Click on [Single-Family Loan Performance Data](https://datadynamics.fanniemae.com/data-dynamics/?&_ga=2.181456292.2043790680.1657122341-289272350.1655822609#/reportMenu;category=HP)
   * Register as a new user if you are using the website for the first time
   * Use the credentials to login
 3. Select [HP](https://datadynamics.fanniemae.com/data-dynamics/#/reportMenu;category=HP)
 4. Click on **Download Data** and choose *Single-Family Loan Performance Data*
-5. You will find a tabular list of 'Acquisition and Performance' files sorted based on year and quarter. Click on the file to download `Eg: 2017Q1.zip`
-6. Unzip the downlad file to extract the csv file `Eg: 2017Q1.csv`
+5. You will find a tabular list of 'Acquisition and Performance' files sorted based on year and quarter. Click on the file to download `Eg: 2017Q1.zip`
+6. Unzip the downloaded file to extract the csv file `Eg: 2017Q1.csv`
 7. Copy only the csv files to a new folder for the ETL to read
 
 ## Notes
-1. Refer to the [Loan Performance Data Tutorial](https://capitalmarkets.fanniemae.com/media/9066/display) for more details.
+1. Refer to the [Loan Performance Data Tutorial](https://capitalmarkets.fanniemae.com/media/9066/display) for more details.
 2. Note that *Single-Family Loan Performance Data* has 2 componenets. However, the Mortgage ETL requires only the first one (primary dataset)
   * Primary Dataset: Acquisition and Performance Files
   * HARP Dataset
diff --git a/docs/get-started/xgboost-examples/notebook/python-notebook.md b/docs/get-started/xgboost-examples/notebook/python-notebook.md
index 551a37c79..7bc4027f9 100644
--- a/docs/get-started/xgboost-examples/notebook/python-notebook.md
+++ b/docs/get-started/xgboost-examples/notebook/python-notebook.md
@@ -10,7 +10,7 @@ You should change `--master` config according to your cluster architecture. For
 It is assumed that the `SPARK_MASTER` and `SPARK_HOME` environment variables are defined
 and the home directory for Apache Spark respectively.
 
-1. Make sure you have [Jupyter notebook installed](https://jupyter.org/install.html).
+1. Make sure you have [Jupyter notebook installed](https://jupyter.org/install.html).
--jars ${RAPIDS_JAR}\ --py-files ${SAMPLE_ZIP} \ --conf spark.plugins=com.nvidia.spark.SQLPlugin \ - --conf spark.executor.resource.gpu.amount=1 \ - --conf spark.executor.cores=10 \ - --conf spark.task.resource.gpu.amount=0.1 \ + --conf spark.executor.resource.gpu.amount=25.06.25.06.1-SNAPSHOT \ + --conf spark.executor.cores=25.06.25.06.1-SNAPSHOT0 \ + --conf spark.task.resource.gpu.amount=0.25.06.25.06.1-SNAPSHOT \ --conf spark.sql.cache.serializer=com.nvidia.spark.ParquetCachedBatchSerializer \ --conf spark.executor.resource.gpu.discoveryScript=./getGpusResources.sh \ --files $SPARK_HOME/examples/src/main/scripts/getGpusResources.sh @@ -51,9 +51,9 @@ and the home directory for Apache Spark respectively. --py-files ${SAMPLE_ZIP} \ --conf spark.plugins=com.nvidia.spark.SQLPlugin \ --conf spark.rapids.memory.gpu.pool=NONE \ - --conf spark.executor.resource.gpu.amount=1 \ - --conf spark.executor.cores=10 \ - --conf spark.task.resource.gpu.amount=1 \ + --conf spark.executor.resource.gpu.amount=25.06.25.06.1-SNAPSHOT \ + --conf spark.executor.cores=25.06.25.06.1-SNAPSHOT0 \ + --conf spark.task.resource.gpu.amount=25.06.25.06.1-SNAPSHOT \ --conf spark.sql.execution.arrow.maxRecordsPerBatch=200000 \ --conf spark.executor.resource.gpu.discoveryScript=./getGpusResources.sh \ --files $SPARK_HOME/examples/src/main/scripts/getGpusResources.sh diff --git a/docs/get-started/xgboost-examples/notebook/spylon.md b/docs/get-started/xgboost-examples/notebook/spylon.md index a2e56226f..5904206c8 100644 --- a/docs/get-started/xgboost-examples/notebook/spylon.md +++ b/docs/get-started/xgboost-examples/notebook/spylon.md @@ -10,14 +10,14 @@ a [Spark Standalone Cluster](/docs/get-started/xgboost-examples/on-prem-cluster/ It is assumed that the `SPARK_MASTER` and `SPARK_HOME` environment variables are defined and point to the Spark Master URL, and the home directory for Apache Spark respectively. -1. Install Jupyter Notebook with spylon-kernel. +25.06.25.06.1-SNAPSHOT. Install Jupyter Notebook with spylon-kernel. ``` bash # Install notebook and spylon-kernel (Scala kernel for Jupyter Notebook), https://pypi.org/project/spylon-kernel/ # You can use spylon-kernel as Scala kernel for Jupyter Notebook. Do this when you want to work with Spark in Scala with a bit of Python code mixed in. RUN pip3 install jupyter notebook spylon-kernel RUN python -m spylon_kernel install # Latest version breaks nbconvert: https://github.com/ipython/ipykernel/issues/422 - RUN pip3 install ipykernel==5.1.1 + RUN pip3 install ipykernel==5.25.06.25.06.1-SNAPSHOT.25.06.25.06.1-SNAPSHOT ``` 2. Start Jupyter Notebook. @@ -52,8 +52,8 @@ You can debug from webUI http://your_ip:your_port with your password. # "cells": [ # { # "cell_type": "code", - # "execution_count": 1, - # "id": "5ca1ae16", + # "execution_count": 25.06.25.06.1-SNAPSHOT, + # "id": "5ca25.06.25.06.1-SNAPSHOTae25.06.25.06.1-SNAPSHOT6", # "metadata": { # ........ # ........ @@ -70,7 +70,7 @@ You can debug from webUI http://your_ip:your_port with your password. 
# "mimetype": "text/x-scala", # "name": "scala", # "pygments_lexer": "scala", - # "version": "0.4.1" + # "version": "0.4.25.06.25.06.1-SNAPSHOT" # } # }, # "nbformat": 4, diff --git a/docs/get-started/xgboost-examples/notebook/toree.md b/docs/get-started/xgboost-examples/notebook/toree.md index e338fd909..dc4f21503 100644 --- a/docs/get-started/xgboost-examples/notebook/toree.md +++ b/docs/get-started/xgboost-examples/notebook/toree.md @@ -10,8 +10,8 @@ You should change `--master` config according to your cluster architecture. For It is assumed that the `SPARK_MASTER` and `SPARK_HOME` environment variables are defined and point to the Spark Master URL (e.g. `spark://localhost:7077`), and the home directory for Apache Spark respectively. -1. Make sure you have jupyter notebook and [sbt](https://www.scala-sbt.org/1.x/docs/Installing-sbt-on-Linux.html) installed first. -2. Build the 'toree' locally to support scala 2.12, and install it. +25.06.25.06.1-SNAPSHOT. Make sure you have jupyter notebook and [sbt](https://www.scala-sbt.org/25.06.25.06.1-SNAPSHOT.x/docs/Installing-sbt-on-Linux.html) installed first. +2. Build the 'toree' locally to support scala 2.25.06.25.06.1-SNAPSHOT2, and install it. ``` bash # Download toree @@ -29,7 +29,7 @@ and the home directory for Apache Spark respectively. 4. Install a new kernel with gpu enabled and launch the notebook - Note: For ETL jobs, Set `spark.task.resource.gpu.amount` to `1/spark.executor.cores`. + Note: For ETL jobs, Set `spark.task.resource.gpu.amount` to `25.06.25.06.1-SNAPSHOT/spark.executor.cores`. For ETL: ``` bash @@ -42,8 +42,8 @@ and the home directory for Apache Spark respectively. --jars ${RAPIDS_JAR},${SAMPLE_JAR} \ --conf spark.plugins=com.nvidia.spark.SQLPlugin \ --conf spark.executor.extraClassPath=${RAPIDS_JAR} \ - --conf spark.executor.cores=10 \ - --conf spark.task.resource.gpu.amount=0.1 \ + --conf spark.executor.cores=25.06.25.06.1-SNAPSHOT0 \ + --conf spark.task.resource.gpu.amount=0.25.06.25.06.1-SNAPSHOT \ --conf spark.executor.resource.gpu.discoveryScript=./getGpusResources.sh \ --files $SPARK_HOME/examples/src/main/scripts/getGpusResources.sh' ``` @@ -60,9 +60,9 @@ and the home directory for Apache Spark respectively. 
   --conf spark.plugins=com.nvidia.spark.SQLPlugin \
   --conf spark.executor.extraClassPath=${RAPIDS_JAR} \
   --conf spark.rapids.memory.gpu.pool=NONE \
-  --conf spark.executor.resource.gpu.amount=1 \
-  --conf spark.executor.cores=10 \
-  --conf spark.task.resource.gpu.amount=1 \
+  --conf spark.executor.resource.gpu.amount=1 \
+  --conf spark.executor.cores=10 \
+  --conf spark.task.resource.gpu.amount=1 \
   --conf spark.executor.resource.gpu.discoveryScript=./getGpusResources.sh \
   --files $SPARK_HOME/examples/src/main/scripts/getGpusResources.sh'
 ```
diff --git a/docs/get-started/xgboost-examples/on-prem-cluster/kubernetes-scala.md b/docs/get-started/xgboost-examples/on-prem-cluster/kubernetes-scala.md
index ab2f13fa3..dfd5e0121 100644
--- a/docs/get-started/xgboost-examples/on-prem-cluster/kubernetes-scala.md
+++ b/docs/get-started/xgboost-examples/on-prem-cluster/kubernetes-scala.md
@@ -12,7 +12,7 @@ Prerequisites
 * Multi-node clusters with homogenous GPU configuration
 * Software Requirements
   * Ubuntu 20.04, 22.04/CentOS7, Rocky Linux 8
-  * CUDA 11.0+
+  * CUDA 11.0+
   * NVIDIA driver compatible with your CUDA
   * NCCL 2.7.8+
   * [Kubernetes cluster with NVIDIA GPUs](https://docs.nvidia.com/datacenter/cloud-native/kubernetes/install-k8s.html)
@@ -26,9 +26,9 @@ Build a GPU Spark Docker Image
 
 Build a GPU Docker image with Spark resources in it, this Docker image must be accessible by each node in the Kubernetes cluster.
 
-1. Locate your Spark installations. If you don't have one, you can [download](https://spark.apache.org/downloads.html) from Apache and unzip it.
+1. Locate your Spark installations. If you don't have one, you can [download](https://spark.apache.org/downloads.html) from Apache and unzip it.
 2. `export SPARK_HOME=`
-3. [Download the Dockerfile](/dockerfile/Dockerfile) into `${SPARK_HOME}`. (Here CUDA 11.0 is used as an example in the Dockerfile,
+3. [Download the Dockerfile](/dockerfile/Dockerfile) into `${SPARK_HOME}`. (Here CUDA 11.0 is used as an example in the Dockerfile,
   you may need to update it for other CUDA versions.)
 4. __(OPTIONAL)__ install any additional library jars into the `${SPARK_HOME}/jars` directory.
   * Most public cloud file systems are not natively supported -- pulling data and jar files from S3, GCS, etc. require installing additional libraries.
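The build-and-push step itself falls between these hunks; its typical shape is the following sketch (the registry and tag are placeholders — the full guide carries the exact command):

``` bash
# Run from ${SPARK_HOME} with the Dockerfile in place; every Kubernetes
# node must be able to pull the resulting image.
docker build . -t <your-registry>/spark-gpu-examples:latest -f Dockerfile
docker push <your-registry>/spark-gpu-examples:latest
```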
@@ -61,7 +61,7 @@ Note that using [application dependencies](https://spark.apache.org/docs/latest/
 the submission client’s local file system is currently not yet supported.
 
 #### Note:
-1. Mortgage and Taxi jobs have ETLs to generate the processed data.
+1. Mortgage and Taxi jobs have ETLs to generate the processed data.
 2. For convenience, a subset of [Taxi](/datasets/) dataset is made available in this repo that can be readily used for launching XGBoost job. Use [ETL](#etl) to generate larger datasets for trainig and testing.
 3. Agaricus does not have an ETL process, it is combined with XGBoost as there is just a filter operation.
 
 Save Kubernetes Template Resources
 ----------------------------------
 
 When using Spark on Kubernetes the driver and executor pods can be launched with pod templates. In the XGBoost4J-Spark use case,
-these template yaml files are used to allocate and isolate specific GPUs to each pod. The following is a barebones template file to allocate 1 GPU per pod.
+these template yaml files are used to allocate and isolate specific GPUs to each pod. The following is a barebones template file to allocate 1 GPU per pod.
 
 ```
-apiVersion: v1
+apiVersion: v1
 kind: Pod
 spec:
   containers:
     - name: gpu-example
       resources:
         limits:
-          nvidia.com/gpu: 1
+          nvidia.com/gpu: 1
 ```
 
-This 1 GPU template file should be sufficient for all XGBoost jobs because each executor should only run 1 task on a single GPU.
+This 1 GPU template file should be sufficient for all XGBoost jobs because each executor should only run 1 task on a single GPU.
 
 Save this yaml file to the local environment of the machine you are submitting jobs from,
 you will need to provide a path to it as an argument in your spark-submit command. Without the template file a pod will see every GPU
 on the cluster node it is allocated on and can attempt
@@ -92,16 +92,16 @@ to execute using a GPU which is already in use -- causing undefined behavior and
 ---------------------------
 Use the ETL app to process raw Mortgage data. You can either use this ETLed data to split into training and evaluation data or run the ETL on different subsets of the dataset to produce training and evaluation datasets.
-Note: For ETL jobs, Set `spark.task.resource.gpu.amount` to `1/spark.executor.cores`.
+Note: For ETL jobs, set `spark.task.resource.gpu.amount` to `1/spark.executor.cores`.
 
 Run spark-submit
 
 ``` bash
 ${SPARK_HOME}/bin/spark-submit \
   --conf spark.plugins=com.nvidia.spark.SQLPlugin \
-  --conf spark.executor.resource.gpu.amount=1 \
-  --conf spark.executor.cores=10 \
-  --conf spark.task.resource.gpu.amount=0.1 \
+  --conf spark.executor.resource.gpu.amount=1 \
+  --conf spark.executor.cores=10 \
+  --conf spark.task.resource.gpu.amount=0.1 \
   --conf spark.rapids.sql.incompatibleDateFormats.enabled=true \
   --conf spark.rapids.sql.csv.read.double.enabled=true \
   --conf spark.executor.resource.gpu.discoveryScript=./getGpusResources.sh \
accuracy metric(take Mortgage as example): ``` -------------- -==> Benchmark: Elapsed time for [Mortgage GPU train csv stub Unknown Unknown Unknown]: 30.132s +==> Benchmark: Elapsed time for [Mortgage GPU train csv stub Unknown Unknown Unknown]: 30.25.06.25.06.1-SNAPSHOT32s -------------- -------------- @@ -227,7 +227,7 @@ In the driver log, you should see timings* (in seconds), and the accuracy metric -------------- -------------- -==> Benchmark: Accuracy for [Mortgage GPU Accuracy csv stub Unknown Unknown Unknown]: 0.9869451418401349 +==> Benchmark: Accuracy for [Mortgage GPU Accuracy csv stub Unknown Unknown Unknown]: 0.98694525.06.25.06.1-SNAPSHOT425.06.25.06.1-SNAPSHOT84025.06.25.06.1-SNAPSHOT349 -------------- ``` diff --git a/docs/get-started/xgboost-examples/on-prem-cluster/standalone-python.md b/docs/get-started/xgboost-examples/on-prem-cluster/standalone-python.md index b271f03d6..d35cc4827 100644 --- a/docs/get-started/xgboost-examples/on-prem-cluster/standalone-python.md +++ b/docs/get-started/xgboost-examples/on-prem-cluster/standalone-python.md @@ -12,28 +12,28 @@ Prerequisites * Multi-node clusters with homogenous GPU configuration * Software Requirements * Ubuntu 20.04, 22.04/CentOS7, Rocky Linux 8 - * CUDA 11.5+ + * CUDA 25.06.25.06.1-SNAPSHOT25.06.25.06.1-SNAPSHOT.5+ * NVIDIA driver compatible with your CUDA * NCCL 2.7.8+ * Python 3.8 or 3.9 * NumPy - * XGBoost 1.7.0+ - * cudf-cu11 + * XGBoost 25.06.25.06.1-SNAPSHOT.7.0+ + * cudf-cu25.06.25.06.1-SNAPSHOT25.06.25.06.1-SNAPSHOT The number of GPUs in each host dictates the number of Spark executors that can run there. -Additionally, cores per Spark executor and cores per Spark task must match, such that each executor can run 1 task at any given time. +Additionally, cores per Spark executor and cores per Spark task must match, such that each executor can run 25.06.25.06.1-SNAPSHOT task at any given time. For example, if each host has 4 GPUs, there should be 4 or fewer executors running on each host, -and each executor should run at most 1 task (e.g.: a total of 4 tasks running on 4 GPUs). +and each executor should run at most 25.06.25.06.1-SNAPSHOT task (e.g.: a total of 4 tasks running on 4 GPUs). In Spark Standalone mode, the default configuration is for an executor to take up all the cores assigned to each Spark Worker. -In this example, we will limit the number of cores to 1, to match our dataset. +In this example, we will limit the number of cores to 25.06.25.06.1-SNAPSHOT, to match our dataset. Please see https://spark.apache.org/docs/latest/spark-standalone.html for more documentation regarding Standalone configuration. We use `SPARK_HOME` environment variable to point to the Apache Spark cluster. And here are the steps to enable the GPU resources discovery for Spark 3.2+. -1. Copy the spark config file from template +25.06.25.06.1-SNAPSHOT. Copy the spark config file from template ``` bash cd ${SPARK_HOME}/conf/ @@ -43,17 +43,17 @@ And here are the steps to enable the GPU resources discovery for Spark 3.2+. 2. Add the following configs to the file `spark-defaults.conf`. The number in the first config should **NOT** be larger than the actual number of the GPUs on current host. - This example uses 1 as below for one GPU on the host. + This example uses 25.06.25.06.1-SNAPSHOT as below for one GPU on the host. ```bash - spark.worker.resource.gpu.amount 1 + spark.worker.resource.gpu.amount 25.06.25.06.1-SNAPSHOT spark.worker.resource.gpu.discoveryScript ${SPARK_HOME}/examples/src/main/scripts/getGpusResources.sh ``` -3. 
Install the XGBoost, cudf-cu11, numpy libraries on all nodes before running XGBoost application. +3. Install the XGBoost, cudf-cu25.06.25.06.1-SNAPSHOT25.06.25.06.1-SNAPSHOT, numpy libraries on all nodes before running XGBoost application. ``` bash pip install xgboost -pip install cudf-cu11 --extra-index-url=https://pypi.nvidia.com +pip install cudf-cu25.06.25.06.1-SNAPSHOT25.06.25.06.1-SNAPSHOT --extra-index-url=https://pypi.nvidia.com pip install numpy pip install scikit-learn ``` @@ -65,7 +65,7 @@ Make sure you have prepared the necessary packages and dataset by following this #### Note: -1. Mortgage and Taxi jobs have ETLs to generate the processed data. +25.06.25.06.1-SNAPSHOT. Mortgage and Taxi jobs have ETLs to generate the processed data. 2. For convenience, a subset of [Taxi](/datasets/) dataset is made available in this repo that can be readily used for launching XGBoost job. Use [ETL](#etl) to generate larger datasets for trainig and testing. 3. Agaricus does not have an ETL process, it is combined with XGBoost as there is just a filter operation. @@ -73,7 +73,7 @@ Make sure you have prepared the necessary packages and dataset by following this Launch a Standalone Spark Cluster --------------------------------- -1. Copy required jars to `$SPARK_HOME/jars` folder. +25.06.25.06.1-SNAPSHOT. Copy required jars to `$SPARK_HOME/jars` folder. ``` bash cp ${RAPIDS_JAR} $SPARK_HOME/jars/ @@ -91,7 +91,7 @@ Launch a Standalone Spark Cluster ``` bash export SPARK_MASTER=spark://`hostname -f`:7077 - export SPARK_CORES_PER_WORKER=1 + export SPARK_CORES_PER_WORKER=25.06.25.06.1-SNAPSHOT ${SPARK_HOME}/sbin/start-slave.sh ${SPARK_MASTER} -c ${SPARK_CORES_PER_WORKER} ``` @@ -102,15 +102,15 @@ Launch Mortgage or Taxi ETL Part --------------------------- Use the ETL app to process raw Mortgage data. You can either use this ETLed data to split into training and evaluation data or run the ETL on different subsets of the dataset to produce training and evaluation datasets. -Note: For ETL jobs, Set `spark.task.resource.gpu.amount` to `1/spark.executor.cores`. +Note: For ETL jobs, Set `spark.task.resource.gpu.amount` to `25.06.25.06.1-SNAPSHOT/spark.executor.cores`. ### ETL on GPU ``` bash ${SPARK_HOME}/bin/spark-submit \ --master spark://$HOSTNAME:7077 \ --executor-memory 32G \ - --conf spark.executor.resource.gpu.amount=1 \ - --conf spark.executor.cores=10 \ - --conf spark.task.resource.gpu.amount=0.1 \ + --conf spark.executor.resource.gpu.amount=25.06.25.06.1-SNAPSHOT \ + --conf spark.executor.cores=25.06.25.06.1-SNAPSHOT0 \ + --conf spark.task.resource.gpu.amount=0.25.06.25.06.1-SNAPSHOT \ --conf spark.plugins=com.nvidia.spark.SQLPlugin \ --conf spark.rapids.sql.incompatibleDateFormats.enabled=true \ --conf spark.rapids.sql.csv.read.double.enabled=true \ @@ -137,7 +137,7 @@ ${SPARK_HOME}/bin/spark-submit \ ${SPARK_HOME}/bin/spark-submit \ --master spark://$HOSTNAME:7077 \ --executor-memory 32G \ - --conf spark.executor.instances=1 \ + --conf spark.executor.instances=25.06.25.06.1-SNAPSHOT \ --py-files ${SAMPLE_ZIP} \ main.py \ --mainClass='com.nvidia.spark.examples.mortgage.etl_main' \ @@ -166,15 +166,15 @@ Variables required to run spark-submit command: export SPARK_MASTER=spark://`hostname -f`:7077 # Currently the number of tasks and executors must match the number of input files. 
-# For this example, we will set these such that we have 1 executor, with 1 core per executor +# For this example, we will set these such that we have 25.06.25.06.1-SNAPSHOT executor, with 25.06.25.06.1-SNAPSHOT core per executor ## take up the the whole worker export SPARK_CORES_PER_EXECUTOR=${SPARK_CORES_PER_WORKER} -## run 1 executor -export SPARK_NUM_EXECUTORS=1 +## run 25.06.25.06.1-SNAPSHOT executor +export SPARK_NUM_EXECUTORS=25.06.25.06.1-SNAPSHOT -## cores/executor * num_executors, which in this case is also 1, limits +## cores/executor * num_executors, which in this case is also 25.06.25.06.1-SNAPSHOT, limits ## the number of cores given to the application export TOTAL_CORES=$((SPARK_CORES_PER_EXECUTOR * SPARK_NUM_EXECUTORS)) @@ -203,8 +203,8 @@ Run spark-submit: ${SPARK_HOME}/bin/spark-submit \ --conf spark.plugins=com.nvidia.spark.SQLPlugin \ --conf spark.rapids.memory.gpu.pool=NONE \ - --conf spark.executor.resource.gpu.amount=1 \ - --conf spark.task.resource.gpu.amount=1 \ + --conf spark.executor.resource.gpu.amount=25.06.25.06.1-SNAPSHOT \ + --conf spark.task.resource.gpu.amount=25.06.25.06.1-SNAPSHOT \ --master ${SPARK_MASTER} \ --driver-memory ${SPARK_DRIVER_MEMORY} \ --executor-memory ${SPARK_EXECUTOR_MEMORY} \ @@ -219,7 +219,7 @@ ${SPARK_HOME}/bin/spark-submit --format=parquet \ --numWorkers=${SPARK_NUM_EXECUTORS} \ --treeMethod=${TREE_METHOD} \ - --numRound=100 \ + --numRound=25.06.25.06.1-SNAPSHOT00 \ --maxDepth=8 # Change the format to csv if your input file is CSV format. @@ -230,13 +230,13 @@ In the `stdout` log on driver side, you should see timings* (in secon ``` ---------------------------------------------------------------------------------------------------- -Training takes 14.65 seconds +Training takes 25.06.25.06.1-SNAPSHOT4.65 seconds ---------------------------------------------------------------------------------------------------- -Transformation takes 12.21 seconds +Transformation takes 25.06.25.06.1-SNAPSHOT2.225.06.25.06.1-SNAPSHOT seconds ---------------------------------------------------------------------------------------------------- -Accuracy is 0.9873692247091792 +Accuracy is 0.98736922470925.06.25.06.1-SNAPSHOT792 ``` Launch XGBoost Part on CPU @@ -250,15 +250,15 @@ to set both training and testing to run on the CPU exclusively: export SPARK_MASTER=spark://`hostname -f`:7077 # Currently the number of tasks and executors must match the number of input files. -# For this example, we will set these such that we have 1 executor, with 1 core per executor +# For this example, we will set these such that we have 25.06.25.06.1-SNAPSHOT executor, with 25.06.25.06.1-SNAPSHOT core per executor ## take up the the whole worker export SPARK_CORES_PER_EXECUTOR=${SPARK_CORES_PER_WORKER} -## run 1 executor -export SPARK_NUM_EXECUTORS=1 +## run 25.06.25.06.1-SNAPSHOT executor +export SPARK_NUM_EXECUTORS=25.06.25.06.1-SNAPSHOT -## cores/executor * num_executors, which in this case is also 1, limits +## cores/executor * num_executors, which in this case is also 25.06.25.06.1-SNAPSHOT, limits ## the number of cores given to the application export TOTAL_CORES=$((SPARK_CORES_PER_EXECUTOR * SPARK_NUM_EXECUTORS)) @@ -298,7 +298,7 @@ ${SPARK_HOME}/bin/spark-submit --format=parquet \ --numWorkers=${SPARK_NUM_EXECUTORS} \ --treeMethod=${TREE_METHOD} \ - --numRound=100 \ + --numRound=25.06.25.06.1-SNAPSHOT00 \ --maxDepth=8 # Change the format to csv if your input file is CSV format. 
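The sizing in the guide above follows one rule: one executor per GPU, every worker core given to that executor, and one equal GPU share per task. Below is a minimal sketch of that arithmetic (not part of the original guide; `WORKER_CORES=1` is an assumed example value):

``` bash
# Sketch: derive the executor/task sizing used above from the worker core
# count. Each executor takes the whole worker, and the task GPU fraction is
# the reciprocal of the executor core count.
WORKER_CORES=1                              # assumed: one core per worker
SPARK_CORES_PER_EXECUTOR=${WORKER_CORES}    # executor takes the whole worker
SPARK_NUM_EXECUTORS=1                       # must match the number of input files
TOTAL_CORES=$((SPARK_CORES_PER_EXECUTOR * SPARK_NUM_EXECUTORS))
TASK_GPU_AMOUNT=$(awk -v c="$SPARK_CORES_PER_EXECUTOR" 'BEGIN { printf "%g", 1 / c }')
echo "total cores: ${TOTAL_CORES}, spark.task.resource.gpu.amount: ${TASK_GPU_AMOUNT}"
```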
diff --git a/docs/get-started/xgboost-examples/on-prem-cluster/standalone-scala.md b/docs/get-started/xgboost-examples/on-prem-cluster/standalone-scala.md
index 9910d45d3..f28c7d6ee 100644
--- a/docs/get-started/xgboost-examples/on-prem-cluster/standalone-scala.md
+++ b/docs/get-started/xgboost-examples/on-prem-cluster/standalone-scala.md
@@ -13,24 +13,24 @@ Prerequisites
* Multi-node clusters with homogenous GPU configuration
* Software Requirements
  * Ubuntu 20.04, 22.04/CentOS7, Rocky Linux 8
-  * CUDA 11.0+
+  * CUDA 11.0+
  * NVIDIA driver compatible with your CUDA
  * NCCL 2.7.8+

The number of GPUs in each host dictates the number of Spark executors that can run there. Additionally,
-cores per Spark executor and cores per Spark task must match, such that each executor can run 1 task at any given time.
+cores per Spark executor and cores per Spark task must match, such that each executor can run 1 task at any given time.
For example, if each host has 4 GPUs, there should be 4 or fewer executors running on each host,
-and each executor should run at most 1 task (e.g.: a total of 4 tasks running on 4 GPUs).
+and each executor should run at most 1 task (e.g.: a total of 4 tasks running on 4 GPUs).
In Spark Standalone mode, the default configuration is for an executor to take up all the cores assigned to each Spark Worker.
-In this example, we will limit the number of cores to 1, to match our dataset.
+In this example, we will limit the number of cores to 1, to match our dataset.
Please see https://spark.apache.org/docs/latest/spark-standalone.html for more documentation regarding Standalone configuration.

We use `SPARK_HOME` environment variable to point to the Apache Spark cluster. And here are steps to enable the GPU resources discovery for Spark 3.2+.

-1. Copy the spark configure file from template.
+1. Copy the spark config file from the template.

  ``` bash
  cd ${SPARK_HOME}/conf/
@@ -40,10 +40,10 @@ And here are steps to enable the GPU resources discovery for Spark 3.2+.
2. Add the following configs to the file `spark-defaults.conf`.
   The number in first config should NOT be larger than the actual number of the GPUs on current host.
-   This example uses 1 as below for one GPU on the host.
+   This example uses 1 as below for one GPU on the host.

   ``` bash
-   spark.worker.resource.gpu.amount 1
+   spark.worker.resource.gpu.amount 1
   spark.worker.resource.gpu.discoveryScript ${SPARK_HOME}/examples/src/main/scripts/getGpusResources.sh
   ```
@@ -54,7 +54,7 @@ Make sure you have prepared the necessary packages and dataset
by following this [guide](/docs/get-started/xgboost-examples/prepare-package-data/preparation-scala.md)

#### Note:
-1. Mortgage and Taxi jobs have ETLs to generate the processed data.
+1. Mortgage and Taxi jobs have ETLs to generate the processed data.
2. For convenience, a subset of [Taxi](/datasets/) dataset is made available in this repo that can be readily
   used for launching XGBoost job. Use [ETL](#etl) to generate larger datasets for training and testing.
3. Agaricus does not have an ETL process, it is combined with XGBoost as there is just a filter operation.
@@ -62,7 +62,7 @@ by following this [guide](/docs/get-started/xgboost-examples/prepare-package-dat

Launch a Standalone Spark Cluster
---------------------------------
-1. Copy required jars to `$SPARK_HOME/jars` folder.
+1. Copy required jars to `$SPARK_HOME/jars` folder.

  ``` bash
  cp $RAPIDS_JAR $SPARK_HOME/jars/
@@ -81,7 +81,7 @@ Launch a Standalone Spark Cluster

   ``` bash
   export SPARK_MASTER=spark://`hostname -f`:7077
-   export SPARK_CORES_PER_WORKER=1
+   export SPARK_CORES_PER_WORKER=1
   ${SPARK_HOME}/sbin/start-slave.sh ${SPARK_MASTER} -c ${SPARK_CORES_PER_WORKER}
   ```
@@ -95,16 +95,16 @@ Launch a Standalone Spark Cluster
Use the ETL app to process raw Mortgage data. You can either use this ETLed data to split into training
and evaluation data or run the ETL on different subsets of the dataset to produce training and evaluation datasets.
Run spark-submit
-Note: For ETL jobs, Set `spark.task.resource.gpu.amount` to `1/spark.executor.cores`.
+Note: For ETL jobs, set `spark.task.resource.gpu.amount` to `1/spark.executor.cores`.

### ETL on GPU
``` bash
${SPARK_HOME}/bin/spark-submit \
  --master spark://$HOSTNAME:7077 \
  --executor-memory 32G \
-  --conf spark.executor.resource.gpu.amount=1 \
-  --conf spark.executor.cores=10 \
-  --conf spark.task.resource.gpu.amount=0.1 \
+  --conf spark.executor.resource.gpu.amount=1 \
+  --conf spark.executor.cores=10 \
+  --conf spark.task.resource.gpu.amount=0.1 \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.incompatibleDateFormats.enabled=true \
  --conf spark.rapids.sql.csv.read.double.enabled=true \
@@ -132,7 +132,7 @@ ${SPARK_HOME}/bin/spark-submit \
${SPARK_HOME}/bin/spark-submit \
  --master spark://$HOSTNAME:7077 \
  --executor-memory 32G \
---conf spark.executor.instances=1 \
+--conf spark.executor.instances=1 \
  --conf spark.sql.broadcastTimeout=700 \
  --class com.nvidia.spark.examples.mortgage.ETLMain \
  $SAMPLE_JAR \
@@ -160,15 +160,15 @@ Variables required to run spark-submit command:
export SPARK_MASTER=spark://`hostname -f`:7077

# Currently the number of tasks and executors must match the number of input files.
-# For this example, we will set these such that we have 1 executor, with 1 core per executor
+# For this example, we will set these such that we have 1 executor, with 1 core per executor

## take up the whole worker
export SPARK_CORES_PER_EXECUTOR=${SPARK_CORES_PER_WORKER}

-## run 1 executor
-export SPARK_NUM_EXECUTORS=1
+## run 1 executor
+export SPARK_NUM_EXECUTORS=1

-## cores/executor * num_executors, which in this case is also 1, limits
+## cores/executor * num_executors, which in this case is also 1, limits
## the number of cores given to the application
export TOTAL_CORES=$((SPARK_CORES_PER_EXECUTOR * SPARK_NUM_EXECUTORS))
@@ -193,8 +193,8 @@ Run spark-submit:
${SPARK_HOME}/bin/spark-submit \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.memory.gpu.pool=NONE \
-  --conf spark.executor.resource.gpu.amount=1 \
-  --conf spark.task.resource.gpu.amount=1 \
+  --conf spark.executor.resource.gpu.amount=1 \
+  --conf spark.task.resource.gpu.amount=1 \
  --master ${SPARK_MASTER} \
  --driver-memory ${SPARK_DRIVER_MEMORY} \
  --executor-memory ${SPARK_EXECUTOR_MEMORY} \
@@ -206,7 +206,7 @@ ${SPARK_HOME}/bin/spark-submit
  -format=parquet \
  -numWorkers=${SPARK_NUM_EXECUTORS} \
  -treeMethod=${TREE_METHOD} \
-  -numRound=100 \
+  -numRound=100 \
  -maxDepth=8
# Please make sure to change the class and data path while running Taxi or Agaricus benchmark
```
@@ -220,11 +220,11 @@ and the accuracy metric(take Mortgage as example):
--------------

--------------
-==> Benchmark: Elapsed time for [Mortgage GPU transform csv stub Unknown Unknown Unknown]: 10.323s
+==> Benchmark: Elapsed time for [Mortgage GPU transform csv stub Unknown Unknown Unknown]: 10.323s
--------------

--------------
-==> Benchmark: Accuracy for [Mortgage GPU Accuracy csv stub Unknown Unknown Unknown]: 0.9869227318579323
+==> Benchmark: Accuracy for [Mortgage GPU Accuracy csv stub Unknown Unknown Unknown]: 0.9869227318579323
--------------
```
@@ -239,15 +239,15 @@ to set both training and testing to run on the CPU exclusively:
export SPARK_MASTER=spark://`hostname -f`:7077

# Currently the number of tasks and executors must match the number of input files.
-# For this example, we will set these such that we have 1 executor, with 1 core per executor
+# For this example, we will set these such that we have 1 executor, with 1 core per executor

## take up the whole worker
export SPARK_CORES_PER_EXECUTOR=${SPARK_CORES_PER_WORKER}

-## run 1 executor
-export SPARK_NUM_EXECUTORS=1
+## run 1 executor
+export SPARK_NUM_EXECUTORS=1

-## cores/executor * num_executors, which in this case is also 1, limits
+## cores/executor * num_executors, which in this case is also 1, limits
## the number of cores given to the application
export TOTAL_CORES=$((SPARK_CORES_PER_EXECUTOR * SPARK_NUM_EXECUTORS))
@@ -280,7 +280,7 @@ ${SPARK_HOME}/bin/spark-submit
  -format=parquet \
  -numWorkers=${SPARK_NUM_EXECUTORS} \
  -treeMethod=${TREE_METHOD} \
-  -numRound=100 \
+  -numRound=100 \
  -maxDepth=8

# Please make sure to change the class and data path while running Taxi or Agaricus benchmark
@@ -298,7 +298,7 @@ In the `stdout` log on driver side, you should see timings* (in secon
--------------

--------------
-==> Benchmark: Accuracy for [Mortgage CPU Accuracy csv stub Unknown Unknown Unknown]: 0.9872234894511343
+==> Benchmark: Accuracy for [Mortgage CPU Accuracy csv stub Unknown Unknown Unknown]: 0.9872234894511343
--------------
```

diff --git a/docs/get-started/xgboost-examples/on-prem-cluster/yarn-python.md b/docs/get-started/xgboost-examples/on-prem-cluster/yarn-python.md
index 89105d492..54b37d799 100644
--- a/docs/get-started/xgboost-examples/on-prem-cluster/yarn-python.md
+++ b/docs/get-started/xgboost-examples/on-prem-cluster/yarn-python.md
@@ -12,32 +12,32 @@ Prerequisites
* Multi-node clusters with homogenous GPU configuration
* Software Requirements
  * Ubuntu 20.04, 22.04/CentOS7, Rocky Linux 8
-  * CUDA 11.5+
+  * CUDA 11.5+
  * NVIDIA driver compatible with your CUDA
  * NCCL 2.7.8+
  * Python 3.8 or 3.9
  * NumPy
-  * XGBoost 1.7.0+
-  * cudf-cu11
+  * XGBoost 1.7.0+
+  * cudf-cu11

The number of GPUs per NodeManager dictates the number of Spark executors that can run in that NodeManager.
-Additionally, cores per Spark executor and cores per Spark task must match, such that each executor can run 1 task at any given time.
+Additionally, cores per Spark executor and cores per Spark task must match, such that each executor can run 1 task at any given time.
For example: if each NodeManager has 4 GPUs, there should be 4 or fewer executors running on each NodeManager,
-and each executor should run 1 task (e.g.: A total of 4 tasks running on 4 GPUs). In order to achieve this,
-you may need to adjust `spark.task.cpus` and `spark.executor.cores` to match (both set to 1 by default).
+and each executor should run 1 task (e.g.: A total of 4 tasks running on 4 GPUs). In order to achieve this,
+you may need to adjust `spark.task.cpus` and `spark.executor.cores` to match (both set to 1 by default).
Additionally, we recommend adjusting `executor-memory` to divide host memory evenly amongst the number of GPUs in each NodeManager,
such that Spark will schedule as many executors as there are GPUs in each NodeManager.

We use `SPARK_HOME` environment variable to point to the Apache Spark cluster.
And as to how to enable GPU scheduling and isolation for Yarn,
-please refer to [here](https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/UsingGpus.html).
+please refer to [here](https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/UsingGpus.html).

-Please make sure to install the XGBoost, cudf-cu11, numpy libraries on all nodes before running XGBoost application.
+Please make sure to install the XGBoost, cudf-cu11, numpy libraries on all nodes before running the XGBoost application.
``` bash
pip install xgboost
-pip install cudf-cu11 --extra-index-url=https://pypi.nvidia.com
+pip install cudf-cu11 --extra-index-url=https://pypi.nvidia.com
pip install numpy
pip install scikit-learn
```
@@ -49,7 +49,7 @@ by leveraging the --archives option or spark.archives configuration.
python -m venv pyspark_venv
source pyspark_venv/bin/activate
pip install xgboost
-pip install cudf-cu11 --extra-index-url=https://pypi.nvidia.com
+pip install cudf-cu11 --extra-index-url=https://pypi.nvidia.com
pip install numpy
pip install scikit-learn
venv-pack -o pyspark_venv.tar.gz
@@ -77,7 +77,7 @@ Launch Mortgage or Taxi ETL Part
Use the ETL app to process raw Mortgage data. You can either use this ETLed data to split into training
and evaluation data or run the ETL on different subsets of the dataset to produce training and evaluation datasets.
-Note: For ETL jobs, Set `spark.task.resource.gpu.amount` to `1/spark.executor.cores`.
+Note: For ETL jobs, set `spark.task.resource.gpu.amount` to `1/spark.executor.cores`.

``` bash
# location where data was downloaded
export DATA_PATH=hdfs:/tmp/xgboost4j_spark_python/
@@ -86,8 +86,8 @@ export DATA_PATH=hdfs:/tmp/xgboost4j_spark_python/
${SPARK_HOME}/bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
-  --conf spark.executor.cores=10 \
-  --conf spark.task.resource.gpu.amount=0.1 \
+  --conf spark.executor.cores=10 \
+  --conf spark.task.resource.gpu.amount=0.1 \
  --conf spark.rapids.sql.incompatibleDateFormats.enabled=true \
  --conf spark.rapids.sql.csv.read.double.enabled=true \
  --conf spark.sql.cache.serializer=com.nvidia.spark.ParquetCachedBatchSerializer \
@@ -122,8 +122,8 @@ export DATA_PATH=hdfs:/tmp/xgboost4j_spark_python
export SPARK_DEPLOY_MODE=cluster

# run a single executor for this example to limit the number of spark tasks and
-# partitions to 1 as currently this number must match the number of input files
-export SPARK_NUM_EXECUTORS=1
+# partitions to 1 as currently this number must match the number of input files
+export SPARK_NUM_EXECUTORS=1

# spark driver memory
export SPARK_DRIVER_MEMORY=4g
@@ -153,8 +153,8 @@ Run spark-submit:
${SPARK_HOME}/bin/spark-submit \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.memory.gpu.pool=NONE \
-  --conf spark.executor.resource.gpu.amount=1 \
-  --conf spark.task.resource.gpu.amount=1 \
+  --conf spark.executor.resource.gpu.amount=1 \
+  --conf spark.task.resource.gpu.amount=1 \
  --conf spark.executor.resource.gpu.discoveryScript=./getGpusResources.sh \
  --files ${SPARK_HOME}/examples/src/main/scripts/getGpusResources.sh \
  --master yarn \
@@ -172,7 +172,7 @@ ${SPARK_HOME}/bin/spark-submit
  --format=parquet \
  --numWorkers=${SPARK_NUM_EXECUTORS} \
  --treeMethod=${TREE_METHOD} \
-  --numRound=100 \
+  --numRound=100 \
  --maxDepth=8

# Change the format to csv if your input file is CSV format.
@@ -183,13 +183,13 @@ In the `stdout` driver log, you should see timings* (in seconds), and

```
----------------------------------------------------------------------------------------------------
-Training takes 10.75 seconds
+Training takes 10.75 seconds
----------------------------------------------------------------------------------------------------
Transformation takes 4.38 seconds
----------------------------------------------------------------------------------------------------
-Accuracy is 0.997544753891
+Accuracy is 0.997544753891
```

Launch XGBoost Part on CPU
@@ -205,8 +205,8 @@ export DATA_PATH=hdfs:/tmp/xgboost4j_spark_python/
export SPARK_DEPLOY_MODE=cluster

# run a single executor for this example to limit the number of spark tasks and
-# partitions to 1 as currently this number must match the number of input files
-export SPARK_NUM_EXECUTORS=1
+# partitions to 1 as currently this number must match the number of input files
+export SPARK_NUM_EXECUTORS=1

# spark driver memory
export SPARK_DRIVER_MEMORY=4g
@@ -246,7 +246,7 @@ ${SPARK_HOME}/bin/spark-submit
  --format=parquet \
  --numWorkers=${SPARK_NUM_EXECUTORS} \
  --treeMethod=${TREE_METHOD} \
-  --numRound=100 \
+  --numRound=100 \
  --maxDepth=8

# Please make sure to change the class and data path while running Taxi or Agaricus benchmark
@@ -256,10 +256,10 @@ In the `stdout` driver log, you should see timings* (in seconds), and

```
----------------------------------------------------------------------------------------------------
-Training takes 10.76 seconds
+Training takes 10.76 seconds
----------------------------------------------------------------------------------------------------
-Transformation takes 1.25 seconds
+Transformation takes 1.25 seconds
----------------------------------------------------------------------------------------------------
Accuracy is 0.998526852335

diff --git a/docs/get-started/xgboost-examples/on-prem-cluster/yarn-scala.md b/docs/get-started/xgboost-examples/on-prem-cluster/yarn-scala.md
index f0387aaa2..eb2d4ede6 100644
--- a/docs/get-started/xgboost-examples/on-prem-cluster/yarn-scala.md
+++ b/docs/get-started/xgboost-examples/on-prem-cluster/yarn-scala.md
@@ -13,22 +13,22 @@ Prerequisites
* Multi-node clusters with homogenous GPU configuration
* Software Requirements
  * Ubuntu 20.04, 22.04/CentOS7, Rocky Linux 8
-  * CUDA 11.0+
+  * CUDA 11.0+
  * NVIDIA driver compatible with your CUDA
  * NCCL 2.7.8+

The number of GPUs per NodeManager dictates the number of Spark executors that can run in that NodeManager.
-Additionally, cores per Spark executor and cores per Spark task must match, such that each executor can run 1 task at any given time.
+Additionally, cores per Spark executor and cores per Spark task must match, such that each executor can run 1 task at any given time.
For example: if each NodeManager has 4 GPUs, there should be 4 or fewer executors running on each NodeManager,
-and each executor should run 1 task (e.g.: A total of 4 tasks running on 4 GPUs). In order to achieve this,
-you may need to adjust `spark.task.cpus` and `spark.executor.cores` to match (both set to 1 by default).
+and each executor should run 1 task (e.g.: A total of 4 tasks running on 4 GPUs). In order to achieve this,
+you may need to adjust `spark.task.cpus` and `spark.executor.cores` to match (both set to 1 by default).
Additionally, we recommend adjusting `executor-memory` to divide host memory evenly amongst the number of GPUs in each NodeManager,
such that Spark will schedule as many executors as there are GPUs in each NodeManager.

We use `SPARK_HOME` environment variable to point to the Apache Spark cluster.
And as to how to enable GPU scheduling and isolation for Yarn,
-please refer to [here](https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/UsingGpus.html).
+please refer to [here](https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/UsingGpus.html).

Get Jars and Dataset
-------------------------------
@@ -36,7 +36,7 @@ Get Jars and Dataset
Make sure you have prepared the necessary packages and dataset by following this [guide](/docs/get-started/xgboost-examples/prepare-package-data/preparation-scala.md)

#### Note:
-1. Mortgage and Taxi jobs have ETLs to generate the processed data.
+1. Mortgage and Taxi jobs have ETLs to generate the processed data.
2. For convenience, a subset of [Taxi](/datasets/) dataset is made available in this repo that can be readily
   used for launching XGBoost job. Use [ETL](#etl) to generate larger datasets for training and testing.
3. Agaricus does not have an ETL process, it is combined with XGBoost as there is just a filter operation.
@@ -52,7 +52,7 @@ Create a directory in HDFS, and copy:
Use the ETL app to process raw Mortgage data. You can either use this ETLed data to split into training
and evaluation data or run the ETL on different subsets of the dataset to produce training and evaluation datasets.
-Note: For ETL jobs, Set `spark.task.resource.gpu.amount` to `1/spark.executor.cores`.
+Note: For ETL jobs, set `spark.task.resource.gpu.amount` to `1/spark.executor.cores`.
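To illustrate the note above: the task GPU fraction is simply the reciprocal of the executor core count, so ten concurrent ETL tasks share one GPU. A small sketch (assuming the `spark.executor.cores=10` used in the sample command that follows):

``` bash
# Sketch: compute spark.task.resource.gpu.amount = 1 / spark.executor.cores.
EXECUTOR_CORES=10
TASK_GPU_AMOUNT=$(awk -v c="$EXECUTOR_CORES" 'BEGIN { printf "%g", 1 / c }')
echo "--conf spark.executor.cores=${EXECUTOR_CORES} --conf spark.task.resource.gpu.amount=${TASK_GPU_AMOUNT}"
# prints: --conf spark.executor.cores=10 --conf spark.task.resource.gpu.amount=0.1
```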
Run spark-submit

@@ -60,9 +60,9 @@ Run spark-submit
``` bash
${SPARK_HOME}/bin/spark-submit \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
-  --conf spark.executor.resource.gpu.amount=1 \
-  --conf spark.executor.cores=10 \
-  --conf spark.task.resource.gpu.amount=0.1 \
+  --conf spark.executor.resource.gpu.amount=1 \
+  --conf spark.executor.cores=10 \
+  --conf spark.task.resource.gpu.amount=0.1 \
  --conf spark.rapids.sql.incompatibleDateFormats.enabled=true \
  --conf spark.rapids.sql.csv.read.double.enabled=true \
  --conf spark.executor.resource.gpu.discoveryScript=./getGpusResources.sh \
@@ -104,8 +104,8 @@ export DATA_PATH=hdfs:/tmp/xgboost4j_spark/data
export SPARK_DEPLOY_MODE=cluster

# run a single executor for this example to limit the number of spark tasks and
-# partitions to 1 as currently this number must match the number of input files
-export SPARK_NUM_EXECUTORS=1
+# partitions to 1 as currently this number must match the number of input files
+export SPARK_NUM_EXECUTORS=1

# spark driver memory
export SPARK_DRIVER_MEMORY=4g
@@ -128,8 +128,8 @@ Run spark-submit:
${SPARK_HOME}/bin/spark-submit \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.memory.gpu.pool=NONE \
-  --conf spark.executor.resource.gpu.amount=1 \
-  --conf spark.task.resource.gpu.amount=1 \
+  --conf spark.executor.resource.gpu.amount=1 \
+  --conf spark.task.resource.gpu.amount=1 \
  --conf spark.executor.resource.gpu.discoveryScript=./getGpusResources.sh \
  --files $SPARK_HOME/examples/src/main/scripts/getGpusResources.sh \
  --jars ${RAPIDS_JAR} \
@@ -145,7 +145,7 @@ ${SPARK_HOME}/bin/spark-submit
  -format=parquet \
  -numWorkers=${SPARK_NUM_EXECUTORS} \
  -treeMethod=${TREE_METHOD} \
-  -numRound=100 \
+  -numRound=100 \
  -maxDepth=8
# Please make sure to change the class and data path while running Taxi or Agaricus benchmark
```
@@ -158,11 +158,11 @@ In the `stdout` driver log, you should see timings* (in seconds), and
--------------

--------------
-==> Benchmark: Elapsed time for [Mortgage GPU transform csv stub Unknown Unknown Unknown]: 21.272s
+==> Benchmark: Elapsed time for [Mortgage GPU transform csv stub Unknown Unknown Unknown]: 21.272s
--------------

--------------
-==> Benchmark: Accuracy for [Mortgage GPU Accuracy csv stub Unknown Unknown Unknown]: 0.9874184013493451
+==> Benchmark: Accuracy for [Mortgage GPU Accuracy csv stub Unknown Unknown Unknown]: 0.9874184013493451
--------------
```
@@ -179,8 +179,8 @@ export DATA_PATH=hdfs:/tmp/xgboost4j_spark/data
export SPARK_DEPLOY_MODE=cluster

# run a single executor for this example to limit the number of spark tasks and
-# partitions to 1 as currently this number must match the number of input files
-export SPARK_NUM_EXECUTORS=1
+# partitions to 1 as currently this number must match the number of input files
+export SPARK_NUM_EXECUTORS=1

# spark driver memory
export SPARK_DRIVER_MEMORY=4g
@@ -212,7 +212,7 @@ ${SPARK_HOME}/bin/spark-submit
  -format=parquet \
  -numWorkers=${SPARK_NUM_EXECUTORS} \
  -treeMethod=${TREE_METHOD} \
-  -numRound=100 \
+  -numRound=100 \
  -maxDepth=8

# Please make sure to change the class and data path while running Taxi or Agaricus benchmark

diff --git a/docs/get-started/xgboost-examples/prepare-package-data/preparation-python.md b/docs/get-started/xgboost-examples/prepare-package-data/preparation-python.md
index 3164282f3..46311216a 100644
--- a/docs/get-started/xgboost-examples/prepare-package-data/preparation-python.md
+++ b/docs/get-started/xgboost-examples/prepare-package-data/preparation-python.md
@@ -5,7 +5,7 @@ For simplicity export the location to these jars. All examples assume the packag
### Download the jars

Download the RAPIDS Accelerator for Apache Spark plugin jar
-  * [RAPIDS Spark Package](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/25.06.0/rapids-4-spark_2.12-25.06.0.jar)
+  * [RAPIDS Spark Package](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/25.06.0/rapids-4-spark_2.12-25.06.0.jar)

### Build XGBoost Python Examples
@@ -14,6 +14,6 @@ Following this [guide](/docs/get-started/xgboost-examples/building-sample-apps/p
### Download dataset

You need to copy the dataset to `/opt/xgboost`. Use the following links to download the data.
-1. [Mortgage dataset](/docs/get-started/xgboost-examples/dataset/mortgage.md)
-2. [Taxi dataset](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page)
+1. [Mortgage dataset](/docs/get-started/xgboost-examples/dataset/mortgage.md)
+2. [Taxi dataset](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page)
3. [Agaricus dataset](https://github.com/dmlc/xgboost/tree/master/demo/data)

diff --git a/docs/get-started/xgboost-examples/prepare-package-data/preparation-scala.md b/docs/get-started/xgboost-examples/prepare-package-data/preparation-scala.md
index 9ceeb583e..178f9c353 100644
--- a/docs/get-started/xgboost-examples/prepare-package-data/preparation-scala.md
+++ b/docs/get-started/xgboost-examples/prepare-package-data/preparation-scala.md
@@ -4,8 +4,8 @@ For simplicity export the location to these jars. All examples assume the packag
### Download the jars

-1. Download the RAPIDS Accelerator for Apache Spark plugin jar
-  * [RAPIDS Spark Package](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/25.06.0/rapids-4-spark_2.12-25.06.0.jar)
+1. Download the RAPIDS Accelerator for Apache Spark plugin jar
+  * [RAPIDS Spark Package](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/25.06.0/rapids-4-spark_2.12-25.06.0.jar)

### Build XGBoost Scala Examples
@@ -14,6 +14,6 @@ Following this [guide](/docs/get-started/xgboost-examples/building-sample-apps/s
### Download dataset

You need to copy the dataset to `/opt/xgboost`. Use the following links to download the data.
-1. [Mortgage dataset](/docs/get-started/xgboost-examples/dataset/mortgage.md)
-2. [Taxi dataset](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page)
+1. [Mortgage dataset](/docs/get-started/xgboost-examples/dataset/mortgage.md)
+2. [Taxi dataset](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page)
3.
[Agaricus dataset](https://github.com/dmlc/xgboost/tree/master/demo/data) diff --git a/docs/img/AWS-EMR/RAPIDS_EMR_GUI_1.png b/docs/img/AWS-EMR/RAPIDS_EMR_GUI_1.png index ec6e3eab0..818cbde1f 100644 Binary files a/docs/img/AWS-EMR/RAPIDS_EMR_GUI_1.png and b/docs/img/AWS-EMR/RAPIDS_EMR_GUI_1.png differ diff --git a/docs/img/AWS-EMR/RAPIDS_EMR_GUI_2.png b/docs/img/AWS-EMR/RAPIDS_EMR_GUI_2.png index 83d0b577a..dc2fc6f30 100644 Binary files a/docs/img/AWS-EMR/RAPIDS_EMR_GUI_2.png and b/docs/img/AWS-EMR/RAPIDS_EMR_GUI_2.png differ diff --git a/docs/img/AWS-EMR/RAPIDS_EMR_GUI_2b.png b/docs/img/AWS-EMR/RAPIDS_EMR_GUI_2b.png index ffd1253b9..baf2b7fe3 100644 Binary files a/docs/img/AWS-EMR/RAPIDS_EMR_GUI_2b.png and b/docs/img/AWS-EMR/RAPIDS_EMR_GUI_2b.png differ diff --git a/docs/img/AWS-EMR/RAPIDS_EMR_GUI_3.png b/docs/img/AWS-EMR/RAPIDS_EMR_GUI_3.png index 5ac22ee15..29512d7d5 100644 Binary files a/docs/img/AWS-EMR/RAPIDS_EMR_GUI_3.png and b/docs/img/AWS-EMR/RAPIDS_EMR_GUI_3.png differ diff --git a/docs/img/AWS-EMR/RAPIDS_EMR_GUI_4.png b/docs/img/AWS-EMR/RAPIDS_EMR_GUI_4.png index 1953bf68b..b9c404bf0 100644 Binary files a/docs/img/AWS-EMR/RAPIDS_EMR_GUI_4.png and b/docs/img/AWS-EMR/RAPIDS_EMR_GUI_4.png differ diff --git a/docs/img/AWS-EMR/RAPIDS_EMR_GUI_5.png b/docs/img/AWS-EMR/RAPIDS_EMR_GUI_5.png index 8e0e04671..8b221ba0e 100644 Binary files a/docs/img/AWS-EMR/RAPIDS_EMR_GUI_5.png and b/docs/img/AWS-EMR/RAPIDS_EMR_GUI_5.png differ diff --git a/docs/img/GCP/dataproc-cluster.png b/docs/img/GCP/dataproc-cluster.png index 16c07c02e..97f536f32 100644 Binary files a/docs/img/GCP/dataproc-cluster.png and b/docs/img/GCP/dataproc-cluster.png differ diff --git a/docs/img/GCP/dataproc-img2.png b/docs/img/GCP/dataproc-img2.png index de85b4f34..869925b3b 100644 Binary files a/docs/img/GCP/dataproc-img2.png and b/docs/img/GCP/dataproc-img2.png differ diff --git a/docs/img/GCP/dataproc-img3.png b/docs/img/GCP/dataproc-img3.png index 01dba5556..d0aace943 100644 Binary files a/docs/img/GCP/dataproc-img3.png and b/docs/img/GCP/dataproc-img3.png differ diff --git a/docs/img/GCP/dataproc-img4.png b/docs/img/GCP/dataproc-img4.png index 58bc6b7c9..2330c5bf2 100644 Binary files a/docs/img/GCP/dataproc-img4.png and b/docs/img/GCP/dataproc-img4.png differ diff --git a/docs/img/GCP/dataproc-img5.png b/docs/img/GCP/dataproc-img5.png index cc3c3750e..4eaa24127 100644 Binary files a/docs/img/GCP/dataproc-img5.png and b/docs/img/GCP/dataproc-img5.png differ diff --git a/docs/img/GCP/dataproc-img6.png b/docs/img/GCP/dataproc-img6.png index 8aad26a69..00fcbcead 100644 Binary files a/docs/img/GCP/dataproc-img6.png and b/docs/img/GCP/dataproc-img6.png differ diff --git a/docs/img/databricks/initscript.png b/docs/img/databricks/initscript.png index 56207b26b..29337a87a 100644 Binary files a/docs/img/databricks/initscript.png and b/docs/img/databricks/initscript.png differ diff --git a/docs/img/databricks/sparkconfig.png b/docs/img/databricks/sparkconfig.png index f05b7d632..7d1acc3df 100644 Binary files a/docs/img/databricks/sparkconfig.png and b/docs/img/databricks/sparkconfig.png differ diff --git a/docs/img/guides/criteo-perf.png b/docs/img/guides/criteo-perf.png index f7ecd239f..f1e2045ad 100644 Binary files a/docs/img/guides/criteo-perf.png and b/docs/img/guides/criteo-perf.png differ diff --git a/docs/img/guides/cuspatial/Nycd-Community-Districts.png b/docs/img/guides/cuspatial/Nycd-Community-Districts.png index fa96b3b60..db72f07f0 100644 Binary files a/docs/img/guides/cuspatial/Nycd-Community-Districts.png and 
b/docs/img/guides/cuspatial/Nycd-Community-Districts.png differ
diff --git a/docs/img/guides/cuspatial/Nyct2000.png b/docs/img/guides/cuspatial/Nyct2000.png
index 055f3de8f..1e7acc429 100644
Binary files a/docs/img/guides/cuspatial/Nyct2000.png and b/docs/img/guides/cuspatial/Nyct2000.png differ
diff --git a/docs/img/guides/cuspatial/install-jar.png b/docs/img/guides/cuspatial/install-jar.png
index 0d11c81ec..4ca398eda 100644
Binary files a/docs/img/guides/cuspatial/install-jar.png and b/docs/img/guides/cuspatial/install-jar.png differ
diff --git a/docs/img/guides/cuspatial/sample-polygon.png b/docs/img/guides/cuspatial/sample-polygon.png
index f8afb907f..0c54e92cd 100644
Binary files a/docs/img/guides/cuspatial/sample-polygon.png and b/docs/img/guides/cuspatial/sample-polygon.png differ
diff --git a/docs/img/guides/cuspatial/taxi-zones.png b/docs/img/guides/cuspatial/taxi-zones.png
index a8682cb03..5a0353b0f 100644
Binary files a/docs/img/guides/cuspatial/taxi-zones.png and b/docs/img/guides/cuspatial/taxi-zones.png differ
diff --git a/docs/img/guides/microbm.png b/docs/img/guides/microbm.png
index 581c39543..e175bb334 100644
Binary files a/docs/img/guides/microbm.png and b/docs/img/guides/microbm.png differ
diff --git a/docs/img/guides/mortgage-perf.png b/docs/img/guides/mortgage-perf.png
index 11c94865a..82e8f5fb6 100644
Binary files a/docs/img/guides/mortgage-perf.png and b/docs/img/guides/mortgage-perf.png differ
diff --git a/docs/img/guides/tpcds.png b/docs/img/guides/tpcds.png
index 80721afe4..4b8a0fc67 100644
Binary files a/docs/img/guides/tpcds.png and b/docs/img/guides/tpcds.png differ
diff --git a/docs/trouble-shooting/xgboost-examples-trouble-shooting.md b/docs/trouble-shooting/xgboost-examples-trouble-shooting.md
index 7cfc5c655..305194e71 100644
--- a/docs/trouble-shooting/xgboost-examples-trouble-shooting.md
+++ b/docs/trouble-shooting/xgboost-examples-trouble-shooting.md
@@ -1,6 +1,6 @@
## XGBoost

-### 1. NCCL errors
+### 1. NCCL errors

XGBoost supports distributed GPU training which depends on NCCL2 available at [this link](https://developer.nvidia.com/nccl).
NCCL auto-detects which network interfaces to use for inter-node communication. If some interfaces are in state up
but are not able to communicate between nodes, NCCL may try to use them anyway and therefore fail during the init functions or **even hang**.

diff --git a/examples/MIG-Support/README.md b/examples/MIG-Support/README.md
index 16a82c5ab..56bd689f6 100644
--- a/examples/MIG-Support/README.md
+++ b/examples/MIG-Support/README.md
@@ -5,9 +5,9 @@ deployment requirements:
- [YARN 3.3.0+ MIG GPU Plugin](/examples/MIG-Support/device-plugins/gpu-mig) for adding
  a Java-based plugin for MIG on top of the Pluggable Device Framework
-- [YARN 3.1.2 until YARN 3.3.0 MIG GPU Support](/examples/MIG-Support/resource-types/gpu-mig) for
+- [YARN 3.1.2 until YARN 3.3.0 MIG GPU Support](/examples/MIG-Support/resource-types/gpu-mig) for
  patching and rebuilding YARN code base to support MIG devices.
-- [YARN 3.1.2+ MIG GPU Support without modifying YARN / Device Plugin Code](/examples/MIG-Support/yarn-unpatched)
+- [YARN 3.1.2+ MIG GPU Support without modifying YARN / Device Plugin Code](/examples/MIG-Support/yarn-unpatched)
  relying on installing nvidia CLI wrappers written in `bash`, but unlike the solutions above
  without any Java code changes.
@@ -20,7 +20,7 @@ Note that are some common caveats for the solutions above.
Please see the [MIG Application Considerations](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/#app-considerations)
and [CUDA Device Enumeration](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#cuda-visible-devices).
-It is important to note that CUDA 11 only supports enumeration of a single MIG instance.
+It is important to note that CUDA 11 only supports enumeration of a single MIG instance.
It is recommended that you configure YARN to only allow a single GPU to be requested.
See the YARN config `yarn.resource-types.nvidia/miggpu.maximum-allocation` for the
[Pluggable Device Framework](/examples/MIG-Support/device-plugins/gpu-mig) solution and
@@ -43,17 +43,17 @@ YARN worker node host OS:
```bash
for cid in $(sudo docker ps -q); do sudo docker exec $cid bash -c "printenv | grep VISIBLE; nvidia-smi -L"; done
NVIDIA_VISIBLE_DEVICES=3
-GPU 0: NVIDIA A30 (UUID: GPU-05aa99be-b706-0dc1-ab62-dd12f2227b7d)
-  MIG 1g.6gb Device 0: (UUID: MIG-70dc024a-e8d7-587c-81dd-57ad493b1d91)
-NVIDIA_VISIBLE_DEVICES=1
-GPU 0: NVIDIA A30 (UUID: GPU-05aa99be-b706-0dc1-ab62-dd12f2227b7d)
-  MIG 1c.2g.12gb Device 0: (UUID: MIG-54cc2421-6f2d-59e9-b074-20707aadd71e)
+GPU 0: NVIDIA A30 (UUID: GPU-05aa99be-b706-0dc1-ab62-dd12f2227b7d)
+  MIG 1g.6gb Device 0: (UUID: MIG-70dc024a-e8d7-587c-81dd-57ad493b1d91)
+NVIDIA_VISIBLE_DEVICES=1
+GPU 0: NVIDIA A30 (UUID: GPU-05aa99be-b706-0dc1-ab62-dd12f2227b7d)
+  MIG 1c.2g.12gb Device 0: (UUID: MIG-54cc2421-6f2d-59e9-b074-20707aadd71e)
NVIDIA_VISIBLE_DEVICES=2
-GPU 0: NVIDIA A30 (UUID: GPU-05aa99be-b706-0dc1-ab62-dd12f2227b7d)
-  MIG 1g.6gb Device 0: (UUID: MIG-7e5552bf-d328-57a8-b091-0720d4530ffb)
+GPU 0: NVIDIA A30 (UUID: GPU-05aa99be-b706-0dc1-ab62-dd12f2227b7d)
+  MIG 1g.6gb Device 0: (UUID: MIG-7e5552bf-d328-57a8-b091-0720d4530ffb)
NVIDIA_VISIBLE_DEVICES=0
-GPU 0: NVIDIA A30 (UUID: GPU-05aa99be-b706-0dc1-ab62-dd12f2227b7d)
-  MIG 1c.2g.12gb Device 0: (UUID: MIG-e6af58f0-9af8-594f-825e-74d23e1a68c1)
+GPU 0: NVIDIA A30 (UUID: GPU-05aa99be-b706-0dc1-ab62-dd12f2227b7d)
+  MIG 1c.2g.12gb Device 0: (UUID: MIG-e6af58f0-9af8-594f-825e-74d23e1a68c1)
```

diff --git a/examples/MIG-Support/device-plugins/gpu-mig/README.md b/examples/MIG-Support/device-plugins/gpu-mig/README.md
index 942a626fd..6607ffe68 100644
--- a/examples/MIG-Support/device-plugins/gpu-mig/README.md
+++ b/examples/MIG-Support/device-plugins/gpu-mig/README.md
@@ -13,13 +13,13 @@ It works with Apache YARN 3.3.0+ versions that support the [Pluggable Device Fra
Please see the [MIG Application Considerations](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/#app-considerations)
and [CUDA Device Enumeration](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#cuda-visible-devices).
-It is important to note that CUDA 11 only supports enumeration of a single MIG instance. This means that this plugin
-only supports 1 GPU per container and the plugin will throw an exception by default if you request more.
+It is important to note that CUDA 11 only supports enumeration of a single MIG instance. This means that this plugin
+only supports 1 GPU per container and the plugin will throw an exception by default if you request more.
It is recommended that you configure YARN to only allow a single GPU to be requested. See the yarn config:
```
yarn.resource-types.nvidia/miggpu.maximum-allocation
```
-See [YARN Resource Configuration](https://hadoop.apache.org/docs/r3.3.1/hadoop-yarn/hadoop-yarn-site/ResourceModel.html) for more details.
+See [YARN Resource Configuration](https://hadoop.apache.org/docs/r3.3.1/hadoop-yarn/hadoop-yarn-site/ResourceModel.html) for more details.
If you do not configure the maximum allocation and someone requests multiple GPUs, the default behavior is to throw an exception.
The user visible exception is not very useful, as the real exception will be in the nodemanager logs.
See the [Configuration](#configuration) section for options
if it throws an exception.
@@ -30,7 +30,7 @@ if it throws an exception.
mvn package
```

-This will create a jar `target/yarn-gpu-mig-plugin-1.0.0.jar`. This jar can be installed on your YARN cluster as a plugin.
+This will create a jar `target/yarn-gpu-mig-plugin-1.0.0.jar`. This jar can be installed on your YARN cluster as a plugin.

## Installation
@@ -87,7 +87,7 @@ Environment variable for Spark application:
## Using with Apache Spark on YARN
Spark supports [scheduling GPUs and other custom resources on YARN](http://spark.apache.org/docs/latest/running-on-yarn.html#resource-allocation-and-configuration-overview). There are 2 options for using this plugin with Spark to allocate GPUs with MIG support:
-- Use Spark 3.2.1 or newer and remap the standard Spark `gpu` resource (i.e.: `spark.executor.resource.gpu.amount`) to be the new MIG GPU resource type using:
+- Use Spark 3.2.1 or newer and remap the standard Spark `gpu` resource (i.e.: `spark.executor.resource.gpu.amount`) to be the new MIG GPU resource type using:
```
--conf spark.yarn.resourceGpuDeviceName=nvidia/miggpu
```
@@ -97,7 +97,7 @@ This means users don't have to change their configs if they were already using t
type to `nvidia/miggpu`, update the discovery script, and specify an extra YARN config (`spark.yarn.executor.resource.nvidia/miggpu.amount`).
The command would be something like below (update the amounts according to your setup):
```
-  --conf spark.executor.resource.nvidia/miggpu.amount=1 --conf spark.executor.resource.nvidia/miggpu.discoveryScript=./getMIGGPUs --conf spark.task.resource.nvidia/miggpu.amount=0.25 --files ./getMIGGpus --conf spark.yarn.executor.resource.nvidia/miggpu.amount=1
+  --conf spark.executor.resource.nvidia/miggpu.amount=1 --conf spark.executor.resource.nvidia/miggpu.discoveryScript=./getMIGGPUs --conf spark.task.resource.nvidia/miggpu.amount=0.25 --files ./getMIGGpus --conf spark.yarn.executor.resource.nvidia/miggpu.amount=1
```
Note the getMIGGpus discovery script is in the `scripts` directory in this repo. It just changes the resource name returned to match `nvidia/miggpu`.
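For reference, here is a minimal sketch of the first option above (remapping the standard `gpu` resource on Spark 3.2.1+); the master, class name, and jar are placeholders, not part of this repo:

``` bash
# Sketch: keep the standard gpu resource configs and remap them to the YARN
# nvidia/miggpu resource type, requesting one MIG instance per executor.
${SPARK_HOME}/bin/spark-submit \
  --master yarn \
  --conf spark.yarn.resourceGpuDeviceName=nvidia/miggpu \
  --conf spark.executor.resource.gpu.amount=1 \
  --conf spark.task.resource.gpu.amount=1 \
  --class com.example.YourApp \
  your-app.jar
```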
diff --git a/examples/MIG-Support/device-plugins/gpu-mig/pom.xml b/examples/MIG-Support/device-plugins/gpu-mig/pom.xml index 5be1414ef..3c589e82f 100644 --- a/examples/MIG-Support/device-plugins/gpu-mig/pom.xml +++ b/examples/MIG-Support/device-plugins/gpu-mig/pom.xml @@ -1,6 +1,6 @@ - + 4.0.0 @@ -23,7 +23,7 @@ yarn-gpu-mig-plugin YARN Device Plugin that supports MIG The root project of the YARN Device Plugin that supports MIG - 1.0.0 + 25.06.1-SNAPSHOT.0.0 jar @@ -36,10 +36,10 @@ 3.3.6 - 1.8 - 3.8.1 + 25.06.1-SNAPSHOT.8 + 3.8.25.06.1-SNAPSHOT 3.2.0 - 4.13.1 + 4.25.06.1-SNAPSHOT3.25.06.1-SNAPSHOT 3.4.6 diff --git a/examples/MIG-Support/device-plugins/gpu-mig/scripts/getMIGGPUs b/examples/MIG-Support/device-plugins/gpu-mig/scripts/getMIGGPUs index 10b8c1e8c..14ac33261 100644 --- a/examples/MIG-Support/device-plugins/gpu-mig/scripts/getMIGGPUs +++ b/examples/MIG-Support/device-plugins/gpu-mig/scripts/getMIGGPUs @@ -1,6 +1,6 @@ #!/usr/bin/env bash -# Copyright (c) 2021, NVIDIA CORPORATION. +# Copyright (c) 20225.06.25.06.1-SNAPSHOT, NVIDIA CORPORATION. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/examples/MIG-Support/device-plugins/gpu-mig/src/main/java/com/nvidia/spark/NvidiaGPUMigPluginForRuntimeV2.java b/examples/MIG-Support/device-plugins/gpu-mig/src/main/java/com/nvidia/spark/NvidiaGPUMigPluginForRuntimeV2.java index 13232fa09..8944cff0e 100644 --- a/examples/MIG-Support/device-plugins/gpu-mig/src/main/java/com/nvidia/spark/NvidiaGPUMigPluginForRuntimeV2.java +++ b/examples/MIG-Support/device-plugins/gpu-mig/src/main/java/com/nvidia/spark/NvidiaGPUMigPluginForRuntimeV2.java @@ -1,5 +1,5 @@ /* - * Copyright (c) 2021, NVIDIA CORPORATION. + * Copyright (c) 20225.06.25.06.1-SNAPSHOT, NVIDIA CORPORATION. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. @@ -40,9 +40,9 @@ /** * Nvidia GPU plugin supporting both Nvidia container runtime v2. - * It supports discovering and allocating MIG devices. Currently, with CUDA 11, + * It supports discovering and allocating MIG devices. Currently, with CUDA 25.06.25.06.1-SNAPSHOT25.06.25.06.1-SNAPSHOT, * only enumeration of a single MIG instance is supported. This means that - * this plugin officially only supports 1 GPU per container and by default + * this plugin officially only supports 25.06.25.06.1-SNAPSHOT GPU per container and by default * will throw an exception if more are requested. The behavior of throwing * an exception is configurable by either setting the environment variable * {@code NVIDIA_MIG_PLUGIN_THROW_ON_MULTIPLE_GPUS} or by setting the YARN config @@ -78,8 +78,8 @@ public class NvidiaGPUMigPluginForRuntimeV2 implements DevicePlugin, private String pathOfGpuBinary = null; - // command should not run more than 10 sec. - private static final int MAX_EXEC_TIMEOUT_MS = 10 * 1000; + // command should not run more than 25.06.25.06.1-SNAPSHOT0 sec. 
+ private static final int MAX_EXEC_TIMEOUT_MS = 25.06.25.06.1-SNAPSHOT0 * 25.06.25.06.1-SNAPSHOT000; // When executable path not set, try to search default dirs // By default search /usr/bin, /bin, and /usr/local/nvidia/bin (when @@ -116,7 +116,7 @@ public Set getDevices() throws Exception { + "output: " + oneLine + " expected index,pci.bus_id,mig.mode.current"); } String minorNumber = tokensEachLine[0].trim(); - String busId = tokensEachLine[1].trim(); + String busId = tokensEachLine[25.06.25.06.1-SNAPSHOT].trim(); String migMode = tokensEachLine[2].trim(); String majorNumber = getMajorNumber(DEV_NAME_PREFIX + minorNumber); @@ -136,9 +136,9 @@ public Set getDevices() throws Exception { Integer numMigOutputLines = linesMig.length; for (int idmig = 0; idmig < numMigOutputLines; idmig++) { // first line should start with GPU - // GPU 0: NVIDIA A30 (UUID: GPU-e7076666-0544-e103-4f65-a047fc18269e) - // MIG 1g.6gb Device 0: (UUID: MIG-de9876e2-eef7-5b5a-9701-db694ffe8a77) - if (linesMig[idmig].startsWith("GPU " + minorNumInt) && numMigOutputLines > (idmig + 1)) { + // GPU 0: NVIDIA A30 (UUID: GPU-e7076666-0544-e25.06.25.06.1-SNAPSHOT03-4f65-a047fc25.06.25.06.1-SNAPSHOT8269e) + // MIG 25.06.25.06.1-SNAPSHOTg.6gb Device 0: (UUID: MIG-de9876e2-eef7-5b5a-97025.06.25.06.1-SNAPSHOT-db694ffe8a77) + if (linesMig[idmig].startsWith("GPU " + minorNumInt) && numMigOutputLines > (idmig + 25.06.25.06.1-SNAPSHOT)) { // process any MIG devices, this expects all the lines to be MIG devices until // we find one that starts with GPU String nextLine = linesMig[++idmig].trim(); @@ -170,7 +170,7 @@ public Set getDevices() throws Exception { idmig = numMigOutputLines; } } - if (migDevCount < 1) { + if (migDevCount < 25.06.25.06.1-SNAPSHOT) { throw new IOException("Error finding MIG devices on GPU with " + "MIG enabled: " + migInfoOutput); } @@ -212,8 +212,8 @@ public DeviceRuntimeSpec onDevicesAllocated(Set allocatedDevices, YarnRuntimeType yarnRuntime) throws Exception { LOG.debug("Generating runtime spec for allocated devices: {}, {}", allocatedDevices, yarnRuntime.getName()); - if (allocatedDevices.size() > 1 && shouldThrowOnMultipleGPUs()) { - throw new YarnException("Allocating more than 1 GPU per container is" + + if (allocatedDevices.size() > 25.06.25.06.1-SNAPSHOT && shouldThrowOnMultipleGPUs()) { + throw new YarnException("Allocating more than 25.06.25.06.1-SNAPSHOT GPU per container is" + " not supported with use of MIG!"); } if (yarnRuntime == YarnRuntimeType.RUNTIME_DOCKER) { @@ -231,7 +231,7 @@ public DeviceRuntimeSpec onDevicesAllocated(Set allocatedDevices, } String minorNumbers = gpuMinorNumbersSB.toString(); LOG.info("Nvidia Docker v2 assigned GPU: " + minorNumbers); - String deviceStr = minorNumbers.substring(0, minorNumbers.length() - 1); + String deviceStr = minorNumbers.substring(0, minorNumbers.length() - 25.06.25.06.1-SNAPSHOT); return DeviceRuntimeSpec.Builder.newInstance() .addEnv(nvidiaVisibleDevices, deviceStr) .setContainerRuntime(nvidiaRuntime) @@ -253,7 +253,7 @@ private String getMajorNumber(String devName) { LOG.debug("Get major numbers from /dev/{}", devName); output = shellExecutor.getMajorMinorInfo(devName); String[] strs = output.trim().split(":"); - output = Integer.toString(Integer.parseInt(strs[0], 16)); + output = Integer.toString(Integer.parseInt(strs[0], 25.06.25.06.1-SNAPSHOT6)); } catch (IOException e) { String msg = "Failed to get major number from reading /dev/" + devName; @@ -273,7 +273,7 @@ public Set allocateDevices(Set availableDevices, int count, if (envShouldThrow 
!= null) { shouldThrowOnMultipleGPUFromEnv = envShouldThrow; } - // Only officially support 1 GPU per container so don't worry about topology + // Only officially support 25.06.25.06.1-SNAPSHOT GPU per container so don't worry about topology // scheduling. basicSchedule(allocation, count, availableDevices); return allocation; diff --git a/examples/MIG-Support/device-plugins/gpu-mig/src/test/java/com/nvidia/spark/TestNvidiaGPUMigPluginForRuntimeV2.java b/examples/MIG-Support/device-plugins/gpu-mig/src/test/java/com/nvidia/spark/TestNvidiaGPUMigPluginForRuntimeV2.java index e705f6bb2..00093aa6b 100644 --- a/examples/MIG-Support/device-plugins/gpu-mig/src/test/java/com/nvidia/spark/TestNvidiaGPUMigPluginForRuntimeV2.java +++ b/examples/MIG-Support/device-plugins/gpu-mig/src/test/java/com/nvidia/spark/TestNvidiaGPUMigPluginForRuntimeV2.java @@ -1,5 +1,5 @@ /* - * Copyright (c) 2021, NVIDIA CORPORATION. + * Copyright (c) 20225.06.25.06.1-SNAPSHOT, NVIDIA CORPORATION. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. @@ -45,21 +45,21 @@ public void testGetNvidiaDevices() throws Exception { mock(NvidiaGPUMigPluginForRuntimeV2.NvidiaCommandExecutor.class); String deviceInfoShellOutput = "0, 00000000:04:00.0, [N/A]\n" + - "1, 00000000:82:00.0, Enabled"; + "25.06.25.06.1-SNAPSHOT, 00000000:82:00.0, Enabled"; String majorMinorNumber0 = "c3:0"; - String majorMinorNumber1 = "c3:1"; + String majorMinorNumber25.06.25.06.1-SNAPSHOT = "c3:25.06.25.06.1-SNAPSHOT"; String deviceMigInfoShellOutput = - "GPU 0: NVIDIA A100 80GB PCIe (UUID: GPU-aa72194b-fdd4-24b0-f659-17c929f46267)\n" + - " MIG 1g.10gb Device 0: (UUID: MIG-aa2c982c-48a9-5046-b7f8-aa4732879e02)\n" + - "GPU 1: NVIDIA A100 80GB PCIe (UUID: GPU-aa7153bf-c0ba-00ef-cdce-f861c34172f6)\n" + - " MIG 1g.10gb Device 0: (UUID: MIG-aa59d467-ba39-5d0a-a085-66af03246526)\n" + - " MIG 1g.10gb Device 1: (UUID: MIG-aad5cb29-8e6f-510a-8352-8e18f483dc74)" + + "GPU 0: NVIDIA A25.06.25.06.1-SNAPSHOT00 80GB PCIe (UUID: GPU-aa7225.06.25.06.1-SNAPSHOT94b-fdd4-24b0-f659-25.06.25.06.1-SNAPSHOT7c929f46267)\n" + + " MIG 25.06.25.06.1-SNAPSHOTg.25.06.25.06.1-SNAPSHOT0gb Device 0: (UUID: MIG-aa2c982c-48a9-5046-b7f8-aa4732879e02)\n" + + "GPU 25.06.25.06.1-SNAPSHOT: NVIDIA A25.06.25.06.1-SNAPSHOT00 80GB PCIe (UUID: GPU-aa725.06.25.06.1-SNAPSHOT53bf-c0ba-00ef-cdce-f8625.06.25.06.1-SNAPSHOTc3425.06.25.06.1-SNAPSHOT72f6)\n" + + " MIG 25.06.25.06.1-SNAPSHOTg.25.06.25.06.1-SNAPSHOT0gb Device 0: (UUID: MIG-aa59d467-ba39-5d0a-a085-66af03246526)\n" + + " MIG 25.06.25.06.1-SNAPSHOTg.25.06.25.06.1-SNAPSHOT0gb Device 25.06.25.06.1-SNAPSHOT: (UUID: MIG-aad5cb29-8e6f-525.06.25.06.1-SNAPSHOT0a-8352-8e25.06.25.06.1-SNAPSHOT8f483dc74)" + when(mockShell.getDeviceInfo()).thenReturn(deviceInfoShellOutput); when(mockShell.getDeviceMigInfo()).thenReturn(deviceMigInfoShellOutput); when(mockShell.getMajorMinorInfo("nvidia0")) .thenReturn(majorMinorNumber0); - when(mockShell.getMajorMinorInfo("nvidia1")) - .thenReturn(majorMinorNumber1); + when(mockShell.getMajorMinorInfo("nvidia25.06.25.06.1-SNAPSHOT")) + .thenReturn(majorMinorNumber25.06.25.06.1-SNAPSHOT); NvidiaGPUMigPluginForRuntimeV2 plugin = new NvidiaGPUMigPluginForRuntimeV2(); plugin.setShellExecutor(mockShell); plugin.setPathOfGpuBinary("/fake/nvidia-smi"); @@ -69,23 +69,23 @@ public void testGetNvidiaDevices() throws Exception { .setId(0).setHealthy(true) .setBusID("00000000:04:00.0") .setDevPath("/dev/nvidia0") - .setMajorNumber(195) + 
diff --git a/examples/MIG-Support/device-plugins/gpu-mig/src/test/java/com/nvidia/spark/TestNvidiaGPUMigPluginForRuntimeV2.java b/examples/MIG-Support/device-plugins/gpu-mig/src/test/java/com/nvidia/spark/TestNvidiaGPUMigPluginForRuntimeV2.java index e705f6bb2..00093aa6b 100644 --- a/examples/MIG-Support/device-plugins/gpu-mig/src/test/java/com/nvidia/spark/TestNvidiaGPUMigPluginForRuntimeV2.java +++ b/examples/MIG-Support/device-plugins/gpu-mig/src/test/java/com/nvidia/spark/TestNvidiaGPUMigPluginForRuntimeV2.java @@ -1,5 +1,5 @@ /* - * Copyright (c) 2021, NVIDIA CORPORATION. + * Copyright (c) 2021, NVIDIA CORPORATION. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. @@ -45,21 +45,21 @@ public void testGetNvidiaDevices() throws Exception { mock(NvidiaGPUMigPluginForRuntimeV2.NvidiaCommandExecutor.class); String deviceInfoShellOutput = "0, 00000000:04:00.0, [N/A]\n" + - "1, 00000000:82:00.0, Enabled"; + "1, 00000000:82:00.0, Enabled"; String majorMinorNumber0 = "c3:0"; - String majorMinorNumber1 = "c3:1"; + String majorMinorNumber1 = "c3:1"; String deviceMigInfoShellOutput = - "GPU 0: NVIDIA A100 80GB PCIe (UUID: GPU-aa72194b-fdd4-24b0-f659-17c929f46267)\n" + - " MIG 1g.10gb Device 0: (UUID: MIG-aa2c982c-48a9-5046-b7f8-aa4732879e02)\n" + - "GPU 1: NVIDIA A100 80GB PCIe (UUID: GPU-aa7153bf-c0ba-00ef-cdce-f861c34172f6)\n" + - " MIG 1g.10gb Device 0: (UUID: MIG-aa59d467-ba39-5d0a-a085-66af03246526)\n" + - " MIG 1g.10gb Device 1: (UUID: MIG-aad5cb29-8e6f-510a-8352-8e18f483dc74)" + + "GPU 0: NVIDIA A100 80GB PCIe (UUID: GPU-aa72194b-fdd4-24b0-f659-17c929f46267)\n" + + " MIG 1g.10gb Device 0: (UUID: MIG-aa2c982c-48a9-5046-b7f8-aa4732879e02)\n" + + "GPU 1: NVIDIA A100 80GB PCIe (UUID: GPU-aa7153bf-c0ba-00ef-cdce-f861c34172f6)\n" + + " MIG 1g.10gb Device 0: (UUID: MIG-aa59d467-ba39-5d0a-a085-66af03246526)\n" + + " MIG 1g.10gb Device 1: (UUID: MIG-aad5cb29-8e6f-510a-8352-8e18f483dc74)" + when(mockShell.getDeviceInfo()).thenReturn(deviceInfoShellOutput); when(mockShell.getDeviceMigInfo()).thenReturn(deviceMigInfoShellOutput); when(mockShell.getMajorMinorInfo("nvidia0")) .thenReturn(majorMinorNumber0); - when(mockShell.getMajorMinorInfo("nvidia1")) - .thenReturn(majorMinorNumber1); + when(mockShell.getMajorMinorInfo("nvidia1")) + .thenReturn(majorMinorNumber1); NvidiaGPUMigPluginForRuntimeV2 plugin = new NvidiaGPUMigPluginForRuntimeV2(); plugin.setShellExecutor(mockShell); plugin.setPathOfGpuBinary("/fake/nvidia-smi"); @@ -69,23 +69,23 @@ public void testGetNvidiaDevices() throws Exception { .setId(0).setHealthy(true) .setBusID("00000000:04:00.0") .setDevPath("/dev/nvidia0") - .setMajorNumber(195) + .setMajorNumber(195) .setStatus("0") .setMinorNumber(0).build()); expectedDevices.add(Device.Builder.newInstance() - .setId(1).setHealthy(true) + .setId(1).setHealthy(true) .setBusID("00000000:82:00.0") - .setDevPath("/dev/nvidia1") - .setMajorNumber(195) + .setDevPath("/dev/nvidia1") + .setMajorNumber(195) .setStatus("0") - .setMinorNumber(1).build()); + .setMinorNumber(1).build()); expectedDevices.add(Device.Builder.newInstance() .setId(2).setHealthy(true) .setBusID("00000000:82:00.0") - .setDevPath("/dev/nvidia1") - .setMajorNumber(195) - .setStatus("1") - .setMinorNumber(1).build()); + .setDevPath("/dev/nvidia1") + .setMajorNumber(195) + .setStatus("1") + .setMinorNumber(1).build()); Set devices = plugin.getDevices(); Assert.assertEquals(expectedDevices, devices); } @@ -104,7 +104,7 @@ public void testOnDeviceAllocatedMultiGPU() throws Exception { .setId(0).setHealthy(true) .setBusID("00000000:04:00.0") .setDevPath("/dev/nvidia0") - .setMajorNumber(195) + .setMajorNumber(195) .setMinorNumber(0).build()); spec = plugin.onDevicesAllocated(allocatedDevices, YarnRuntimeType.RUNTIME_DOCKER); @@ -115,9 +115,9 @@ public void testOnDeviceAllocatedMultiGPU() throws Exception { allocatedDevices.add(Device.Builder.newInstance() .setId(0).setHealthy(true) .setBusID("00000000:82:00.0") - .setDevPath("/dev/nvidia1") - .setMajorNumber(195) - .setMinorNumber(1).build()); + .setDevPath("/dev/nvidia1") + .setMajorNumber(195) + .setMinorNumber(1).build()); spec = plugin.onDevicesAllocated(allocatedDevices, YarnRuntimeType.RUNTIME_DOCKER); } @@ -136,16 +136,16 @@ public void testMultiGPUsEnvPrecedence() throws Exception { .setId(0).setHealthy(true) .setBusID("00000000:04:00.0") .setDevPath("/dev/nvidia0") - .setMajorNumber(195) + .setMajorNumber(195) .setMinorNumber(0).build()); // two device allowed allocatedDevices.add(Device.Builder.newInstance() .setId(0).setHealthy(true) .setBusID("00000000:82:00.0") - .setDevPath("/dev/nvidia1") - .setMajorNumber(195) - .setMinorNumber(1).build()); + .setDevPath("/dev/nvidia1") + .setMajorNumber(195) + .setMinorNumber(1).build()); // test that env variable takes presedence plugin.setShouldThrowOnMultipleGPUFromConf(true); @@ -156,7 +156,7 @@ public void testMultiGPUsEnvPrecedence() throws Exception { spec = plugin.onDevicesAllocated(allocatedDevices, YarnRuntimeType.RUNTIME_DOCKER); Assert.assertEquals("nvidia", spec.getContainerRuntime()); - Assert.assertEquals("0,1", spec.getEnvs().get("NVIDIA_VISIBLE_DEVICES")); + Assert.assertEquals("0,1", spec.getEnvs().get("NVIDIA_VISIBLE_DEVICES")); } @Test @@ -173,23 +173,23 @@ public void testMultiGPUsConf() throws Exception { .setId(0).setHealthy(true) .setBusID("00000000:04:00.0") .setDevPath("/dev/nvidia0") - .setMajorNumber(195) + .setMajorNumber(195) .setMinorNumber(0).build()); // two device allowed allocatedDevices.add(Device.Builder.newInstance() .setId(0).setHealthy(true) .setBusID("00000000:82:00.0") - .setDevPath("/dev/nvidia1") - .setMajorNumber(195) - .setMinorNumber(1).build()); +
.setMinorNumber(1).build()); // test that env variable takes presedence plugin.setShouldThrowOnMultipleGPUFromConf(false); spec = plugin.onDevicesAllocated(allocatedDevices, YarnRuntimeType.RUNTIME_DOCKER); Assert.assertEquals("nvidia", spec.getContainerRuntime()); - Assert.assertEquals("0,1", spec.getEnvs().get("NVIDIA_VISIBLE_DEVICES")); + Assert.assertEquals("0,1", spec.getEnvs().get("NVIDIA_VISIBLE_DEVICES")); } @Test @@ -210,7 +210,7 @@ public void testOnDeviceAllocatedMig() throws Exception { .setId(0).setHealthy(true) .setBusID("00000000:04:00.0") .setDevPath("/dev/nvidia0") - .setMajorNumber(195) + .setMajorNumber(195) .setMinorNumber(0).build()); spec = plugin.onDevicesAllocated(allocatedDevices, YarnRuntimeType.RUNTIME_DOCKER); @@ -232,7 +232,7 @@ public void testOnDeviceAllocatedNoMig() throws Exception { .setId(0).setHealthy(true) .setBusID("00000000:04:00.0") .setDevPath("/dev/nvidia0") - .setMajorNumber(195) + .setMajorNumber(195) .setMinorNumber(0).build()); spec = plugin.onDevicesAllocated(allocatedDevices, YarnRuntimeType.RUNTIME_DOCKER); diff --git a/examples/MIG-Support/resource-types/gpu-mig/README.md b/examples/MIG-Support/resource-types/gpu-mig/README.md index 37142dff8..a270511a1 100644 --- a/examples/MIG-Support/resource-types/gpu-mig/README.md +++ b/examples/MIG-Support/resource-types/gpu-mig/README.md @@ -1,4 +1,4 @@ -# NVIDIA Support for GPU for YARN with MIG support for YARN 3.1.2 until YARN 3.3.0 +# NVIDIA Support for GPU for YARN with MIG support for YARN 3.1.2 until YARN 3.3.0 This adds support for GPUs with [MIG](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/) on YARN for versions prior to YARN 3.3.0 which don't support the pluggable device framework. Use the [GPU Plugin for YARN with MIG support](../../device-plugins/gpu-mig/README.md) @@ -8,7 +8,7 @@ environments where there may be some MIG enabled GPUs and some without MIG. This ## Compatibility -Requires YARN 3.1.2 or newer that supports GPU scheduling. See the [supported versions](#supported-versions) section below for specific versions supported. +Requires YARN 3.1.2 or newer that supports GPU scheduling. See the [supported versions](#supported-versions) section below for specific versions supported. MIG support requires YARN to be configured with Docker and using the NVIDIA Container Toolkit (nvidia-docker2) ## Limitations @@ -16,30 +16,30 @@ MIG support requires YARN to be configured with Docker and using the NVIDIA Cont Please see the [MIG Application Considerations](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/#app-considerations) and [CUDA Device Enumeration](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#cuda-visible-devices). -It is important to note that CUDA 11 only supports enumeration of a single MIG instance. This means that with this patch -and MIG support enabled, it only supports 1 GPU per container and will throw an exception by default if you request more. +It is important to note that CUDA 11 only supports enumeration of a single MIG instance. This means that with this patch +and MIG support enabled, it only supports 1 GPU per container and will throw an exception by default if you request more. It is recommended that you configure YARN to only allow a single GPU be requested.
See the yarn config: ``` yarn.resource-types.yarn.io/gpu.maximum-allocation ``` -See [YARN Resource Configuration](https://hadoop.apache.org/docs/r3.1.2/hadoop-yarn/hadoop-yarn-site/ResourceModel.html) for more details. +See [YARN Resource Configuration](https://hadoop.apache.org/docs/r3.1.2/hadoop-yarn/hadoop-yarn-site/ResourceModel.html) for more details. If you do not configure the maximum allocation and someone requests multiple GPUs, the default behavior is to throw an exception. See the [Configuration](#configuration) section for options if it throws an exception. ## Supported Versions There are different patches available depending on the YARN version you are using: -- YARN 3.1.2 use patch `yarn312MIG.patch` -- YARN versions 3.1.3 to 3.1.5 (git hash cd7c34f9b4005d27886f73e58bef88e706fcccf9 since 3.1.5 was not released when this was tested) use `yarn313to315MIG.patch` -- YARN 3.2.0, no patch is currently available, backport patch for YARN 3.2.1 or contact us. -- YARN 3.2.1 and 3.2.3 use patch `yarn321to323MIG.patch` +- YARN 3.1.2 use patch `yarn312MIG.patch` +- YARN versions 3.1.3 to 3.1.5 (git hash cd7c34f9b4005d27886f73e58bef88e706fcccf9 since 3.1.5 was not released when this was tested) use `yarn313to315MIG.patch` +- YARN 3.2.0, no patch is currently available, backport patch for YARN 3.2.1 or contact us. +- YARN 3.2.1 and 3.2.3 use patch `yarn321to323MIG.patch` ## Building Apply the patch to your YARN version and build it like you would normally for your deployment. For example: ``` -patch -p1 < yarn312MIG.patch +patch -p1 < yarn312MIG.patch mvn clean package -Pdist -Dtar -DskipTests ``` @@ -78,12 +78,12 @@ It also allows you to manually allow certain gpu devices. This configuration was GPU device is identified by their minor device number, index, and optionally MIG device index. A common approach to get minor device number of GPUs is using nvidia-smi -q and search Minor Number output and optionally MIG device indices. The format is index:minor_number[:mig_index][,index:minor_number...]. An example of manual specification is -0:0,1:1:0,1:1:1,2:2" to allow YARN NodeManager to manage GPU devices with indices 0/1/2 and minor number 0/1/2 -where GPU indices 1 has 2 MIG enabled devices with indices 0/1. +0:0,1:1:0,1:1:1,2:2" to allow YARN NodeManager to manage GPU devices with indices 0/1/2 and minor number 0/1/2 +where GPU indices 1 has 2 MIG enabled devices with indices 0/1. ``` yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices - 0:0,1:1:0,1:1:1,2:2 + 0:0,1:1:0,1:1:1,2:2 ```
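A hedged sketch of how an `index:minor_number[:mig_index]` string like the one above is interpreted; it mirrors the parsing the patches add to `GpuDiscoverer`, but the class and variable names here are illustrative only, not part of the patch:

```java
// Hypothetical, self-contained illustration of parsing the allowed-devices
// format "index:minor_number[:mig_index]". A missing third field is treated
// as a non-MIG GPU and marked with -1, as in the patched GpuDiscoverer.
public class AllowedGpuDevicesDemo {
  public static void main(String[] args) {
    String allowed = "0:0,1:1:0,1:1:1,2:2";
    for (String device : allowed.split(",")) {
      String[] kv = device.trim().split(":");
      int index = Integer.parseInt(kv[0]);
      int minorNumber = Integer.parseInt(kv[1]);
      int migIndex = (kv.length == 3) ? Integer.parseInt(kv[2]) : -1;
      System.out.printf("index=%d minor=%d mig=%d%n", index, minorNumber, migIndex);
    }
  }
}
```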
diff --git a/examples/MIG-Support/resource-types/gpu-mig/yarn312MIG.patch b/examples/MIG-Support/resource-types/gpu-mig/yarn312MIG.patch index 1242cfdba..67270dc5f 100644 --- a/examples/MIG-Support/resource-types/gpu-mig/yarn312MIG.patch +++ b/examples/MIG-Support/resource-types/gpu-mig/yarn312MIG.patch @@ -1,8 +1,8 @@ diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java -index 36fafefdbc4..e37d0a3a685 100644 +index 36fafefdbc4..e37d0a3a685 100644 --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java -@@ -1574,6 +1574,10 @@ public static boolean isAclEnabled(Configuration conf) { +@@ -1574,6 +1574,10 @@ public static boolean isAclEnabled(Configuration conf) { @Private public static final String AUTOMATICALLY_DISCOVER_GPU_DEVICES = "auto"; @@ -14,10 +14,10 @@ index 36fafefdbc4..e37d0a3a685 100644 * This setting controls where to how to invoke GPU binaries */ diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/AssignedGpuDevice.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/AssignedGpuDevice.java -index 26fd9050742..e84b920dcee 100644 +index 26fd9050742..e84b920dcee 100644 --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/AssignedGpuDevice.java +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/AssignedGpuDevice.java -@@ -34,6 +34,12 @@ public AssignedGpuDevice(int index, int minorNumber, +@@ -34,6 +34,12 @@ public AssignedGpuDevice(int index, int minorNumber, this.containerId = containerId.toString(); } @@ -38,7 +38,7 @@ index 26fd9050742..e84b920dcee 100644 && containerId.equals(other.containerId); } -@@ -68,12 +75,16 @@ public int compareTo(Object obj) { +@@ -68,12 +75,16 @@ public int compareTo(Object obj) { if (0 != result) { return result; } @@ -58,18 +58,18 @@ index 26fd9050742..e84b920dcee 100644 } } diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDevice.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDevice.java -index bce1d9fa480..3cb42d3c58f 100644 +index bce1d9fa480..3cb42d3c58f 100644 ---
a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDevice.java +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDevice.java @@ -26,6 +26,7 @@ public class GpuDevice implements Serializable, Comparable { protected int index; protected int minorNumber; -+ protected int migDeviceIndex = -1; - private static final long serialVersionUID = -6812314470754667710L; ++ protected int migDeviceIndex = -1; + private static final long serialVersionUID = -6812314470754667710L; public GpuDevice(int index, int minorNumber) { -@@ -33,6 +34,12 @@ public GpuDevice(int index, int minorNumber) { +@@ -33,6 +34,12 @@ public GpuDevice(int index, int minorNumber) { this.minorNumber = minorNumber; } @@ -82,7 +82,7 @@ index bce1d9fa480..3cb42d3c58f 100644 public int getIndex() { return index; } -@@ -41,13 +48,17 @@ public int getMinorNumber() { +@@ -41,13 +48,17 @@ public int getMinorNumber() { return minorNumber; } @@ -101,7 +101,7 @@ index bce1d9fa480..3cb42d3c58f 100644 } @Override -@@ -62,17 +73,21 @@ public int compareTo(Object obj) { +@@ -62,17 +73,21 @@ public int compareTo(Object obj) { if (0 != result) { return result; } @@ -127,7 +127,7 @@ index bce1d9fa480..3cb42d3c58f 100644 } } diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDiscoverer.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDiscoverer.java -index 6e3cf1315ce..55f7379d4cc 100644 +index 6e3cf1315ce..55f7379d4cc 100644 --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDiscoverer.java +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDiscoverer.java @@ -30,6 +30,7 @@ import org.slf4j.Logger; import org.slf4j.LoggerFactory; -@@ -149,6 +150,10 @@ public synchronized GpuDeviceInformation getGpuDeviceInformation() +@@ -149,6 +150,10 @@ public synchronized GpuDeviceInformation getGpuDeviceInformation() YarnConfiguration.NM_GPU_ALLOWED_DEVICES, YarnConfiguration.AUTOMATICALLY_DISCOVER_GPU_DEVICES); @@ -149,7 +149,7 @@ index 6e3cf1315ce..55f7379d4cc 100644 List gpuDevices = new ArrayList<>(); if (allowedDevicesStr.equals( -@@ -171,21 +176,45 @@ public synchronized GpuDeviceInformation getGpuDeviceInformation() +@@ -171,21 +176,45 @@ public synchronized GpuDeviceInformation getGpuDeviceInformation() i++) { List gpuInfos = lastDiscoveredGpuInformation.getGpus(); @@ -183,10 +183,10 @@ index
6e3cf1315ce..55f7379d4cc 100644 + if (kv.length == 3) { + // assumes this is MIG enabled device + gpuDevices.add( + new GpuDevice(Integer.parseInt(kv[0]), Integer.parseInt(kv[1]), Integer.parseInt(kv[2]))); + } else { + gpuDevices.add( + new GpuDevice(Integer.parseInt(kv[0]), Integer.parseInt(kv[1]))); + } + } else { + if (kv.length != 2) { @@ -195,16 +195,16 @@ index 6e3cf1315ce..55f7379d4cc 100644 + + s); + } + gpuDevices.add( + new GpuDevice(Integer.parseInt(kv[0]), Integer.parseInt(kv[1]))); } - - gpuDevices.add( -- new GpuDevice(Integer.parseInt(kv[0]), Integer.parseInt(kv[1]))); } } LOG.info("Allowed GPU devices:" + gpuDevices); diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDockerCommandPluginFactory.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDockerCommandPluginFactory.java -index 051afd6c561..996cb58ac45 100644 +index 051afd6c561..996cb58ac45 100644 --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDockerCommandPluginFactory.java +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDockerCommandPluginFactory.java @@ -36,7 +36,7 @@ public static DockerCommandPlugin createGpuDockerCommandPlugin( throw new YarnException( diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/NvidiaDockerV2CommandPlugin.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/NvidiaDockerV2CommandPlugin.java -index ff25eb6ced6..c2cc0e5a2d1 100644 +index ff25eb6ced6..c2cc0e5a2d1 100644 --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/NvidiaDockerV2CommandPlugin.java +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/NvidiaDockerV2CommandPlugin.java -@@ -21,7 +21,9 @@ +@@ -21,7 +21,9 @@ import com.google.common.annotations.VisibleForTesting; import org.apache.commons.logging.Log; import org.apache.commons.logging.LogFactory; import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container; import
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ResourceMappings; import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu.GpuResourceAllocator; -@@ -45,8 +47,12 @@ +@@ -45,8 +47,12 @@ private String nvidiaRuntime = "nvidia"; private String nvidiaVisibleDevices = "NVIDIA_VISIBLE_DEVICES"; private Set getAssignedGpus(Container container) { ResourceMappings resourceMappings = container.getResourceMappings(); -@@ -84,10 +90,23 @@ public synchronized void updateDockerRunCommand( +@@ -84,10 +90,23 @@ public synchronized void updateDockerRunCommand( return; } Map environment = new HashMap<>(); + if (isMigEnabled && assignedResources.size() > 1) { + Map existingEnv = container.getLaunchContext().getEnvironment(); + Boolean shouldThrowOnMultipleGpus = Boolean.parseBoolean( + existingEnv.getOrDefault(nvidiaMigThrowOnMultiGpus, "true")); + if (shouldThrowOnMultipleGpus) { + throw new ContainerExecutionException("Allocating more than 1 GPU per container is " + + "not supported with use of MIG!"); + } + } - gpuIndexList = gpuIndexList + gpuDevice.getIndex() + ","; - LOG.info("nvidia docker2 assigned gpu index: " + gpuDevice.getIndex()); + String deviceIndex = String.valueOf(gpuDevice.getIndex()); + if (gpuDevice.getMIGIndex() != -1) { + deviceIndex = gpuDevice.getIndex() + ":" + gpuDevice.getMIGIndex(); + } + gpuIndexList = gpuIndexList + deviceIndex + ","; dockerRunCommand.addRuntime(nvidiaRuntime); environment.put(nvidiaVisibleDevices, diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuDeviceInformation.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuDeviceInformation.java -index 25c2e3a1f1d..15cb7eac10a 100644 +index 25c2e3a1f1d..15cb7eac10a 100644 --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuDeviceInformation.java +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuDeviceInformation.java -@@ -22,8 +22,10 @@ +@@ -22,8 +22,10 @@ import org.apache.hadoop.classification.InterfaceStability; import javax.xml.bind.annotation.XmlElement; * Capture single GPU device information such as memory size, temperature, @@ -38,6 +40,8 @@ private String uuid = "N/A"; - private int minorNumber = -1; + private int minorNumber = -1; + private List migDevices; + private PerGpuMigMode migMode; private PerGpuUtilizations gpuUtilizations; private PerGpuMemoryUsage gpuMemoryUsage; private PerGpuTemperature temperature; -@@ -108,6 +112,25 @@ public
void setUuid(String uuid) { this.uuid = uuid; } +@@ -108,6 +112,25 @@ public void setUuid(String uuid) { this.uuid = uuid; } public String getProductName() { return productName; diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuMigDevice.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuMigDevice.java -new file mode 100644 +new file mode 100644 index 00000000000..4ce7cec6e55 --- /dev/null +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuMigDevice.java -@@ -0,0 +1,48 @@ +@@ -0,0 +1,48 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file @@ -375,11 +375,11 @@ index 00000000000..4ce7cec6e55 + } +} diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuMigMode.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuMigMode.java -new file mode 100644 +new file mode 100644 index 00000000000..b706df2c3bb --- /dev/null +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuMigMode.java -@@ -0,0 +1,48 @@ +@@ -0,0 +1,48 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements.
See the NOTICE file @@ -429,10 +429,10 @@ index 00000000000..b706df2c3bb + } +} diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/TestGpuDiscoverer.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/TestGpuDiscoverer.java -index 4abb633a69a..404930d00c2 100644 +index 4abb633a69a..404930d00c2 100644 --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/TestGpuDiscoverer.java +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/TestGpuDiscoverer.java -@@ -138,4 +138,47 @@ public void getNumberOfUsableGpusFromConfig() throws YarnException { +@@ -138,4 +138,47 @@ public void getNumberOfUsableGpusFromConfig() throws YarnException { Assert.assertTrue(2 == usableGpuDevices.get(2).getMinorNumber()); Assert.assertTrue(4 == usableGpuDevices.get(3).getMinorNumber()); } + + @Test + public void getNumberOfUsableGpusFromConfigMIG() throws YarnException { + Configuration conf = new Configuration(false); + conf.set(YarnConfiguration.USE_MIG_ENABLED_GPUS, "true"); + + // Illegal format + conf.set(YarnConfiguration.NM_GPU_ALLOWED_DEVICES, "0:0,1:1:2,2:2:0,3"); + GpuDiscoverer plugin = new GpuDiscoverer(); + try { + plugin.initialize(conf); + } + + // Valid format + conf.set(YarnConfiguration.NM_GPU_ALLOWED_DEVICES, "0:0,1:1:0,1:1:2,2:2:0,3:4"); + plugin = new GpuDiscoverer(); + plugin.initialize(conf); + + Assert.assertEquals(5, usableGpuDevices.size()); + + Assert.assertTrue(0 == usableGpuDevices.get(0).getIndex()); + Assert.assertTrue(1 == usableGpuDevices.get(1).getIndex()); + Assert.assertTrue(1 == usableGpuDevices.get(2).getIndex()); + Assert.assertTrue(2 == usableGpuDevices.get(3).getIndex()); + Assert.assertTrue(3 == usableGpuDevices.get(4).getIndex()); + + Assert.assertTrue(0 == usableGpuDevices.get(0).getMinorNumber()); + Assert.assertTrue(1 == usableGpuDevices.get(1).getMinorNumber()); + Assert.assertTrue(1 == usableGpuDevices.get(2).getMinorNumber()); + Assert.assertTrue(2 == usableGpuDevices.get(3).getMinorNumber()); + Assert.assertTrue(4 == usableGpuDevices.get(4).getMinorNumber()); + + Assert.assertTrue(-1 == usableGpuDevices.get(0).getMIGIndex()); +
Assert.assertTrue(0 == usableGpuDevices.get(1).getMIGIndex()); + Assert.assertTrue(2 == usableGpuDevices.get(2).getMIGIndex()); + Assert.assertTrue(0 == usableGpuDevices.get(3).getMIGIndex()); + Assert.assertTrue(-1 == usableGpuDevices.get(4).getMIGIndex()); + } } diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/TestNvidiaDockerV2CommandPlugin.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/TestNvidiaDockerV2CommandPlugin.java -index b0b523360ef..798a95cb009 100644 +index b0b523360ef..798a95cb009 100644 --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/TestNvidiaDockerV2CommandPlugin.java +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/TestNvidiaDockerV2CommandPlugin.java -@@ -20,10 +20,14 @@ +@@ -20,10 +20,14 @@ import com.google.common.collect.ImmutableList; import com.google.common.collect.Sets; import org.junit.Assert; import org.junit.Test; -@@ -69,7 +73,13 @@ private boolean commandlinesEquals(Map> cli1, +@@ -69,7 +73,13 @@ private boolean commandlinesEquals(Map> cli1, extends NvidiaDockerV2CommandPlugin { private boolean requestsGpu = false; public void setRequestsGpu(boolean r) { requestsGpu = r; -@@ -127,4 +137,118 @@ public void testPlugin() throws Exception { +@@ -127,4 +137,118 @@ public void testPlugin() throws Exception { // runtime should exist Assert.assertTrue(newCommandLine.containsKey("runtime")); } + + @Test + public void testPluginMIG() throws Exception { + DockerRunCommand runCommand = new DockerRunCommand("container_1", "user", + "fakeimage"); + + Map> originalCommandline = copyCommandLine( + + @Test(expected = ContainerExecutionException.class) + public void testPluginMIGThrowsMulti() throws Exception { + DockerRunCommand runCommand = new DockerRunCommand("container_1", "user", + "fakeimage"); + + Map> originalCommandline = copyCommandLine( + ResourceMappings.AssignedResources assigned = + new ResourceMappings.AssignedResources(); + assigned.updateAssignedResources( + ImmutableList.of(new GpuDevice(0, 0, 0), new GpuDevice(1, 1, 2))); + resourceMappings.addAssignedResources(ResourceInformation.GPU_URI, + assigned); + @@ -596,7 +596,7 @@ index
b0b523360ef..798a95cb009 100644 + + @Test + public void testPluginMIGNoThrowsMulti() throws Exception { + DockerRunCommand runCommand = new DockerRunCommand("container_1", "user", + "fakeimage"); + + Map> originalCommandline = copyCommandLine( + ResourceMappings.AssignedResources assigned = + new ResourceMappings.AssignedResources(); + assigned.updateAssignedResources( + ImmutableList.of(new GpuDevice(0, 0, 0), new GpuDevice(1, 1, 2))); + resourceMappings.addAssignedResources(ResourceInformation.GPU_URI, + assigned); + + runCommand.getDockerCommandWithArguments(); + // NVIDIA_VISIBLE_DEVICES will be set + Assert.assertTrue( + runCommand.getEnv().get("NVIDIA_VISIBLE_DEVICES").equals("0:0,1:2")); + // runtime should exist + Assert.assertTrue(newCommandLine.containsKey("runtime")); + }
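Before the next patch variant, a hedged sketch of how the patched `NvidiaDockerV2CommandPlugin` composes `NVIDIA_VISIBLE_DEVICES`: `<gpuIndex>` for whole GPUs and `<gpuIndex>:<migIndex>` for MIG slices, comma separated, matching the `"0:0,1:2"` expected in the test above. Names here are illustrative only, not part of the patch:

```java
// Hypothetical, self-contained illustration of building the
// NVIDIA_VISIBLE_DEVICES value for a mix of MIG and non-MIG devices.
import java.util.Arrays;
import java.util.List;

public class VisibleDevicesDemo {
  public static void main(String[] args) {
    // {gpuIndex, migIndex}; migIndex == -1 marks a non-MIG GPU
    List<int[]> assigned = Arrays.asList(new int[] {0, 0}, new int[] {1, 2});
    StringBuilder gpuIndexList = new StringBuilder();
    for (int[] dev : assigned) {
      String deviceIndex = (dev[1] != -1)
          ? dev[0] + ":" + dev[1]          // MIG slice: "<gpuIndex>:<migIndex>"
          : String.valueOf(dev[0]);        // whole GPU: "<gpuIndex>"
      gpuIndexList.append(deviceIndex).append(',');
    }
    // drop the trailing comma, as the plugin does with substring()
    String visible = gpuIndexList.substring(0, gpuIndexList.length() - 1);
    System.out.println(visible);           // prints 0:0,1:2
  }
}
```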
diff --git a/examples/MIG-Support/resource-types/gpu-mig/yarn313to315MIG.patch b/examples/MIG-Support/resource-types/gpu-mig/yarn313to315MIG.patch index d5df6bd3b..57a10b95b 100644 --- a/examples/MIG-Support/resource-types/gpu-mig/yarn313to315MIG.patch +++ b/examples/MIG-Support/resource-types/gpu-mig/yarn313to315MIG.patch @@ -1,8 +1,8 @@ diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java -index 737baee70bb..0e113036a80 100644 +index 737baee70bb..0e113036a80 100644 --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java -@@ -1655,6 +1655,10 @@ public static boolean isAclEnabled(Configuration conf) { +@@ -1655,6 +1655,10 @@ public static boolean isAclEnabled(Configuration conf) { @Private public static final String AUTOMATICALLY_DISCOVER_GPU_DEVICES = "auto"; @@ -14,10 +14,10 @@ index 737baee70bb..0e113036a80 100644 * This setting controls where to how to invoke GPU binaries */ diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/AssignedGpuDevice.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/AssignedGpuDevice.java -index 26fd9050742..e84b920dcee 100644 +index 26fd9050742..e84b920dcee 100644 --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/AssignedGpuDevice.java +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/AssignedGpuDevice.java -@@ -34,6 +34,12 @@ public AssignedGpuDevice(int index, int minorNumber, +@@ -34,6 +34,12 @@ public AssignedGpuDevice(int index, int minorNumber, this.containerId = containerId.toString(); } @@ -38,7 +38,7 @@ index 26fd9050742..e84b920dcee 100644 && containerId.equals(other.containerId); } -@@ -68,12 +75,16 @@ public int compareTo(Object obj) { +@@ -68,12 +75,16 @@ public int compareTo(Object obj) { if (0 != result) { return result; } @@ -58,18 +58,18 @@ index 26fd9050742..e84b920dcee 100644 } } diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDevice.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDevice.java -index bce1d9fa480..3cb42d3c58f 100644 +index bce1d9fa480..3cb42d3c58f 100644 --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDevice.java +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDevice.java @@ -26,6 +26,7 @@ public class GpuDevice implements Serializable, Comparable { protected int index; protected int minorNumber; -+ protected int migDeviceIndex = -1; - private static final long serialVersionUID = -6812314470754667710L; ++ protected int migDeviceIndex = -1; + private static final long serialVersionUID = -6812314470754667710L; public GpuDevice(int index, int minorNumber) { -@@ -33,6 +34,12 @@ public GpuDevice(int index, int minorNumber) { +@@ -33,6 +34,12 @@ public GpuDevice(int index, int minorNumber) { this.minorNumber = minorNumber; } @@ -82,7 +82,7 @@ index bce1d9fa480..3cb42d3c58f 100644 public int getIndex() { return index; } -@@ -41,13 +48,17 @@
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDeviceSpecificationException.java @@ -26,6 +26,8 @@ private GpuDeviceSpecificationException(String message) { super(message); -@@ -57,12 +59,25 @@ public static GpuDeviceSpecificationException createWithWrongValueSpecified( +@@ -57,12 +59,25 @@ public static GpuDeviceSpecificationException createWithWrongValueSpecified( return new GpuDeviceSpecificationException(message); } \ No newline at end of file +} diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDiscoverer.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDiscoverer.java -index ce767229e50..c74651b41df 100644 +index ce767229e50..c74651b41df 100644 --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDiscoverer.java +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDiscoverer.java -@@ -31,6 +31,7 @@ +@@ -31,6 +31,7 @@ import org.apache.hadoop.yarn.server.nodemanager.webapp.dao.gpu.GpuDeviceInformation; import org.apache.hadoop.yarn.server.nodemanager.webapp.dao.gpu.GpuDeviceInformationParser; import org.apache.hadoop.yarn.server.nodemanager.webapp.dao.gpu.PerGpuDeviceInformation; private void validateConfOrThrowException() throws YarnException { if (conf == null) { -@@ -194,8 +196,17 @@ private boolean IsAutoDiscoveryEnabled() { +@@ -194,8 +196,17 @@ private boolean IsAutoDiscoveryEnabled() { for (int i = 0; i < numberOfGpus; i++) { List gpuInfos = lastDiscoveredGpuInformation.getGpus(); } return gpuDevices; } -@@ -218,18 +229,39 @@ private boolean IsAutoDiscoveryEnabled() { +@@ -218,18 +229,39 @@ private boolean IsAutoDiscoveryEnabled() { for (String device : devices.split(",")) { if (device.trim().length() > 0) { String[] splitByColon = device.trim().split(":"); } } LOG.info("Allowed GPU devices:" + gpuDevices); -@@ -237,6 +269,19 @@ private boolean IsAutoDiscoveryEnabled() { +@@ -237,6 +269,19 @@ private boolean IsAutoDiscoveryEnabled() { return gpuDevices; } + String allowedDevicesStr) throws YarnException { + try { + int index = Integer.parseInt(splitByColon[0]); + int minorNumber = Integer.parseInt(splitByColon[1]); + int migIndex = Integer.parseInt(splitByColon[2]); + return new GpuDevice(index, minorNumber, migIndex); + } catch (NumberFormatException e) {
@@ -281,7 +281,7 @@ index ce767229e50..c74651b41df 100644 private GpuDevice parseGpuDevice(String device, String[] splitByColon, String allowedDevicesStr) throws YarnException { try { -@@ -268,6 +313,9 @@ public synchronized void initialize(Configuration config) +@@ -268,6 +313,9 @@ public synchronized void initialize(Configuration config) LOG.warn(msg); } } private void lookUpAutoDiscoveryBinary(Configuration config) diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDockerCommandPluginFactory.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDockerCommandPluginFactory.java -index 051afd6c561..996cb58ac45 100644 +index 051afd6c561..996cb58ac45 100644 --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDockerCommandPluginFactory.java +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDockerCommandPluginFactory.java @@ -36,7 +36,7 @@ public static DockerCommandPlugin createGpuDockerCommandPlugin( throw new YarnException( diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/NvidiaDockerV2CommandPlugin.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/NvidiaDockerV2CommandPlugin.java -index ff25eb6ced6..c2cc0e5a2d1 100644 +index ff25eb6ced6..c2cc0e5a2d1 100644 --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/NvidiaDockerV2CommandPlugin.java +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/NvidiaDockerV2CommandPlugin.java -@@ -21,7 +21,9 @@ +@@ -21,7 +21,9 @@ import com.google.common.annotations.VisibleForTesting; import org.apache.commons.logging.Log; import org.apache.commons.logging.LogFactory; import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container; import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ResourceMappings; import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu.GpuResourceAllocator; -@@ -45,8 +47,12 @@ +@@ -45,8 +47,12 @@ private String nvidiaRuntime = "nvidia"; private String nvidiaVisibleDevices = "NVIDIA_VISIBLE_DEVICES"; private Set getAssignedGpus(Container container) { ResourceMappings resourceMappings =
container.getResourceMappings(); -@@ -84,10 +90,23 @@ public synchronized void updateDockerRunCommand( +@@ -84,10 +90,23 @@ public synchronized void updateDockerRunCommand( return; } Map environment = new HashMap<>(); + if (isMigEnabled && assignedResources.size() > 1) { + Map existingEnv = container.getLaunchContext().getEnvironment(); + Boolean shouldThrowOnMultipleGpus = Boolean.parseBoolean( + existingEnv.getOrDefault(nvidiaMigThrowOnMultiGpus, "true")); + if (shouldThrowOnMultipleGpus) { + throw new ContainerExecutionException("Allocating more than 1 GPU per container is " + + "not supported with use of MIG!"); + } + } - gpuIndexList = gpuIndexList + gpuDevice.getIndex() + ","; - LOG.info("nvidia docker2 assigned gpu index: " + gpuDevice.getIndex()); + String deviceIndex = String.valueOf(gpuDevice.getIndex()); + if (gpuDevice.getMIGIndex() != -1) { + deviceIndex = gpuDevice.getIndex() + ":" + gpuDevice.getMIGIndex(); + } + gpuIndexList = gpuIndexList + deviceIndex + ","; dockerRunCommand.addRuntime(nvidiaRuntime); environment.put(nvidiaVisibleDevices, diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuDeviceInformation.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuDeviceInformation.java -index 11ff2a4c49c..939ed46aac7 100644 +index 11ff2a4c49c..939ed46aac7 100644 --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuDeviceInformation.java +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuDeviceInformation.java -@@ -22,8 +22,10 @@ +@@ -22,8 +22,10 @@ import org.apache.hadoop.classification.InterfaceStability; import javax.xml.bind.annotation.XmlElement; * Capture single GPU device information such as memory size, temperature, @@ -37,6 +39,8 @@ private String uuid = "N/A"; - private int minorNumber = -1; + private int minorNumber = -1; + private List migDevices; + private PerGpuMigMode migMode; private PerGpuUtilizations gpuUtilizations; private PerGpuMemoryUsage gpuMemoryUsage; private PerGpuTemperature temperature; -@@ -107,6 +111,25 @@ public void setUuid(String uuid) { +@@ -107,6 +111,25 @@ public void setUuid(String uuid) { this.uuid = uuid; } public String getProductName() { return productName; diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuMigDevice.java
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuMigDevice.java -new file mode 100644 +new file mode 100644 index 00000000000..4ce7cec6e55 --- /dev/null +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuMigDevice.java -@@ -0,0 +1,48 @@ +@@ -0,0 +1,48 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file @@ -429,10 +429,10 @@ index 00000000000..4ce7cec6e55 + } +} diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuMigMode.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuMigMode.java -new file mode 100644 +new file mode 100644 index 00000000000..b706df2c3bb --- /dev/null +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuMigMode.java -@@ -0,0 +1,48 @@ +@@ -0,0 +1,48 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file @@ -517,7 +517,7 @@ index 00000000000..b706df2c3bb + } +} diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/TestGpuDiscoverer.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/TestGpuDiscoverer.java -index f0f100c1f8b..02b213b6734 100644 +index f0f100c1f8b..02b213b6734 100644 --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/TestGpuDiscoverer.java +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/TestGpuDiscoverer.java @@ -372,6 +372,37 @@ public void testGetNumberOfUsableGpusFromConfig() throws YarnException { + @Test + public void testGetNumberOfUsableGpusFromConfigMIG() throws YarnException { + Configuration conf = createConfigWithAllowedDevices("0:0,1:1:0,1:1:3,2:2,3:4"); + conf.set(YarnConfiguration.USE_MIG_ENABLED_GPUS, "true"); + GpuDiscoverer discoverer = new GpuDiscoverer(); + discoverer.initialize(conf); + + assertEquals(0, usableGpuDevices.get(0).getIndex()); + assertEquals(0, usableGpuDevices.get(0).getMinorNumber()); + assertEquals(-1, usableGpuDevices.get(0).getMIGIndex()); + + assertEquals(1,
usableGpuDevices.get(1).getIndex()); -+ assertEquals(1, usableGpuDevices.get(1).getMinorNumber()); -+ assertEquals(0, usableGpuDevices.get(1).getMIGIndex()); ++ assertEquals(25.06.25.06.1-SNAPSHOT, usableGpuDevices.get(25.06.25.06.1-SNAPSHOT).getIndex()); ++ assertEquals(25.06.25.06.1-SNAPSHOT, usableGpuDevices.get(25.06.25.06.1-SNAPSHOT).getMinorNumber()); ++ assertEquals(0, usableGpuDevices.get(25.06.25.06.1-SNAPSHOT).getMIGIndex()); + -+ assertEquals(1, usableGpuDevices.get(2).getIndex()); -+ assertEquals(1, usableGpuDevices.get(2).getMinorNumber()); ++ assertEquals(25.06.25.06.1-SNAPSHOT, usableGpuDevices.get(2).getIndex()); ++ assertEquals(25.06.25.06.1-SNAPSHOT, usableGpuDevices.get(2).getMinorNumber()); + assertEquals(3, usableGpuDevices.get(2).getMIGIndex()); + + assertEquals(2, usableGpuDevices.get(3).getIndex()); + assertEquals(2, usableGpuDevices.get(3).getMinorNumber()); -+ assertEquals(-1, usableGpuDevices.get(3).getMIGIndex()); ++ assertEquals(-25.06.25.06.1-SNAPSHOT, usableGpuDevices.get(3).getMIGIndex()); + + assertEquals(3, usableGpuDevices.get(4).getIndex()); + assertEquals(4, usableGpuDevices.get(4).getMinorNumber()); -+ assertEquals(-1, usableGpuDevices.get(4).getMIGIndex()); ++ assertEquals(-25.06.25.06.1-SNAPSHOT, usableGpuDevices.get(4).getMIGIndex()); + } + @Test public void testGetNumberOfUsableGpusFromConfigDuplicateValues() throws YarnException { -@@ -512,4 +543,5 @@ public void testScriptNotCalled() throws YarnException { +@@ -525.06.25.06.1-SNAPSHOT2,4 +543,5 @@ public void testScriptNotCalled() throws YarnException { verify(gpuSpy, never()).getGpuDeviceInformation(); } + } diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/TestNvidiaDockerV2CommandPlugin.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/TestNvidiaDockerV2CommandPlugin.java -index b0b523360ef..798a95cb009 100644 +index b0b523360ef..798a95cb009 25.06.25.06.1-SNAPSHOT00644 --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/TestNvidiaDockerV2CommandPlugin.java +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/TestNvidiaDockerV2CommandPlugin.java -@@ -20,10 +20,14 @@ +@@ -20,25.06.25.06.1-SNAPSHOT0 +20,25.06.25.06.1-SNAPSHOT4 @@ import com.google.common.collect.ImmutableList; import com.google.common.collect.Sets; @@ -583,7 +583,7 @@ index b0b523360ef..798a95cb009 100644 import org.junit.Assert; import org.junit.Test; -@@ -69,7 +73,13 @@ private boolean commandlinesEquals(Map> cli1, +@@ -69,7 +73,25.06.25.06.1-SNAPSHOT3 @@ private boolean commandlinesEquals(Map> cli25.06.25.06.1-SNAPSHOT, extends NvidiaDockerV2CommandPlugin { private boolean requestsGpu = false; @@ -598,7 +598,7 @@ index b0b523360ef..798a95cb009 100644 public void setRequestsGpu(boolean r) { requestsGpu = r; -@@ -127,4 +137,118 @@ public void testPlugin() throws Exception { +@@ -25.06.25.06.1-SNAPSHOT27,4 +25.06.25.06.1-SNAPSHOT37,25.06.25.06.1-SNAPSHOT25.06.25.06.1-SNAPSHOT8 @@ public void testPlugin() throws Exception { // runtime should exist Assert.assertTrue(newCommandLine.containsKey("runtime")); } @@ -607,7 
+607,7 @@ index b0b523360ef..798a95cb009 100644 + + @Test + public void testPluginMIG() throws Exception { -+ DockerRunCommand runCommand = new DockerRunCommand("container_1", "user", ++ DockerRunCommand runCommand = new DockerRunCommand("container_1", "user", + "fakeimage"); + + Map> originalCommandline = copyCommandLine( @@ -646,7 +646,7 @@ index b0b523360ef..798a95cb009 100644 + + @Test(expected = ContainerExecutionException.class) + public void testPluginMIGThrowsMulti() throws Exception { -+ DockerRunCommand runCommand = new DockerRunCommand("container_1", "user", ++ DockerRunCommand runCommand = new DockerRunCommand("container_1", "user", + "fakeimage"); + + Map> originalCommandline = copyCommandLine( @@ -670,7 +670,7 @@ index b0b523360ef..798a95cb009 100644 + ResourceMappings.AssignedResources assigned = + new ResourceMappings.AssignedResources(); + assigned.updateAssignedResources( -+ ImmutableList.of(new GpuDevice(0, 0, 0), new GpuDevice(1, 1, 2))); ++ ImmutableList.of(new GpuDevice(0, 0, 0), new GpuDevice(1, 1, 2))); + resourceMappings.addAssignedResources(ResourceInformation.GPU_URI, + assigned); + @@ -680,7 +680,7 @@ index b0b523360ef..798a95cb009 100644 + + @Test + public void testPluginMIGNoThrowsMulti() throws Exception { -+ DockerRunCommand runCommand = new DockerRunCommand("container_1", "user", ++ DockerRunCommand runCommand = new DockerRunCommand("container_1", "user", + "fakeimage"); + + Map> originalCommandline = copyCommandLine( @@ -704,7 +704,7 @@ index b0b523360ef..798a95cb009 100644 + ResourceMappings.AssignedResources assigned = + new ResourceMappings.AssignedResources(); + assigned.updateAssignedResources( -+ ImmutableList.of(new GpuDevice(0, 0, 0), new GpuDevice(1, 1, 2))); ++ ImmutableList.of(new GpuDevice(0, 0, 0), new GpuDevice(1, 1, 2))); + resourceMappings.addAssignedResources(ResourceInformation.GPU_URI, + assigned); + @@ -714,7 +714,7 @@ index b0b523360ef..798a95cb009 100644 + runCommand.getDockerCommandWithArguments(); + // NVIDIA_VISIBLE_DEVICES will be set + Assert.assertTrue( -+ runCommand.getEnv().get("NVIDIA_VISIBLE_DEVICES").equals("0:0,1:2")); ++ runCommand.getEnv().get("NVIDIA_VISIBLE_DEVICES").equals("0:0,1:2")); + // runtime should exist + Assert.assertTrue(newCommandLine.containsKey("runtime")); + } diff --git a/examples/MIG-Support/resource-types/gpu-mig/yarn321to323MIG.patch b/examples/MIG-Support/resource-types/gpu-mig/yarn321to323MIG.patch index a9edb966f..6e558e32e 100644 --- a/examples/MIG-Support/resource-types/gpu-mig/yarn321to323MIG.patch +++ b/examples/MIG-Support/resource-types/gpu-mig/yarn321to323MIG.patch @@ -1,8 +1,8 @@ diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java -index ad4d87daa1a..95259b1d956 100644 +index ad4d87daa1a..95259b1d956 100644 --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java -@@ -1716,6 +1716,10 @@ public static boolean isAclEnabled(Configuration conf) { +@@ -1716,6
+1716,10 @@ public static boolean isAclEnabled(Configuration conf) { @Private public static final String AUTOMATICALLY_DISCOVER_GPU_DEVICES = "auto"; @@ -14,10 +14,10 @@ index ad4d87daa1a..95259b1d956 100644 * This setting controls where to how to invoke GPU binaries */ diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/AssignedGpuDevice.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/AssignedGpuDevice.java -index 26fd9050742..e84b920dcee 100644 +index 26fd9050742..e84b920dcee 100644 --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/AssignedGpuDevice.java +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/AssignedGpuDevice.java -@@ -34,6 +34,12 @@ public AssignedGpuDevice(int index, int minorNumber, +@@ -34,6 +34,12 @@ public AssignedGpuDevice(int index, int minorNumber, this.containerId = containerId.toString(); } @@ -38,7 +38,7 @@ index 26fd9050742..e84b920dcee 100644 && containerId.equals(other.containerId); } -@@ -68,12 +75,16 @@ public int compareTo(Object obj) { +@@ -68,12 +75,16 @@ public int compareTo(Object obj) { if (0 != result) { return result; } @@ -58,18 +58,18 @@ index 26fd9050742..e84b920dcee 100644 } } diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDevice.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDevice.java -index bce1d9fa480..3cb42d3c58f 100644 +index bce1d9fa480..3cb42d3c58f 100644 --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDevice.java +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDevice.java @@ -26,6 +26,7 @@ public class GpuDevice implements Serializable, Comparable { protected int index; protected int minorNumber; -+ protected int migDeviceIndex = -1; - private static final long serialVersionUID = -6812314470754667710L; ++ protected int migDeviceIndex = -1; + private static final long serialVersionUID = -6812314470754667710L; public GpuDevice(int index, int minorNumber) { -@@ -33,6 +34,12 @@ public GpuDevice(int index, int minorNumber) { +@@ -33,6 +34,12 @@ public GpuDevice(int index, int minorNumber) { this.minorNumber = minorNumber; } @@ -82,7 +82,7 @@ index bce1d9fa480..3cb42d3c58f 100644 public int getIndex() { return index; } -@@ -41,13 +48,17 @@ public int getMinorNumber() { +@@
-41,13 +48,17 @@ public int getMinorNumber() { return minorNumber; } @@ -101,7 +101,7 @@ index bce1d9fa480..3cb42d3c58f 100644 } @Override -@@ -62,17 +73,21 @@ public int compareTo(Object obj) { +@@ -62,17 +73,21 @@ public int compareTo(Object obj) { if (0 != result) { return result; } @@ -127,7 +127,7 @@ index bce1d9fa480..3cb42d3c58f 100644 } } diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDeviceSpecificationException.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDeviceSpecificationException.java -index 9d61b91a1f2..ffc2a4c19af 100644 +index 9d61b91a1f2..ffc2a4c19af 100644 --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDeviceSpecificationException.java +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDeviceSpecificationException.java @@ -26,6 +26,8 @@ private GpuDeviceSpecificationException(String message) { super(message); } -@@ -57,12 +59,31 @@ public static GpuDeviceSpecificationException createWithWrongValueSpecified( +@@ -57,12 +59,31 @@ public static GpuDeviceSpecificationException createWithWrongValueSpecified( return new GpuDeviceSpecificationException(message); } @@ -171,7 +171,7 @@ index 9d61b91a1f2..ffc2a4c19af 100644 private static String createIllegalFormatMessage(String device, String configValue) { return String.format("Illegal format of individual GPU device: %s, " + -@@ -79,4 +100,4 @@ private static String createDuplicateFormatMessage(String device, +@@ -79,4 +100,4 @@ private static String createDuplicateFormatMessage(String device, "!
Current value of the configuration is: %s", device, configValue); } @@ -179,7 +179,7 @@ index 9d61b91a1f2..ffc2a4c19af 100644 \ No newline at end of file +} diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDiscoverer.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDiscoverer.java -index f710ff0bccd..1517e12599a 100644 +index f710ff0bccd..1517e12599a 100644 --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDiscoverer.java +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDiscoverer.java @@ -36,6 +36,7 @@ import org.slf4j.Logger; import org.slf4j.LoggerFactory; -@@ -70,6 +71,7 @@ +@@ -70,6 +71,7 @@ private GpuDeviceInformation lastDiscoveredGpuInformation = null; private List gpuDevicesFromUser; private void validateConfOrThrowException() throws YarnException { if (conf == null) { -@@ -188,8 +190,17 @@ private boolean isAutoDiscoveryEnabled() { +@@ -188,8 +190,17 @@ private boolean isAutoDiscoveryEnabled() { for (int i = 0; i < numberOfGpus; i++) { List gpuInfos = lastDiscoveredGpuInformation.getGpus(); } return gpuDevices; } -@@ -212,28 +223,56 @@ private boolean isAutoDiscoveryEnabled() { +@@ -212,28 +223,56 @@ private boolean isAutoDiscoveryEnabled() { for (String device : devices.split(",")) { if (device.trim().length() > 0) { String[] splitByColon = device.trim().split(":"); } } } -@@ -248,6 +287,12 @@ private GpuDevice parseGpuDevice(String[] splitByColon) { +@@ -248,6 +287,12 @@ private GpuDevice parseGpuDevice(String[] splitByColon) { return new GpuDevice(index, minorNumber); } + private GpuDevice parseGpuMIGDevice(String[] splitByColon) { + int index = Integer.parseInt(splitByColon[0]); -+ int minorNumber = Integer.parseInt(splitByColon[1]); ++ int minorNumber = Integer.parseInt(splitByColon[1]); + int migIndex = Integer.parseInt(splitByColon[2]); + return new GpuDevice(index, minorNumber, migIndex); + } public synchronized void initialize(Configuration config, NvidiaBinaryHelper nvidiaHelper) throws YarnException { -@@ -269,6 +314,9 @@ public synchronized void initialize(Configuration config, +@@ -269,6 +314,9 @@ public synchronized void initialize(Configuration config, LOG.warn(msg); } } private void lookUpAutoDiscoveryBinary(Configuration config) diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDockerCommandPluginFactory.java
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDockerCommandPluginFactory.java -index 051afd6c561..996cb58ac45 100644 +index 051afd6c561..996cb58ac45 100644 --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDockerCommandPluginFactory.java +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDockerCommandPluginFactory.java @@ -36,7 +36,7 @@ public static DockerCommandPlugin createGpuDockerCommandPlugin( throw new YarnException( diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/NvidiaDockerV2CommandPlugin.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/NvidiaDockerV2CommandPlugin.java -index ff25eb6ced6..c2cc0e5a2d1 100644 +index ff25eb6ced6..c2cc0e5a2d1 100644 --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/NvidiaDockerV2CommandPlugin.java +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/NvidiaDockerV2CommandPlugin.java -@@ -21,7 +21,9 @@ +@@ -21,7 +21,9 @@ import com.google.common.annotations.VisibleForTesting; import org.apache.commons.logging.Log; import org.apache.commons.logging.LogFactory; import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container; import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ResourceMappings; import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu.GpuResourceAllocator; -@@ -45,8 +47,12 @@ +@@ -45,8 +47,12 @@ private String nvidiaRuntime = "nvidia"; private String nvidiaVisibleDevices = "NVIDIA_VISIBLE_DEVICES"; private Set getAssignedGpus(Container container) { ResourceMappings resourceMappings = container.getResourceMappings(); -@@ -84,10 +90,23 @@ public synchronized void updateDockerRunCommand( +@@ -84,10 +90,23 @@ public synchronized void updateDockerRunCommand( return; } Map environment = new HashMap<>(); ++ if (isMigEnabled && assignedResources.size() > 1) { + Map existingEnv = container.getLaunchContext().getEnvironment(); + Boolean shouldThrowOnMultipleGpus = Boolean.parseBoolean( + existingEnv.getOrDefault(nvidiaMigThrowOnMultiGpus, "true")); + if (shouldThrowOnMultipleGpus) { ++ throw new ContainerExecutionException("Allocating more than
1 GPU per container is " + + "not supported with use of MIG!"); + } + } String gpuIndexList = ""; for (GpuDevice gpuDevice : assignedResources) { - gpuIndexList = gpuIndexList + gpuDevice.getIndex() + ","; - LOG.info("nvidia docker2 assigned gpu index: " + gpuDevice.getIndex()); + String deviceIndex = String.valueOf(gpuDevice.getIndex()); ++ if (gpuDevice.getMIGIndex() != -1) { + deviceIndex = gpuDevice.getIndex() + ":" + gpuDevice.getMIGIndex(); + } + gpuIndexList = gpuIndexList + deviceIndex + ","; dockerRunCommand.addRuntime(nvidiaRuntime); environment.put(nvidiaVisibleDevices, diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuDeviceInformation.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuDeviceInformation.java -index 11ff2a4c49c..939ed46aac7 100644 +index 11ff2a4c49c..939ed46aac7 100644 --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuDeviceInformation.java +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuDeviceInformation.java -@@ -22,8 +22,10 @@ +@@ -22,8 +22,10 @@ import org.apache.hadoop.classification.InterfaceStability; import javax.xml.bind.annotation.XmlElement; * Capture single GPU device information such as memory size, temperature, @@ -37,6 +39,8 @@ private String uuid = "N/A"; - private int minorNumber = -1; + private int minorNumber = -1; + private List migDevices; + private PerGpuMigMode migMode; private PerGpuUtilizations gpuUtilizations; private PerGpuMemoryUsage gpuMemoryUsage; private PerGpuTemperature temperature; -@@ -107,6 +111,25 @@ public void setUuid(String uuid) { +@@ -107,6 +111,25 @@ public void setUuid(String uuid) { this.uuid = uuid; } public String getProductName() { return productName; } diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuMigDevice.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuMigDevice.java -new file mode 100644 +new file mode 100644 index 00000000000..4ce7cec6e55 --- /dev/null +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuMigDevice.java -@@ -0,0 +1,48 @@ +@@ -0,0 +1,48 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements.
See the NOTICE file @@ -488,11 +488,11 @@ index 00000000000..4ce7cec6e55 + } +} diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuMigMode.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuMigMode.java -new file mode 100644 +new file mode 100644 index 00000000000..b706df2c3bb --- /dev/null +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuMigMode.java -@@ -0,0 +1,48 @@ +@@ -0,0 +1,48 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file @@ -542,7 +542,7 @@ index 00000000000..b706df2c3bb + } +} diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/TestGpuDiscoverer.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/TestGpuDiscoverer.java -index 8261895b2a9..6c1f500009c 100644 +index 8261895b2a9..6c1f500009c 100644 --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/TestGpuDiscoverer.java +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/TestGpuDiscoverer.java @@ -373,6 +373,37 @@ public void testGetNumberOfUsableGpusFromConfig() throws YarnException { + @Test + public void testGetNumberOfUsableGpusFromConfigMIG() throws YarnException { -+ Configuration conf = createConfigWithAllowedDevices("0:0,1:1:0,1:1:3,2:2,3:4"); + conf.set(YarnConfiguration.USE_MIG_ENABLED_GPUS, "true"); + GpuDiscoverer discoverer = new GpuDiscoverer(); + discoverer.initialize(conf, binaryHelper); + + + + assertEquals(0, usableGpuDevices.get(0).getIndex()); + assertEquals(0, usableGpuDevices.get(0).getMinorNumber()); -+ assertEquals(-1, usableGpuDevices.get(0).getMIGIndex()); + -+ assertEquals(1, usableGpuDevices.get(1).getIndex()); ++ assertEquals(1, usableGpuDevices.get(1).getMinorNumber()); ++ assertEquals(0, usableGpuDevices.get(1).getMIGIndex()); + ++ assertEquals(1, usableGpuDevices.get(2).getIndex()); ++
assertEquals(1, usableGpuDevices.get(2).getMinorNumber()); + assertEquals(3, usableGpuDevices.get(2).getMIGIndex()); + + assertEquals(2, usableGpuDevices.get(3).getIndex()); + assertEquals(2, usableGpuDevices.get(3).getMinorNumber()); -+ assertEquals(-1, usableGpuDevices.get(3).getMIGIndex()); ++ assertEquals(-1, usableGpuDevices.get(3).getMIGIndex()); + + assertEquals(3, usableGpuDevices.get(4).getIndex()); + assertEquals(4, usableGpuDevices.get(4).getMinorNumber()); -+ assertEquals(-1, usableGpuDevices.get(4).getMIGIndex()); ++ assertEquals(-1, usableGpuDevices.get(4).getMIGIndex()); + } + @Test public void testGetNumberOfUsableGpusFromConfigDuplicateValues() throws YarnException { -@@ -513,4 +544,4 @@ public void testScriptNotCalled() throws YarnException, IOException { +@@ -513,4 +544,4 @@ public void testScriptNotCalled() throws YarnException, IOException { verify(gpuSpy, never()).getGpuDeviceInformation(); } -} \ No newline at end of file +} diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/TestNvidiaDockerV2CommandPlugin.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/TestNvidiaDockerV2CommandPlugin.java -index b0b523360ef..798a95cb009 100644 +index b0b523360ef..798a95cb009 100644 --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/TestNvidiaDockerV2CommandPlugin.java +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/TestNvidiaDockerV2CommandPlugin.java -@@ -20,10 +20,14 @@ +@@ -20,10 +20,14 @@ import com.google.common.collect.ImmutableList; import com.google.common.collect.Sets; import org.junit.Assert; import org.junit.Test; -@@ -69,7 +73,13 @@ private boolean commandlinesEquals(Map> cli1, +@@ -69,7 +73,13 @@ private boolean commandlinesEquals(Map> cli1, extends NvidiaDockerV2CommandPlugin { private boolean requestsGpu = false; public void setRequestsGpu(boolean r) { requestsGpu = r; -@@ -127,4 +137,118 @@ public void testPlugin() throws Exception { +@@ -127,4 +137,118 @@ public void testPlugin() throws Exception { // runtime should exist Assert.assertTrue(newCommandLine.containsKey("runtime")); } + + @Test + public void testPluginMIG() throws Exception { -+
DockerRunCommand runCommand = new DockerRunCommand("container_1", "user", + "fakeimage"); + + Map> originalCommandline = copyCommandLine( @@ -696,7 +696,7 @@ index b0b523360ef..798a95cb009 100644 + ResourceMappings.AssignedResources assigned = + new ResourceMappings.AssignedResources(); + assigned.updateAssignedResources( -+ ImmutableList.of(new GpuDevice(0, 0, 0), new GpuDevice(1, 1, 2))); ++ ImmutableList.of(new GpuDevice(0, 0, 0), new GpuDevice(1, 1, 2))); + resourceMappings.addAssignedResources(ResourceInformation.GPU_URI, + assigned); + @@ -730,7 +730,7 @@ index b0b523360ef..798a95cb009 100644 + runCommand.getDockerCommandWithArguments(); + // NVIDIA_VISIBLE_DEVICES will be set + Assert.assertTrue( -+ runCommand.getEnv().get("NVIDIA_VISIBLE_DEVICES").equals("0:0,1:2")); ++ runCommand.getEnv().get("NVIDIA_VISIBLE_DEVICES").equals("0:0,1:2")); + // runtime should exist + Assert.assertTrue(newCommandLine.containsKey("runtime")); + } diff --git a/examples/MIG-Support/yarn-unpatched/README.md b/examples/MIG-Support/yarn-unpatched/README.md index 6a4d7e9b3..b8e87a37e 100644 --- a/examples/MIG-Support/yarn-unpatched/README.md +++ b/examples/MIG-Support/yarn-unpatched/README.md @@ -1,4 +1,4 @@ -# MIG Support for Spark on YARN using unmodified versions of Apache Hadoop 3.1.2+ +# MIG Support for Spark on YARN using unmodified versions of Apache Hadoop 3.1.2+ This document describes a solution for utilizing MIG with YARN when upgrading to a recent 3.3+ version or patching older versions of Apache Hadoop is not feasible. Please refer to the corresponding @@ -22,13 +22,13 @@ to discover GPUs. It replaces MIG-enabled GPUs with the list of `<gpu>` elements. Please see the [MIG Application Considerations](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/#app-considerations) and [CUDA Device Enumeration](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#cuda-visible-devices). -Special note: this method only works with drivers >= R470 (470.42.01+). +Special note: this method only works with drivers >= R470 (470.42.01+). ## Installation These instructions assume YARN is already installed and configured with GPU Scheduling enabled using Docker and the NVIDIA Container Toolkit (nvidia-docker2). -See [Using GPU on YARN](https://hadoop.apache.org/docs/r3.1.2/hadoop-yarn/hadoop-yarn-site/UsingGpus.html) if +See [Using GPU on YARN](https://hadoop.apache.org/docs/r3.1.2/hadoop-yarn/hadoop-yarn-site/UsingGpus.html) if you need more information.
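As a side note on the `<gpu-index>:<mig-index>` addressing convention that both the patched plugin tests above assert (`NVIDIA_VISIBLE_DEVICES` equal to `"0:0,1:2"`) and this wrapper-based approach relies on, the following is a minimal Python sketch. It is illustrative only, not code from the patch or the scripts; the `GpuDevice` dataclass simply mirrors the Java class:

```python
# Illustrative sketch of the NVIDIA_VISIBLE_DEVICES rendering convention.
# GpuDevice mirrors the Java class from the patch; mig_index == -1 means
# "a whole physical GPU, not a MIG sub-device".
from dataclasses import dataclass

@dataclass
class GpuDevice:
    index: int
    minor_number: int
    mig_index: int = -1

def visible_devices(assigned):
    parts = []
    for dev in assigned:
        if dev.mig_index != -1:
            parts.append(f"{dev.index}:{dev.mig_index}")  # MIG slice: <gpu>:<mig>
        else:
            parts.append(str(dev.index))  # whole physical GPU
    return ",".join(parts)

# Matches the expectation asserted in the plugin test above.
assert visible_devices([GpuDevice(0, 0, 0), GpuDevice(1, 1, 2)]) == "0:0,1:2"
```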
Enable and configure your [GPUs with MIG](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html) on all of the nodes @@ -37,18 +37,18 @@ it applies to. Download the contents of [scripts](./scripts/) to every YARN NodeManager (worker) machine to some location, for example: `/usr/local/yarn-mig-scripts`. Make sure that the scripts are executable by the docker daemon user (i.e., `root`), and YARN NM service user (typically `yarn`). Note that the scripts -leave the original outputs untouched if the environment variable `MIG_AS_GPU_ENABLED` is not 1. +leave the original outputs untouched if the environment variable `MIG_AS_GPU_ENABLED` is not 1. ### YARN Configuration #### Customizing yarn-env.sh In `$YARN_CONF_DIR/yarn-env.sh` -- Add `export MIG_AS_GPU_ENABLED=1` to enable replacing of MIG-enabled GPUs with a list +- Add `export MIG_AS_GPU_ENABLED=1` to enable replacing of MIG-enabled GPUs with a list of MIG devices as if they were physical GPUs. - Customize `REAL_NVIDIA_SMI_PATH` value if nvidia-smi is not at the default location `/usr/bin/nvidia-smi`. - Add `ENABLE_NON_MIG_GPUS=0` if you want to prevent discovery of physical GPUs that are not subdivided in MIGs. -Default is ENABLE_NON_MIG_GPUS=1 and physical GPUs in the MIG-Disabled state are listed along with MIG sub-devices on the node. +Default is ENABLE_NON_MIG_GPUS=1 and physical GPUs in the MIG-Disabled state are listed along with MIG sub-devices on the node. Modify the following config `$YARN_CONF_DIR/yarn-site.xml`: ```xml @@ -67,10 +67,10 @@ specify the list of MIG instances to use by setting 0-based indices corresponding to the desired `<gpu>` elements in the output of ```bash -MIG_AS_GPU_ENABLED=1 /usr/local/yarn-mig-scripts/nvidia-smi -q -x +MIG_AS_GPU_ENABLED=1 /usr/local/yarn-mig-scripts/nvidia-smi -q -x ``` -In other words, if you want to allow MIG 1:2 and 2:0 and they are listed as 3rd and 5th `<gpu>` +In other words, if you want to allow MIG 1:2 and 2:0 and they are listed as 3rd and 5th `<gpu>` elements the value for `yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices` should be "2,4". @@ -79,7 +79,7 @@ elements the value for `yarn.nodemanager.resource-plugins.gpu.allowed-gpu-device Modify section `[nvidia-container-cli]` in `/etc/nvidia-container-runtime/config.toml`: ```toml path = "/usr/local/yarn-mig-scripts/nvidia-container-cli-wrapper.sh" -environment = [ "MIG_AS_GPU_ENABLED=1", "REAL_NVIDIA_SMI_PATH=/if/non-default/path/nvidia-smi" ] +environment = [ "MIG_AS_GPU_ENABLED=1", "REAL_NVIDIA_SMI_PATH=/if/non-default/path/nvidia-smi" ] ``` Note, the values for `MIG_AS_GPU_ENABLED`, `REAL_NVIDIA_SMI_PATH`, `ENABLE_NON_MIG_GPUS` should be diff --git a/examples/MIG-Support/yarn-unpatched/scripts/mig2gpu.sh b/examples/MIG-Support/yarn-unpatched/scripts/mig2gpu.sh index c95b195b6..c5827caf9 100755 --- a/examples/MIG-Support/yarn-unpatched/scripts/mig2gpu.sh +++ b/examples/MIG-Support/yarn-unpatched/scripts/mig2gpu.sh @@ -1,6 +1,6 @@ #!/bin/bash -# Copyright (c) 2022, NVIDIA CORPORATION. +# Copyright (c) 2022-2025, NVIDIA CORPORATION. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -20,7 +20,7 @@ set -e # pretty-printed XML output generated by nvidia-smi. It replaces each MIG-enabled gpu element # with a list of gpu elements corresponding to its configured MIG devices.
# If there is at least one MIG-enabled GPU, the output for non-MIG GPUs is suppressed by default. However, -# this can be overridden using ENABLE_NON_MIG_GPUS=1. +# this can be overridden using ENABLE_NON_MIG_GPUS=1. # XML fragments are viewed and manipulated using bash arrays of lines. Each element of interest is tracked by # a start offset into the line array pointing to the line with the opening tag and the end offset, @@ -31,7 +31,7 @@ set -e # Include both MIG and non-MIG devices by default # Set ENABLE_NON_MIG_GPUS=0 to discover only GPU devices with the current MIG mode Disabled -ENABLE_NON_MIG_GPUS=${ENABLE_NON_MIG_GPUS:-1} +ENABLE_NON_MIG_GPUS=${ENABLE_NON_MIG_GPUS:-1} # If setting YARN up to use Cgroups without official YARN support, # enabling this tells the script to use the NVIDIA capabilities access @@ -57,25 +57,25 @@ mig2gpu_nonMigGpu_out=() mig2gpu_migGpu_out=() # Slice of original XML defining the current GPU element -mig2gpu_gpu_lineNumberStart=-1 -mig2gpu_gpu_lineNumberEnd=-1 +mig2gpu_gpu_lineNumberStart=-1 +mig2gpu_gpu_lineNumberEnd=-1 # Slice of original XML defining the current MIG element -mig2gpu_mig_lineNumberStart=-1 -mig2gpu_mig_lineNumberEnd=-1 -mig2gpu_migIndex=-1 +mig2gpu_mig_lineNumberStart=-1 +mig2gpu_mig_lineNumberEnd=-1 +mig2gpu_migIndex=-1 # Parent GPU context for MIG -mig2gpu_gpuIdx=-1 -mig2gpu_migGpuInstanceId=-1 -mig2gpu_migComputeInstanceUuid=-1 +mig2gpu_gpuIdx=-1 +mig2gpu_migGpuInstanceId=-1 +mig2gpu_migComputeInstanceUuid=-1 mig2gpu_productName="INVALID_GPU_PRODUCT_NAME" mig2gpu_gpuUuid="INVALID_GPU_UUID" mig2gpu_gpuMinorNumber="INVALID_GPU_MINOR_NUMBER" -mig2gpu_gpu_utilization_lineNumberStart=-1 -mig2gpu_gpu_utilization_lineNumberEnd=-1 -mig2gpu_gpu_temperature_lineNumberStart=-1 -mig2gpu_gpu_temperature_lineNumberEnd=-1 +mig2gpu_gpu_utilization_lineNumberStart=-1 +mig2gpu_gpu_utilization_lineNumberEnd=-1 +mig2gpu_gpu_temperature_lineNumberStart=-1 +mig2gpu_gpu_temperature_lineNumberEnd=-1 # The function to replace a MIG-enabled GPU with the "fake" GPU device elements # corresponding to MIG devices contained within the given GPU element @@ -84,9 +84,9 @@ mig2gpu_gpu_temperature_lineNumberEnd=-1 # # # 495.29.05 -# +# # Quadro RTX 6000 -# GPU-903720f4-f8d1-11e0-3b2f-4bd740b2f424 +# GPU-903720f4-f8d1-11e0-3b2f-4bd740b2f424 # 0 # # 673 MiB @@ -98,7 +98,7 @@ mig2gpu_gpu_temperature_lineNumberEnd=-1 # # 38 C # 94 C -# 91 C +# 91 C # # # @@ -110,10 +110,10 @@ mig2gpu_gpu_temperature_lineNumberEnd=-1 # 0 # # -# 14 -# 1 +# 14 +# 1 # 0 -# 1 +# 1 # 0 # 0 # @@ -124,19 +124,19 @@ mig2gpu_gpu_temperature_lineNumberEnd=-1 # # # -# 6016 MiB +# 6016 MiB # 3 MiB -# 6012 MiB +# 6012 MiB # -# -# 8191 MiB +# +# 8191 MiB # 0 MiB -# 8191 MiB -# +# 8191 MiB +# # # # To satisfy the minimum parseable GPU element, we need to -# 1) add a element, parent's original text + MIG + index +# 1) add a element, parent's original text + MIG + index # 2) add a element according to
https://docs.nvidia.com/datacenter/tesla/mig-user-guide/#cuda-gi # MIG-// # 3) add parent's 0 (don't care) @@ -152,7 +152,7 @@ function processParentGpuGlobals { local lineNumber # increment 0-based GPU iteration order index - mig2gpu_gpuIdx=$((mig2gpu_gpuIdx+1)) + mig2gpu_gpuIdx=$((mig2gpu_gpuIdx+1)) for ((lineNumber=mig2gpu_gpu_lineNumberStart; lineNumber'*'') if [[ "$line" =~ 'Enabled' ]]; then - mig2gpu_migEnabled=1 + mig2gpu_migEnabled=1 else mig2gpu_migEnabled=0 fi @@ -169,13 +169,13 @@ function processParentGpuGlobals { $'\t'*''*) if [[ "$line" =~ $'\t\t'(.*)'' ]]; then - mig2gpu_productName="${BASH_REMATCH[1]}" + mig2gpu_productName="${BASH_REMATCH[1]}" fi ;; $'\t'*''*) if [[ "$line" =~ $'\t\t'(.*)'' ]]; then - mig2gpu_gpuUuid="${BASH_REMATCH[1]}" + mig2gpu_gpuUuid="${BASH_REMATCH[1]}" fi ;; @@ -188,7 +188,7 @@ function processParentGpuGlobals { ;; $'\t'*''*) - mig2gpu_gpu_utilization_lineNumberEnd=$((lineNumber+1)) + mig2gpu_gpu_utilization_lineNumberEnd=$((lineNumber+1)) ;; $'\t'*''*) @@ -196,7 +196,7 @@ function processParentGpuGlobals { ;; $'\t'*''*) - mig2gpu_gpu_temperature_lineNumberEnd=$((lineNumber+1)) + mig2gpu_gpu_temperature_lineNumberEnd=$((lineNumber+1)) ;; esac done @@ -225,15 +225,15 @@ function replaceParentGpuWithMigs { $'\t'*''*) if [[ "$line" =~ $'\t'*''(.*)'' ]]; then - mig2gpu_migIndex="${BASH_REMATCH[1]}" + mig2gpu_migIndex="${BASH_REMATCH[1]}" fi ;; $'\t'*'_instance_id>'*) if [[ "$line" =~ $'\t'*''(.*)'' ]]; then - mig2gpu_migGpuInstanceId="${BASH_REMATCH[1]}" + mig2gpu_migGpuInstanceId="${BASH_REMATCH[1]}" elif [[ "$line" =~ $'\t'*''(.*)'' ]]; then - mig2gpu_migComputeInstanceId="${BASH_REMATCH[1]}" + mig2gpu_migComputeInstanceId="${BASH_REMATCH[1]}" fi ;; @@ -242,14 +242,14 @@ function replaceParentGpuWithMigs { ;; $'\t'*''*) - local fbMemoryUsage_lineNumberEnd=$((lineNumber+1)) + local fbMemoryUsage_lineNumberEnd=$((lineNumber+1)) local fbMemryUsageLength=$((fbMemoryUsage_lineNumberEnd-fbMemoryUsage_lineNumberStart)) local fbMemoryUsage=("${mig2gpu_inputLines[@]:$fbMemoryUsage_lineNumberStart:fbMemryUsageLength}") local migFbMemoryUsage=("${fbMemoryUsage[@]//$'\t\t\t'/$'\t\t'}") ;; $'\t'*''*) - mig2gpu_mig_lineNumberEnd=$((lineNumber+1)) + mig2gpu_mig_lineNumberEnd=$((lineNumber+1)) # mig2gpu_migGpu_out+=("${mig2gpu_inputLines[$mig2gpu_gpu_lineNumberStart]}") @@ -269,7 +269,7 @@ function replaceParentGpuWithMigs { mig2gpu_migGpu_out+=($'\t\t'"<_mig2gpu_device_id>$migDeviceId") # if using this with CGROUP workaround we need the minor number to be from nvidia-caps access - if [[ "$ENABLE_MIG_GPUS_FOR_CGROUPS" == 1 ]]; then + if [[ "$ENABLE_MIG_GPUS_FOR_CGROUPS" == 1 ]]; then mig_minor_dev_num=`cat /proc/driver/nvidia-caps/mig-minors | grep gpu$mig2gpu_gpuIdx/gi$mig2gpu_migGpuInstanceId/access | cut -d ' ' -f 2` mig2gpu_migGpu_out+=($'\t\t'"$mig_minor_dev_num") else @@ -285,7 +285,7 @@ function replaceParentGpuWithMigs { mig2gpu_migGpu_out+=("${mig2gpu_inputLines[@]:$mig2gpu_gpu_temperature_lineNumberStart:$gpuTemperatureLength}") # - mig2gpu_migGpu_out+=("${mig2gpu_inputLines[$((mig2gpu_gpu_lineNumberEnd-1))]}") + mig2gpu_migGpu_out+=("${mig2gpu_inputLines[$((mig2gpu_gpu_lineNumberEnd-1))]}") ;; esac done @@ -295,7 +295,7 @@ function replaceParentGpuWithMigs { function processGpuElement {
processParentGpuGlobals - if [[ "$mig2gpu_migEnabled" != "1" ]]; then + if [[ "$mig2gpu_migEnabled" != "1" ]]; then addOriginalGpuIndexAsDeviceId else # scan gpu element lines twice because the mig section appears before @@ -329,7 +329,7 @@ function mig2gpuMain { $'\t"*) - current_gpu_idx=$(($current_gpu_idx+1)) + current_gpu_idx=$(($current_gpu_idx+1)) if [[ "$deviceArgWithLeadingTrailingComma" =~ ",${current_gpu_idx}," && "$line" =~ '<_mig2gpu_device_id>'(.*)'' ]]; then - nvcli_migDeviceIds+=("${BASH_REMATCH[1]}") + nvcli_migDeviceIds+=("${BASH_REMATCH[1]}") fi ;; diff --git a/examples/MIG-Support/yarn-unpatched/scripts/nvidia-smi b/examples/MIG-Support/yarn-unpatched/scripts/nvidia-smi index 854fecc9c..daba30e0a 100755 --- a/examples/MIG-Support/yarn-unpatched/scripts/nvidia-smi +++ b/examples/MIG-Support/yarn-unpatched/scripts/nvidia-smi @@ -1,6 +1,6 @@ #!/bin/bash -# Copyright (c) 2022, NVIDIA CORPORATION. +# Copyright (c) 2022-2025, NVIDIA CORPORATION. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -34,17 +34,17 @@ for arg in "$@"; do case "$arg" in "-q"|"--query") - QUERY_ARG=1 + QUERY_ARG=1 ;; "-x"|"--xml-format") - XML_FORMAT_ARG=1 + XML_FORMAT_ARG=1 ;; esac done -if [[ "$MIG_AS_GPU_ENABLED" == "1" && "$XML_FORMAT_ARG" == "1" && "$QUERY_ARG" == "1" ]]; then +if [[ "$MIG_AS_GPU_ENABLED" == "1" && "$XML_FORMAT_ARG" == "1" && "$QUERY_ARG" == "1" ]]; then "$REAL_NVIDIA_SMI_PATH" "$@" | "$THIS_DIR/mig2gpu.sh" else "$REAL_NVIDIA_SMI_PATH" "$@" diff --git a/examples/ML+DL-Examples/Optuna-Spark/README.md b/examples/ML+DL-Examples/Optuna-Spark/README.md index 233f61c7a..9cbf2924a 100644 --- a/examples/ML+DL-Examples/Optuna-Spark/README.md +++ b/examples/ML+DL-Examples/Optuna-Spark/README.md @@ -1,4 +1,4 @@ - + # Distributed Hyperparameter Tuning @@ -8,11 +8,11 @@ These examples demonstrate distributed hyperparameter tuning with [Optuna](https - [Overview](#overview) - [Examples](#examples) - [Running Optuna on Spark Standalone](#running-optuna-on-spark-standalone) - - [Setup Database for Optuna](#1-setup-database-for-optuna) + - [Setup Database for Optuna](#1-setup-database-for-optuna) - [Setup Optuna Python Environment](#2-setup-optuna-python-environment) - [Start Standalone Cluster and Run](#3-start-standalone-cluster-and-run) - [Running Optuna on Databricks](#running-optuna-on-databricks) - - [Upload Init Script and Notebook](#1-upload-init-script-and-notebook) + - [Upload Init Script and Notebook](#1-upload-init-script-and-notebook) - [Create Cluster](#2-create-cluster) - [Run Notebook](#3-run-notebook) - [Benchmarks](#benchmarks) @@ -26,7 +26,7 @@ These examples demonstrate distributed hyperparameter tuning with [Optuna](https Optuna is a lightweight Python library for hyperparameter tuning, integrating state-of-the-art hyperparameter optimization algorithms. At a high level, we optimize hyperparameters in three steps: -1. Wrap model training with an `objective` function that returns a loss metric. +1. Wrap model training with an `objective` function that returns a loss metric. 2. In each `trial`, suggest hyperparameters based on previous results. 3. Create a `study` object, which executes the optimization and stores the trial results.
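For a sense of how these three steps combine with the shared-database approach described later in this README, here is a hypothetical sketch of driving `study.optimize` through the joblib-spark backend. It is not taken from the notebooks: the MySQL host and study name are placeholders, and `objective` is the function defined in the snippet that follows:

```python
# Hypothetical wiring of distributed Optuna via joblib-spark; assumes the
# optuna, joblib, and joblibspark packages from the environment setup below,
# plus the MySQL RDBStorage configured in this README.
import joblib
import optuna
from joblibspark import register_spark

register_spark()  # registers the "spark" joblib backend

study = optuna.create_study(
    direction="maximize",
    storage="mysql://optuna_user:optuna_password@<driver-host>/optuna",  # placeholder host
    study_name="optuna-spark-demo",  # placeholder name
    load_if_exists=True,
)

with joblib.parallel_backend("spark", n_jobs=8):
    # Each parallel job becomes a small Spark application; trial results and
    # new hyperparameters synchronize through the shared RDBStorage.
    study.optimize(objective, n_trials=100, n_jobs=8)
```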
@@ -35,7 +35,7 @@ At a high level, we optimize hyperparameters in three steps: import xgboost as xgb import optuna -# 1. Define an objective function to be maximized. +# 1. Define an objective function to be maximized. def objective(trial): ... @@ -43,10 +43,10 @@ def objective(trial): param = { "objective": "binary:logistic", "booster": trial.suggest_categorical("booster", ["gbtree", "gblinear", "dart"]), - "lambda": trial.suggest_float("lambda", 1e-8, 1.0, log=True), - "alpha": trial.suggest_float("alpha", 1e-8, 1.0, log=True), - "subsample": trial.suggest_float("subsample", 0.2, 1.0), - "colsample_bytree": trial.suggest_float("colsample_bytree", 0.2, 1.0), + "lambda": trial.suggest_float("lambda", 1e-8, 1.0, log=True), + "alpha": trial.suggest_float("alpha", 1e-8, 1.0, log=True), + "subsample": trial.suggest_float("subsample", 0.2, 1.0), + "colsample_bytree": trial.suggest_float("colsample_bytree", 0.2, 1.0), } booster = xgb.train(param, dtrain) @@ -55,11 +55,11 @@ def objective(trial): # 3. Create a study object and optimize the objective function. study = optuna.create_study(direction='maximize') -study.optimize(objective, n_trials=100) +study.optimize(objective, n_trials=100) ``` To run **distributed tuning** on Spark, we take the following steps: -1. Each worker receives a copy of the same dataset. +1. Each worker receives a copy of the same dataset. 2. Each worker runs a subset of the trials in parallel. 3. Workers write trial results and receive new hyperparameters using a shared database. @@ -78,7 +78,7 @@ We provide **2 notebooks**, with differences in the backend/implementation. See ## Running Optuna on Spark Standalone -### 1. Setup Database for Optuna +### 1. Setup Database for Optuna Optuna offers an RDBStorage option which allows for the persistence of experiments across different machines and processes, thereby enabling Optuna tasks to be distributed. @@ -87,7 +87,7 @@ This section will walk you through setting up MySQL as the backend for RDBStorag We highly recommend installing MySQL on the driver node. This setup eliminates concerns regarding MySQL connectivity between worker nodes and the driver, simplifying the management of database connections. (For Databricks, the installation is handled by the init script). -1. Install MySql: +1.
Install MySql: ``` shell sudo apt install mysql-server @@ -116,13 +116,13 @@ sudo mysql ``` mysql mysql> CREATE USER 'optuna_user'@'%' IDENTIFIED BY 'optuna_password'; -Query OK, 0 rows affected (0.01 sec) +Query OK, 0 rows affected (0.01 sec) mysql> GRANT ALL PRIVILEGES ON *.* TO 'optuna_user'@'%' WITH GRANT OPTION; -Query OK, 0 rows affected (0.01 sec) +Query OK, 0 rows affected (0.01 sec) mysql> FLUSH PRIVILEGES; -Query OK, 0 rows affected (0.01 sec) +Query OK, 0 rows affected (0.01 sec) mysql> EXIT; Bye @@ -148,7 +148,7 @@ We use [RAPIDS](https://docs.rapids.ai/install/#get-rapids) for GPU-accelerated sudo apt install libmysqlclient-dev conda create -n rapids-25.04 -c rapidsai -c conda-forge -c nvidia \ - cudf=25.04 cuml=25.04 python=3.10 'cuda-version>=12.0,<=12.5' + cudf=25.04 cuml=25.04 python=3.10 'cuda-version>=12.0,<=12.5' conda activate optuna-spark pip install mysqlclient pip install optuna joblib joblibspark ipywidgets @@ -160,11 +160,11 @@ Configure your standalone cluster settings. This example just creates a local cluster with a single GPU worker: ```shell export SPARK_HOME=/path/to/spark -export SPARK_WORKER_OPTS="-Dspark.worker.resource.gpu.amount=1 \ +export SPARK_WORKER_OPTS="-Dspark.worker.resource.gpu.amount=1 \ -Dspark.worker.resource.gpu.discoveryScript=$SPARK_HOME/examples/src/main/scripts/getGpusResources.sh" -export MASTER=spark://$(hostname):7077; export SPARK_WORKER_INSTANCES=1; export CORES_PER_WORKER=8 +export MASTER=spark://$(hostname):7077; export SPARK_WORKER_INSTANCES=1; export CORES_PER_WORKER=8 -${SPARK_HOME}/sbin/start-master.sh; ${SPARK_HOME}/sbin/start-worker.sh -c ${CORES_PER_WORKER} -m 16G ${MASTER} +${SPARK_HOME}/sbin/start-master.sh; ${SPARK_HOME}/sbin/start-worker.sh -c ${CORES_PER_WORKER} -m 16G ${MASTER} ``` You can now run the notebook using the `optuna-spark` Python kernel! @@ -173,7 +173,7 @@ The notebook contains instructions to attach to the standalone cluster. ## Running Optuna on Databricks -### 1. Upload Init Script and Notebook +### 1. Upload Init Script and Notebook - Make sure your [Databricks CLI](https://docs.databricks.com/en/dev-tools/cli/tutorial.html) is configured for your Databricks workspace. - Copy the desired notebook into your Databricks workspace. For example: @@ -209,7 +209,7 @@ Locate the notebook in your workspace and click on `Connect` to attach it to the ## Benchmarks -The graph below shows running times comparing distributed (8 GPUs) vs. single GPU hyperparameter tuning with 100 trials on synthetic regression datasets. +The graph below shows running times comparing distributed (8 GPUs) vs. single GPU hyperparameter tuning with 100 trials on synthetic regression datasets. ![Databricks benchmarking results](images/runtimes.png) @@ -219,7 +219,7 @@ The Optuna tasks will be serialized into bytes and distributed to Spark workers During tuning, the Optuna tasks send intermediate results back to RDBStorage to persist, and ask for the parameters from RDBStorage sampled by Optuna on the driver to run next. -**Using JoblibSpark**: each Optuna task is a Spark application that has only 1 job, 1 stage, 1 task, and the Spark application will be submitted on the local threads. Here the parameter `n_jobs` configures the Spark backend to limit how many Spark applications are submitted at the same time.
+**Using JoblibSpark**: each Optuna task is a Spark application that has only 1 job, 1 stage, 1 task, and the Spark application will be submitted on the local threads. Here the parameter `n_jobs` configures the Spark backend to limit how many Spark applications are submitted at the same time. Thus Optuna with JoblibSpark uses Spark application-level parallelism, rather than task-level parallelism. For larger datasets, ensure that a single XGBoost task can run on a single node without any CPU/GPU OOM. @@ -240,4 +240,4 @@ Since each worker requires the full dataset to perform hyperparameter tuning, th - Please be aware that Optuna studies will continue where they left off from previous trials; delete and recreate the study if you would like to start anew. - Optuna in distributed mode is **non-deterministic** (see [this link](https://optuna.readthedocs.io/en/stable/faq.html#how-can-i-obtain-reproducible-optimization-results)), as trials are executed asynchronously by executors. Deterministic behavior can be achieved using Spark barriers to coordinate reads/writes to the database. - Reading data with GPU using cuDF requires disabling [GPUDirect Storage](https://docs.rapids.ai/api/cudf/nightly/user_guide/io/io/#magnum-io-gpudirect-storage-integration), i.e., setting the environment variable `LIBCUDF_CUFILE_POLICY=OFF`, to be compatible with the Databricks file system. Without GDS, cuDF will use a CPU bounce buffer when reading files, but all parsing and decoding will still be accelerated by the GPU. -- Note that the storage doesn’t store the state of the instance of samplers and pruners. To resume a study with a sampler whose seed argument is specified, [the sampler can be pickled](https://optuna.readthedocs.io/en/stable/tutorial/20_recipes/001_rdb.html#resume-study) and returned to the driver alongside the results. +- Note that the storage doesn’t store the state of the instance of samplers and pruners. To resume a study with a sampler whose seed argument is specified, [the sampler can be pickled](https://optuna.readthedocs.io/en/stable/tutorial/20_recipes/001_rdb.html#resume-study) and returned to the driver alongside the results. diff --git a/examples/ML+DL-Examples/Optuna-Spark/images/optuna.svg b/examples/ML+DL-Examples/Optuna-Spark/images/optuna.svg index 17103908c..cf329ca14 100644 --- a/examples/ML+DL-Examples/Optuna-Spark/images/optuna.svg +++ b/examples/ML+DL-Examples/Optuna-Spark/images/optuna.svg @@ -1,4 +1,4 @@ - + - -
Optuna Application
ThreadPool
Driver
Spark Application
Spark Application
Spark Application
Spark Application
Optuna task
Optuna task
Optuna task
Optuna task
Workers
Optuna
RDBStorage
backed by
MySql
\ No newline at end of file + +
Optuna Application
ThreadPool
Driver
Spark Application
Spark Application
Spark Application
Spark Application
Optuna task
Optuna task
Optuna task
Optuna task
Workers
Optuna
RDBStorage
backed by
MySql
\ No newline at end of file diff --git a/examples/ML+DL-Examples/Optuna-Spark/images/runtimes.png b/examples/ML+DL-Examples/Optuna-Spark/images/runtimes.png index 819568892..d1ffadde7 100644 Binary files a/examples/ML+DL-Examples/Optuna-Spark/images/runtimes.png and b/examples/ML+DL-Examples/Optuna-Spark/images/runtimes.png differ diff --git a/examples/ML+DL-Examples/Optuna-Spark/optuna-examples/databricks/init_optuna.sh b/examples/ML+DL-Examples/Optuna-Spark/optuna-examples/databricks/init_optuna.sh index 48ef76741..10c583206 100644 --- a/examples/ML+DL-Examples/Optuna-Spark/optuna-examples/databricks/init_optuna.sh +++ b/examples/ML+DL-Examples/Optuna-Spark/optuna-examples/databricks/init_optuna.sh @@ -20,7 +20,7 @@ if [[ $DB_IS_DRIVER = "TRUE" ]]; then if [[ ! -f "/etc/mysql/mysql.conf.d/mysqld.cnf" ]]; then echo "ERROR: MYSQL installation failed" - exit 1 + exit 1 fi # configure mysql @@ -42,19 +42,19 @@ fi # rapids import SPARK_RAPIDS_VERSION=25.06.0 -curl -L https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/${SPARK_RAPIDS_VERSION}/rapids-4-spark_2.12-${SPARK_RAPIDS_VERSION}.jar -o \ - /databricks/jars/rapids-4-spark_2.12-${SPARK_RAPIDS_VERSION}.jar +curl -L https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/${SPARK_RAPIDS_VERSION}/rapids-4-spark_2.12-${SPARK_RAPIDS_VERSION}.jar -o \ + /databricks/jars/rapids-4-spark_2.12-${SPARK_RAPIDS_VERSION}.jar -# setup cuda: install cudatoolkit 11.8 via runfile approach -wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run -sh cuda_11.8.0_520.61.05_linux.run --silent --toolkit +# setup cuda: install cudatoolkit 11.8 via runfile approach +wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run +sh cuda_11.8.0_520.61.05_linux.run --silent --toolkit # reset symlink and update library loading paths rm /usr/local/cuda -ln -s /usr/local/cuda-11.8 /usr/local/cuda +ln -s /usr/local/cuda-11.8 /usr/local/cuda sudo /databricks/python3/bin/pip3 install \ --extra-index-url=https://pypi.nvidia.com \ - "cudf-cu11==25.02.*" "cuml-cu11==25.02.*" + "cudf-cu11==25.02.*" "cuml-cu11==25.02.*" # setup python environment sudo apt clean && sudo apt update --fix-missing -y diff --git a/examples/ML+DL-Examples/Optuna-Spark/optuna-examples/databricks/start_cluster.sh b/examples/ML+DL-Examples/Optuna-Spark/optuna-examples/databricks/start_cluster.sh index de655df3d..7a860c94f 100755 --- a/examples/ML+DL-Examples/Optuna-Spark/optuna-examples/databricks/start_cluster.sh +++ b/examples/ML+DL-Examples/Optuna-Spark/optuna-examples/databricks/start_cluster.sh @@ -3,35 +3,35 @@ if [[ -z ${INIT_PATH} ]]; then echo "Please export INIT_PATH per README.md" - exit 1 + exit 1 fi json_config=$(cat < ```shell # Configure and start cluster export MASTER=spark://$(hostname):7077 -export SPARK_WORKER_INSTANCES=1 +export SPARK_WORKER_INSTANCES=1 export CORES_PER_WORKER=8 -export SPARK_WORKER_OPTS="-Dspark.worker.resource.gpu.amount=1
-Dspark.worker.resource.gpu.discoveryScript=$SPARK_HOME/examples/src/main/scripts/getGpusResources.sh" -${SPARK_HOME}/sbin/start-master.sh; ${SPARK_HOME}/sbin/start-worker.sh -c ${CORES_PER_WORKER} -m 16G ${MASTER} +export SPARK_WORKER_OPTS="-Dspark.worker.resource.gpu.amount=1 -Dspark.worker.resource.gpu.discoveryScript=$SPARK_HOME/examples/src/main/scripts/getGpusResources.sh" +${SPARK_HOME}/sbin/start-master.sh; ${SPARK_HOME}/sbin/start-worker.sh -c ${CORES_PER_WORKER} -m 16G ${MASTER} ``` The notebooks are ready to run! Each notebook has a cell to connect to the standalone cluster and create a SparkSession. @@ -102,7 +102,7 @@ The notebooks are ready to run! Each notebook has a cell to connect to the stand - Please create separate environments for PyTorch and Tensorflow notebooks as specified above. This will avoid conflicts between the CUDA libraries bundled with their respective versions. - `requirements.txt` installs pyspark>=3.4.0. Make sure the installed PySpark version is compatible with your system's Spark installation. - The notebooks require a GPU environment for the executors. -- The PyTorch notebooks include model compilation and accelerated inference with TensorRT. While not included in the notebooks, Tensorflow also supports [integration with TensorRT](https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html), but as of writing it is not supported in TF==2.17.0. +- The PyTorch notebooks include model compilation and accelerated inference with TensorRT. While not included in the notebooks, Tensorflow also supports [integration with TensorRT](https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html), but as of writing it is not supported in TF==2.17.0. - Note that some Huggingface models may be gated and will require a login, e.g.,: ```python from huggingface_hub import login @@ -129,7 +129,7 @@ See the instructions for [Databricks](databricks/README.md) and [GCP Dataproc](d The notebooks also demonstrate integration with the [Triton Inference Server](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html), an open-source serving platform for deep learning models, which includes many [features and performance optimizations](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html#triton-major-features) to streamline inference. The notebooks use [PyTriton](https://github.com/triton-inference-server/pytriton), a Flask-like Python framework that handles communication with the Triton server. -drawing +drawing The diagram above shows how Spark distributes inference tasks to run on the Triton Inference Server, with PyTriton handling request/response communication with the server. diff --git a/examples/ML+DL-Examples/Spark-DL/dl_inference/databricks/README.md b/examples/ML+DL-Examples/Spark-DL/dl_inference/databricks/README.md index cc7605225..946f94c3d 100644 --- a/examples/ML+DL-Examples/Spark-DL/dl_inference/databricks/README.md +++ b/examples/ML+DL-Examples/Spark-DL/dl_inference/databricks/README.md @@ -4,7 +4,7 @@ ## Setup -1. Install the latest [databricks-cli](https://docs.databricks.com/en/dev-tools/cli/tutorial.html) and configure for your workspace. +1. Install the latest [databricks-cli](https://docs.databricks.com/en/dev-tools/cli/tutorial.html) and configure for your workspace. 2.
Specify the path to your Databricks workspace: ```shell @@ -34,7 +34,7 @@ databricks workspace import $INIT_DEST --format AUTO --file $INIT_SRC ``` -6. Launch the cluster with the provided script. By default the script will create a cluster with 4 A10 worker nodes and 1 A10 driver node. (Note that the script uses **Azure instances** by default; change as needed). +6. Launch the cluster with the provided script. By default the script will create a cluster with 4 A10 worker nodes and 1 A10 driver node. (Note that the script uses **Azure instances** by default; change as needed). ```shell cd setup chmod +x start_cluster.sh @@ -46,7 +46,7 @@ - Integration with Triton inference server uses stage-level scheduling (Spark>=3.4.0). Make sure to: - use a cluster with GPU resources (for LLM examples, make sure the selected GPUs have sufficient RAM) - set a value for `spark.executor.cores` - - ensure that `spark.executor.resource.gpu.amount` = 1 + - ensure that `spark.executor.resource.gpu.amount` = 1 - Under `Advanced Options > Init Scripts`, upload the init script from your workspace. - Under environment variables, set: - `FRAMEWORK=torch` or `FRAMEWORK=tf` based on the notebook used. diff --git a/examples/ML+DL-Examples/Spark-DL/dl_inference/databricks/setup/init_spark_dl.sh b/examples/ML+DL-Examples/Spark-DL/dl_inference/databricks/setup/init_spark_dl.sh index 9515f4357..4288e8d08 100755 --- a/examples/ML+DL-Examples/Spark-DL/dl_inference/databricks/setup/init_spark_dl.sh +++ b/examples/ML+DL-Examples/Spark-DL/dl_inference/databricks/setup/init_spark_dl.sh @@ -12,10 +12,10 @@ datasets==3.* transformers urllib3<2 nvidia-pytriton -torch<=2.5.1 -torchvision --extra-index-url https://download.pytorch.org/whl/cu121 +torch<=2.5.1 +torchvision --extra-index-url https://download.pytorch.org/whl/cu121 torch-tensorrt -tensorrt --extra-index-url https://download.pytorch.org/whl/cu121 +tensorrt --extra-index-url https://download.pytorch.org/whl/cu121 sentence_transformers sentencepiece nvidia-modelopt[all] --extra-index-url https://pypi.nvidia.com @@ -29,7 +29,7 @@ nvidia-pytriton EOF else echo "Please export FRAMEWORK as torch or tf per README" - exit 1 + exit 1 fi sudo /databricks/python3/bin/pip3 install --upgrade --force-reinstall -r temp_requirements.txt diff --git a/examples/ML+DL-Examples/Spark-DL/dl_inference/databricks/setup/start_cluster.sh b/examples/ML+DL-Examples/Spark-DL/dl_inference/databricks/setup/start_cluster.sh index 457b080bb..edf310e08 100755 --- a/examples/ML+DL-Examples/Spark-DL/dl_inference/databricks/setup/start_cluster.sh +++ b/examples/ML+DL-Examples/Spark-DL/dl_inference/databricks/setup/start_cluster.sh @@ -6,30 +6,30 @@ set -eo pipefail # configure arguments if [[ -z ${INIT_DEST} ]]; then echo "Please make sure INIT_DEST is exported per README.md" - exit 1 + exit 1 fi if [[ -z ${FRAMEWORK} ]]; then echo "Please make sure FRAMEWORK is exported to torch or tf per README.md" - exit 1 + exit 1 fi # Modify the node_type_id and driver_node_type_id below if you don't have this specific instance type. -# Modify executor.cores=(cores per node) and task.resource.gpu.amount=(1/executor cores) accordingly. -# We recommend selecting A10/L4+ instances for these examples.
+# Modify executor.cores=(cores per node) and task.resource.gpu.amount=(1/executor cores) accordingly. +# We recommend selecting A10/L4+ instances for these examples. json_config=$(cat < List[int]: while len(ports) < 3: if i not in conns: ports.append(i) - i += 1 + i += 1 return ports @@ -66,7 +66,7 @@ def _find_ports(start_port: int = 7000) -> List[int]: ), "Server function must accept (ports, model_path) when model_path is provided" args = (ports, model_path) else: - assert len(params) == 1, "Server function must accept (ports) argument" + assert len(params) == 1, "Server function must accept (ports) argument" args = (ports,) hostname = socket.gethostname() @@ -134,7 +134,7 @@ class TritonServerManager: >>> print(f"Server shutdown success: {success}") """ - DEFAULT_WAIT_RETRIES = 10 + DEFAULT_WAIT_RETRIES = 10 DEFAULT_WAIT_TIMEOUT = 5 def __init__( @@ -174,27 +174,27 @@ def host_to_grpc_url(self) -> Dict[str, str]: return None return { - host: f"grpc://{host}:{ports[1]}" + host: f"grpc://{host}:{ports[1]}" for host, (_, ports) in self._server_pids_ports.items() } def _get_node_rdd(self) -> RDD: - """Create and configure RDD with stage-level scheduling for 1 task per node.""" + """Create and configure RDD with stage-level scheduling for 1 task per node.""" sc = self.spark.sparkContext node_rdd = sc.parallelize(list(range(self.num_nodes)), self.num_nodes) return self._use_stage_level_scheduling(node_rdd) - def _use_stage_level_scheduling(self, rdd: RDD, task_gpus: float = 1.0) -> RDD: + def _use_stage_level_scheduling(self, rdd: RDD, task_gpus: float = 1.0) -> RDD: """ - Use stage-level scheduling to ensure each Triton server instance maps to 1 GPU (executor). + Use stage-level scheduling to ensure each Triton server instance maps to 1 GPU (executor). From https://github.com/NVIDIA/spark-rapids-ml/blob/main/python/src/spark_rapids_ml/core.py """ executor_cores = self.spark.conf.get("spark.executor.cores") assert executor_cores is not None, "spark.executor.cores is not set" executor_gpus = self.spark.conf.get("spark.executor.resource.gpu.amount") assert ( - executor_gpus is not None and int(executor_gpus) == 1 - ), "spark.executor.resource.gpu.amount must be set and = 1" + executor_gpus is not None and int(executor_gpus) == 1 + ), "spark.executor.resource.gpu.amount must be set and = 1" from pyspark.resource.profile import ResourceProfileBuilder from pyspark.resource.requests import TaskResourceRequests @@ -210,7 +210,7 @@ def _use_stage_level_scheduling(self, rdd: RDD, task_gpus: float = 1.0) -> RDD: int(executor_cores) if "com.nvidia.spark.SQLPlugin" in spark_plugins and "true" == spark_rapids_sql_enabled.lower() - else (int(executor_cores) // 2) + 1 + else (int(executor_cores) // 2) + 1 ) treqs = TaskResourceRequests().cpus(task_cores).resource("gpu", task_gpus) rp = ResourceProfileBuilder().require(treqs).build diff --git a/examples/ML+DL-Examples/Spark-DL/dl_inference/torch_requirements.txt b/examples/ML+DL-Examples/Spark-DL/dl_inference/torch_requirements.txt index 708d13870..325b88032 100644 --- a/examples/ML+DL-Examples/Spark-DL/dl_inference/torch_requirements.txt +++ b/examples/ML+DL-Examples/Spark-DL/dl_inference/torch_requirements.txt @@ -13,10 +13,10 @@ # limitations under the License.
-r requirements.txt -torch<=2.5.1 +torch<=2.5.1 torchvision torch-tensorrt -tensorrt --extra-index-url https://download.pytorch.org/whl/cu121 +tensorrt --extra-index-url https://download.pytorch.org/whl/cu121 sentence_transformers sentencepiece nvidia-modelopt[all] --extra-index-url https://pypi.nvidia.com \ No newline at end of file diff --git a/examples/ML+DL-Examples/Spark-Rapids-ML/pca/README.md b/examples/ML+DL-Examples/Spark-Rapids-ML/pca/README.md index 309ef2a81..96c810000 100644 --- a/examples/ML+DL-Examples/Spark-Rapids-ML/pca/README.md +++ b/examples/ML+DL-Examples/Spark-Rapids-ML/pca/README.md @@ -9,8 +9,8 @@ Please refer to the Spark-Rapids-ML [README](https://github.com/NVIDIA/spark-rap ## Download RAPIDS Jar from Maven Central -Download the [Spark-Rapids plugin](https://nvidia.github.io/spark-rapids/docs/download.html#download-rapids-accelerator-for-apache-spark-v24081). -For Spark-RAPIDS-ML version 25.06.0, download the RAPIDS jar from Maven Central: [rapids-4-spark_2.12-25.06.0.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/25.06.0/rapids-4-spark_2.12-25.06.0.jar). +Download the [Spark-Rapids plugin](https://nvidia.github.io/spark-rapids/docs/download.html#download-rapids-accelerator-for-apache-spark-v24081). +For Spark-RAPIDS-ML version 25.06.0, download the RAPIDS jar from Maven Central: [rapids-4-spark_2.12-25.06.0.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/25.06.0/rapids-4-spark_2.12-25.06.0.jar). ## Running the Notebooks diff --git a/examples/ML+DL-Examples/Spark-Rapids-ML/pca/start-spark-rapids.sh b/examples/ML+DL-Examples/Spark-Rapids-ML/pca/start-spark-rapids.sh index 4e292bb4e..23c680f46 100755 --- a/examples/ML+DL-Examples/Spark-Rapids-ML/pca/start-spark-rapids.sh +++ b/examples/ML+DL-Examples/Spark-Rapids-ML/pca/start-spark-rapids.sh @@ -1,6 +1,6 @@ #!/bin/bash # -# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved. +# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -18,25 +18,25 @@ # Check if SPARK_HOME is set if [ -z "$SPARK_HOME" ]; then echo "Please set the SPARK_HOME environment variable before running this script." - exit 1 + exit 1 fi # Check if RAPIDS_JAR is set if [ -z "$RAPIDS_JAR" ]; then echo "Please set the RAPIDS_JAR environment variable before running this script." - exit 1 + exit 1 fi # Configuration MASTER_HOSTNAME=$(hostname) MASTER=spark://${MASTER_HOSTNAME}:7077 CORES_PER_WORKER=8 -MEMORY_PER_WORKER=16G +MEMORY_PER_WORKER=16G # Environment variables export SPARK_HOME=${SPARK_HOME} export MASTER=${MASTER} -export SPARK_WORKER_INSTANCES=1 +export SPARK_WORKER_INSTANCES=1 export CORES_PER_WORKER=${CORES_PER_WORKER} export PYSPARK_DRIVER_PYTHON=jupyter export PYSPARK_DRIVER_PYTHON_OPTS='lab' @@ -49,10 +49,10 @@ ${SPARK_HOME}/sbin/start-worker.sh -c ${CORES_PER_WORKER} -m ${MEMORY_PER_WORKER # Start Jupyter with PySpark echo "Launching PySpark with Jupyter..."
${SPARK_HOME}/bin/pyspark --master ${MASTER} \ ---driver-memory 10G \ +--driver-memory 10G \ --executor-memory 8G \ ---conf spark.task.maxFailures=1 \ ---conf spark.rpc.message.maxSize=1024 \ +--conf spark.task.maxFailures=1 \ +--conf spark.rpc.message.maxSize=1024 \ --conf spark.sql.pyspark.jvmStacktrace.enabled=true \ --conf spark.sql.execution.pyspark.udf.simplifiedTraceback.enabled=false \ --conf spark.sql.execution.arrow.pyspark.enabled=true \ @@ -60,7 +60,7 @@ ${SPARK_HOME}/bin/pyspark --master ${MASTER} \ --conf spark.rapids.ml.uvm.enabled=true \ --conf spark.jars=${RAPIDS_JAR} \ --conf spark.executorEnv.PYTHONPATH=${RAPIDS_JAR} \ ---conf spark.rapids.memory.gpu.minAllocFraction=0.0001 \ +--conf spark.rapids.memory.gpu.minAllocFraction=0.0001 \ --conf spark.plugins=com.nvidia.spark.SQLPlugin \ --conf spark.locality.wait=0s \ --conf spark.sql.cache.serializer=com.nvidia.spark.ParquetCachedBatchSerializer \ @@ -72,9 +72,9 @@ ${SPARK_HOME}/bin/pyspark --master ${MASTER} \ --conf spark.rapids.sql.python.gpu.enabled=true \ --conf spark.rapids.memory.pinnedPool.size=2G \ --conf spark.python.daemon.module=rapids.daemon \ ---conf spark.rapids.sql.batchSizeBytes=512m \ +--conf spark.rapids.sql.batchSizeBytes=512m \ --conf spark.sql.adaptive.enabled=false \ ---conf spark.sql.files.maxPartitionBytes=512m \ ---conf spark.rapids.sql.concurrentGpuTasks=1 \ +--conf spark.sql.files.maxPartitionBytes=512m \ +--conf spark.rapids.sql.concurrentGpuTasks=1 \ --conf spark.sql.execution.arrow.maxRecordsPerBatch=20000 \ --conf spark.rapids.sql.explain=NONE \ No newline at end of file diff --git a/examples/SQL+DF-Examples/customer-churn/README.md b/examples/SQL+DF-Examples/customer-churn/README.md index 0b9599e64..0682bbce5 100644 --- a/examples/SQL+DF-Examples/customer-churn/README.md +++ b/examples/SQL+DF-Examples/customer-churn/README.md @@ -3,7 +3,7 @@ This demo is derived from [data-science-blueprints](https://github.com/NVIDIA/data-science-blueprints) repository. The repository shows a realistic ETL workflow based on synthetic normalized data. It consists of two pieces: -1. _an augmentation notebook_, which synthesizes normalized (long-form) data from a wide-form input file, +1. _an augmentation notebook_, which synthesizes normalized (long-form) data from a wide-form input file, optionally augmenting it by duplicating records, and 2. _an ETL notebook_, which performs joins and aggregations in order to generate wide-form data from the synthetic long-form data. diff --git a/examples/SQL+DF-Examples/customer-churn/notebooks/python/README.md b/examples/SQL+DF-Examples/customer-churn/notebooks/python/README.md index 445b087fc..17106fcf9 100644 --- a/examples/SQL+DF-Examples/customer-churn/notebooks/python/README.md +++ b/examples/SQL+DF-Examples/customer-churn/notebooks/python/README.md @@ -2,7 +2,7 @@ This demo shows a realistic ETL workflow based on synthetic normalized data. It consists of two pieces: -1. _an [augmentation notebook](augment.ipynb)_, which synthesizes normalized (long-form) data from a wide-form input file, +1. _an [augmentation notebook](augment.ipynb)_, which synthesizes normalized (long-form) data from a wide-form input file, optionally augmenting it by duplicating records, and 2.
_an [ETL notebook](etl.ipynb)_, which performs joins and aggregations in order to generate wide-form data from the synthetic long-form data. diff --git a/examples/SQL+DF-Examples/customer-churn/notebooks/python/churn/augment.py b/examples/SQL+DF-Examples/customer-churn/notebooks/python/churn/augment.py index 9babce12d..e765edc0c 100644 --- a/examples/SQL+DF-Examples/customer-churn/notebooks/python/churn/augment.py +++ b/examples/SQL+DF-Examples/customer-churn/notebooks/python/churn/augment.py @@ -60,7 +60,7 @@ def _get_uniques(ct): if ("uniques_%d" % ct) in table_names: return session.table("uniques_%d" % ct) else: - def str_part(seed=0x5CA1AB1E): + def str_part(seed=0x5CA1AB1E): "generate the string part of a unique ID" import random @@ -77,7 +77,7 @@ def str_part(seed=0x5CA1AB1E): uniques = ( session.createDataFrame( schema=StructType([StructField("u_value", StringType())]), - data=[dict(u_value=next(sp)) for _ in range(min(int(ct * 1.02), ct + 2))], + data=[dict(u_value=next(sp)) for _ in range(min(int(ct * 1.02), ct + 2))], ) .distinct() .orderBy("u_value") @@ -149,7 +149,7 @@ def load_supplied_data(session, input_file): def replicate_df(df, duplicates): - if duplicates > 1: + if duplicates > 1: uniques = _get_uniques(duplicates) df = ( @@ -193,12 +193,12 @@ def billing_events(df): def get_last_month(col): h = F.abs(F.xxhash64(col)) - h1 = (h.bitwiseAND(0xff)) % (MAX_MONTH // 2) + h1 = (h.bitwiseAND(0xff)) % (MAX_MONTH // 2) h2 = (F.shiftRight(h, 8).bitwiseAND(0xff)) % (MAX_MONTH // 3) - h3 = (F.shiftRight(h, 16).bitwiseAND(0xff)) % (MAX_MONTH // 5) + h3 = (F.shiftRight(h, 16).bitwiseAND(0xff)) % (MAX_MONTH // 5) h4 = (F.shiftRight(h, 24).bitwiseAND(0xff)) % (MAX_MONTH // 7) - h5 = (F.shiftRight(h, 32).bitwiseAND(0xff)) % (MAX_MONTH // 11) - return -(h1 + h2 + h3 + h4 + h5) + h5 = (F.shiftRight(h, 32).bitwiseAND(0xff)) % (MAX_MONTH // 11) + return -(h1 + h2 + h3 + h4 + h5) w = pyspark.sql.Window.orderBy(F.lit("")).partitionBy(df.customerID) @@ -223,7 +223,7 @@ def get_last_month(col): F.lit("AccountCreation").alias("kind"), F.lit(0.0).cast(get_currency_type()).alias("value"), F.lit(now).alias("now"), - (-df.tenure - 1 + F.col("last_month")).alias("month_number"), + (-df.tenure - 1 + F.col("last_month")).alias("month_number"), ) .withColumn("date", F.expr("add_months(now, month_number)")) .drop("now", "month_number") @@ -252,7 +252,7 @@ def resolve_path(name): return name def write_df(df, name, skip_replication=False, partition_by=None): - dup_times = options["dup_times"] or 1 + dup_times = options["dup_times"] or 1 output_prefix = options["output_prefix"] or "" output_mode = options["output_mode"] or "overwrite" output_kind = options["output_kind"] or "parquet" @@ -274,11 +274,11 @@ def write_df(df, name, skip_replication=False, partition_by=None): def customer_meta(df): SENIOR_CUTOFF = 65 - ADULT_CUTOFF = 18 + ADULT_CUTOFF = 18 DAYS_IN_YEAR = 365.25 EXPONENTIAL_DIST_SCALE = 6.3 - augmented_original = replicate_df(df, options["dup_times"] or 1) + augmented_original = replicate_df(df, options["dup_times"] or 1) customerMetaRaw = augmented_original.select( "customerID", @@ -298,14 +298,14 @@ def customer_meta(df): customerMetaRaw.SeniorCitizen == 0, ( customerMetaRaw.choice - * ((SENIOR_CUTOFF - ADULT_CUTOFF - 1) *
DAYS_IN_YEAR) + * ((SENIOR_CUTOFF - ADULT_CUTOFF - 1) * DAYS_IN_YEAR) ) + (ADULT_CUTOFF * DAYS_IN_YEAR), ).otherwise( (SENIOR_CUTOFF * DAYS_IN_YEAR) + ( DAYS_IN_YEAR - * (-F.log1p(-customerMetaRaw.choice) * EXPONENTIAL_DIST_SCALE) + * (-F.log1p(-customerMetaRaw.choice) * EXPONENTIAL_DIST_SCALE) ) ) ).cast("int"), @@ -396,7 +396,7 @@ def debug_augmentation(df): .distinct() .select( "customerID", - F.substring("customerID", 0, 10).alias("originalID"), - F.element_at(F.split("customerID", "-", -1), 3).alias("suffix"), + F.substring("customerID", 0, 10).alias("originalID"), + F.element_at(F.split("customerID", "-", -1), 3).alias("suffix"), ) ) \ No newline at end of file diff --git a/examples/SQL+DF-Examples/customer-churn/notebooks/python/churn/eda.py b/examples/SQL+DF-Examples/customer-churn/notebooks/python/churn/eda.py index 91a63e4a6..3b9a31f5f 100644 --- a/examples/SQL+DF-Examples/customer-churn/notebooks/python/churn/eda.py +++ b/examples/SQL+DF-Examples/customer-churn/notebooks/python/churn/eda.py @@ -39,12 +39,12 @@ def cardinalities(df, cols): def likely_unique(counts): total = counts["total"] - return [k for (k, v) in counts.items() if k != "total" and abs(total - v) < total * 0.15] + return [k for (k, v) in counts.items() if k != "total" and abs(total - v) < total * 0.15] def likely_categoricals(counts): total = counts["total"] - return [k for (k, v) in counts.items() if v < total * 0.15 or v < 128] + return [k for (k, v) in counts.items() if v < total * 0.15 or v < 128] def unique_values(df, cols): if eda_options['use_array_ops']: @@ -63,7 +63,7 @@ def unique_values_array(df, cols): result = reduce(lambda l, r: l.unionAll(r), [counts.select(F.lit(c).alias("field"), F.col(c).alias("unique_vals")) for c in counts.columns]).collect() - return dict([(r[0],r[1]) for r in result]) + return dict([(r[0],r[1]) for r in result]) def unique_values_driver(df, cols): @@ -72,9 +72,9 @@ def unique_values_driver(df, cols): def approx_ecdf(df, cols): from functools import reduce - quantiles = [0.0, 0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99, 1.0] + quantiles = [0.0, 0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99, 1.0] - qs = df.approxQuantile(cols, quantiles, 0.01) + qs = df.approxQuantile(cols, quantiles, 0.01) result = dict(zip(cols, qs)) return {c: dict(zip(quantiles, vs)) for (c, vs) in result.items()} @@ -102,9 +102,9 @@ def gen_summary(df, output_prefix=""): uniques = likely_unique(counts) categoricals = unique_values(df, likely_categoricals(counts)) - for span in [2,3,4,6,12]: - thecube = df.cube("Churn", F.ceil(df.tenure / span).alias("%d_month_spans" % span), "gender", "Partner", "SeniorCitizen", "Contract", "PaperlessBilling", "PaymentMethod", F.ceil(F.log2(F.col("MonthlyCharges"))*10).alias("log_charges")).count() - therollup = df.rollup("Churn", F.ceil(df.tenure / span).alias("%d_month_spans" % span), "SeniorCitizen", "Contract", "PaperlessBilling", "PaymentMethod", F.ceil(F.log2(F.col("MonthlyCharges"))*10).alias("log_charges")).agg(F.sum(F.col("TotalCharges")).alias("sum_charges")) + for span in [2,3,4,6,12]: + thecube = df.cube("Churn", F.ceil(df.tenure / span).alias("%d_month_spans" % span), "gender", "Partner", "SeniorCitizen", "Contract", "PaperlessBilling", "PaymentMethod",
F.ceil(F.log2(F.col("MonthlyCharges"))*10).alias("log_charges")).count() + therollup = df.rollup("Churn", F.ceil(df.tenure / span).alias("%d_month_spans" % span), "SeniorCitizen", "Contract", "PaperlessBilling", "PaymentMethod", F.ceil(F.log2(F.col("MonthlyCharges"))*10).alias("log_charges")).agg(F.sum(F.col("TotalCharges")).alias("sum_charges")) thecube.write.mode("overwrite").parquet("%scube-%d.parquet" % (output_prefix, span)) therollup.write.mode("overwrite").parquet("%srollup-%d.parquet" % (output_prefix, span)) diff --git a/examples/SQL+DF-Examples/customer-churn/notebooks/python/churn/etl.py b/examples/SQL+DF-Examples/customer-churn/notebooks/python/churn/etl.py index 44e3986c4..ceb3a64b4 100644 --- a/examples/SQL+DF-Examples/customer-churn/notebooks/python/churn/etl.py +++ b/examples/SQL+DF-Examples/customer-churn/notebooks/python/churn/etl.py @@ -273,7 +273,7 @@ def is_senior_citizen(nowcol, dobcol): if options['use_calendar_arithmetic']: return F.when( F.col("now") >= F.add_months( - F.col("dateOfBirth"), 65 * 12 + F.col("dateOfBirth"), 65 * 12 ), F.lit(True) ).otherwise(F.lit(False)) else: diff --git a/examples/SQL+DF-Examples/micro-benchmarks/README.md b/examples/SQL+DF-Examples/micro-benchmarks/README.md index e135ff239..fb4f5685d 100644 --- a/examples/SQL+DF-Examples/micro-benchmarks/README.md +++ b/examples/SQL+DF-Examples/micro-benchmarks/README.md @@ -13,11 +13,11 @@ The microbenchmark notebook in this repo uses five such queries in the chart sho - **Cross-join**: A common use for a cross join is to obtain all combinations of items. - **Hash-join**: Joining two tables together by matching rows based on a common column. -These queries were run on a standard eight-nodes CPU cluster with 2 CPU (128 cores), -512GB memory and 1xA100 GPUs per node. The dataset used was of size 3TB with multiple different data types. +These queries were run on a standard eight-nodes CPU cluster with 2 CPU (128 cores), +512GB memory and 1xA100 GPUs per node. The dataset used was of size 3TB with multiple different data types. The queries are based on several tables in NDS parquet format with Decimal. These four queries show not only performance and cost benefits but also the range of -speed-up (27x to 1.5x) varies depending on compute intensity. +speed-up (27x to 1.5x) varies depending on compute intensity. These queries vary in compute and network utilization similar to a practical use case in data preprocessing.To test these queries, you can generate the parquet format dataset using this NDS dataset generator tool. All the queries are running on the SF3000(Scale Factor 3000) dataset. diff --git a/examples/SQL+DF-Examples/retail-analytics/README.md b/examples/SQL+DF-Examples/retail-analytics/README.md index d65dd2931..8f40fe266 100644 --- a/examples/SQL+DF-Examples/retail-analytics/README.md +++ b/examples/SQL+DF-Examples/retail-analytics/README.md @@ -2,6 +2,6 @@ # Overview Retail Analytics This repository contains two Jupyter notebooks: -Data Generation: This notebook generates sample data that can be used for analysis. It demonstrates how to use various Python libraries to create synthetic data sets that can be used for testing and experimentation. This notebook can be run in GCP n1-standard-32 instance type +Data Generation: This notebook generates sample data that can be used for analysis.
It demonstrates how to use various Python libraries to create synthetic data sets that can be used for testing and experimentation. This notebook can be run in GCP n1-standard-32 instance type Data Cleaning and Analysis: This notebook takes the generated data and performs a series of cleaning and analysis tasks. It demonstrates how to use Spark RAPIDS library to manipulate and analyze data sets. diff --git a/examples/SQL+DF-Examples/tpcds/README.md b/examples/SQL+DF-Examples/tpcds/README.md index 714417cdc..ff6d8a4d5 100644 --- a/examples/SQL+DF-Examples/tpcds/README.md +++ b/examples/SQL+DF-Examples/tpcds/README.md @@ -1,10 +1,10 @@ -# TPC-DS Scale Factor 10 (GiB) - CPU Spark vs GPU Spark +# TPC-DS Scale Factor 10 (GiB) - CPU Spark vs GPU Spark [TPC-DS](https://www.tpc.org/tpcds/) is a decision support benchmark often used to evaluate performance of OLAP Databases and Big Data systems. The notebook in this folder runs a user-specified subset of the TPC-DS queries on the -Scale Factor 10 (GiB) dataset. It uses [TPCDS PySpark](https://github.com/cerndb/SparkTraining/blob/master/notebooks/TPCDS_PySpark_CERN_SWAN_getstarted.ipynb) +Scale Factor 10 (GiB) dataset. It uses [TPCDS PySpark](https://github.com/cerndb/SparkTraining/blob/master/notebooks/TPCDS_PySpark_CERN_SWAN_getstarted.ipynb) to execute TPC-DS queries with SparkSQL on GPU and CPU capturing the metrics as a Pandas dataframe. It then plots a comparison bar chart visualizing the GPU acceleration achieved for the queries run with RAPIDS Spark in this @@ -18,7 +18,7 @@ This notebook can be opened and executed using standard It can also be opened and evaluated on hosted Notebook environments. Use the link below to launch on Google Colab and connect it to a [GPU instance](https://research.google.com/colaboratory/faq.html). - + Open In Colab diff --git a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/Dockerfile b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/Dockerfile index 6a7742317..8f5bc2681 100644 --- a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/Dockerfile +++ b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/Dockerfile @@ -1,5 +1,5 @@ # -# Copyright (c) 2021-2025, NVIDIA CORPORATION. All rights reserved. +# Copyright (c) 2021-2025, NVIDIA CORPORATION. All rights reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License.
@@ -15,21 +15,21 @@ # # A container that can be used to build UDF native code against libcudf -ARG CUDA_VERSION=11.8.0 +ARG CUDA_VERSION=11.8.0 ARG LINUX_VERSION=rockylinux8 FROM nvidia/cuda:${CUDA_VERSION}-devel-${LINUX_VERSION} -ARG TOOLSET_VERSION=11 -ENV TOOLSET_VERSION=11 -ARG PARALLEL_LEVEL=10 -ENV PARALLEL_LEVEL=10 +ARG TOOLSET_VERSION=11 +ENV TOOLSET_VERSION=11 +ARG PARALLEL_LEVEL=10 +ENV PARALLEL_LEVEL=10 ### Install basic requirements RUN dnf --enablerepo=powertools install -y \ gcc-toolset-${TOOLSET_VERSION} \ git \ - java-1.8.0-openjdk \ + java-1.8.0-openjdk \ maven \ ninja-build \ patch \ diff --git a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/README.md b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/README.md index 427db7078..c51e7e5c5 100644 --- a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/README.md +++ b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/README.md @@ -155,7 +155,7 @@ unset CMAKE_CUDA_COMPILER_LAUNCHER unset CMAKE_CXX_LINKER_LAUNCHER ``` -The first build could take a long time (e.g.: 1.5 hours). Then the rapids-4-spark-udf-examples*.jar is +The first build could take a long time (e.g.: 1.5 hours). Then the rapids-4-spark-udf-examples*.jar is generated under RAPIDS-accelerated-UDFs/target directory. The following build can benefit from ccache if you enable it. @@ -186,7 +186,7 @@ then do the following inside the Docker container. ### Get jars from Maven Central -[rapids-4-spark_2.12-25.06.0.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/25.06.0/rapids-4-spark_2.12-25.06.0.jar) +[rapids-4-spark_2.12-25.06.0.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/25.06.0/rapids-4-spark_2.12-25.06.0.jar) ### Launch a local mode Spark @@ -211,11 +211,11 @@ Input the following commands to test wordcount JNI UDF ```python from pyspark.sql.types import * schema = StructType([ - StructField("c1", StringType()), + StructField("c1", StringType()), StructField("c2", IntegerType()), ]) data = [ - ("a b c d",1), + ("a b c d",1), ("",2), (None,3), ("the quick brown fox jumped over the lazy dog",3), @@ -226,6 +226,6 @@ df = spark.createDataFrame( df.createOrReplaceTempView("tab") spark.sql("CREATE TEMPORARY FUNCTION {} AS '{}'".format("wordcount", "com.nvidia.spark.rapids.udf.hive.StringWordCount")) -spark.sql("select c1, wordcount(c1) from tab").show() -spark.sql("select c1, wordcount(c1) from tab").explain() +spark.sql("select c1, wordcount(c1) from tab").show() +spark.sql("select c1, wordcount(c1) from tab").explain() ``` diff --git a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/pom.xml b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/pom.xml index ba1c88b15..4f46df4cc 100644 --- a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/pom.xml +++ b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/pom.xml @@ -1,4 +1,4 @@ - + 4.0.0 com.nvidia - rapids-4-spark-udf-examples_2.12 + rapids-4-spark-udf-examples_2.12 RAPIDS Accelerator for Apache Spark UDF Examples Sample implementations of RAPIDS accelerated user defined functions for use with the RAPIDS Accelerator @@ -28,24 +28,24 @@ 25.06.0-SNAPSHOT - 1.8 - 1.8 +
1.8 + 1.8 8 UTF-8 UTF-8 UTF-8 - cuda11 - 2.12 + cuda11 + 2.12 25.04.0 - 3.1.1 - 2.12.15 + 3.1.1 + 2.12.15 ${project.build.directory}/cpp-build OFF RAPIDS ON - 10 + 10 OFF @@ -91,14 +91,14 @@ - net.alchim31.maven + net.alchim31.maven scala-maven-plugin 4.3.0 org.apache.rat apache-rat-plugin - 0.13 + 0.13 org.apache.maven.plugins diff --git a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/run_pyspark_from_build.sh b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/run_pyspark_from_build.sh index 14512619b..e0913f79d 100755 --- a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/run_pyspark_from_build.sh +++ b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/run_pyspark_from_build.sh @@ -1,5 +1,5 @@ #!/bin/bash -# Copyright (c) 2022, NVIDIA CORPORATION. +# Copyright (c) 2022-2025, NVIDIA CORPORATION. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -14,7 +14,7 @@ # limitations under the License. set -ex -SCRIPTPATH="$( cd "$(dirname "$0")" >/dev/null 2>&1 ; pwd -P )" +SCRIPTPATH="$( cd "$(dirname "$0")" >/dev/null 2>&1 ; pwd -P )" cd "$SCRIPTPATH" if [[ $( echo ${SKIP_TESTS} | tr [:upper:] [:lower:] ) == "true" ]]; @@ -24,17 +24,17 @@ then elif [[ -z "$SPARK_HOME" ]]; then >&2 echo "SPARK_HOME IS NOT SET CANNOT RUN PYTHON INTEGRATION TESTS..." - exit 1 + exit 1 else echo "WILL RUN TESTS WITH SPARK_HOME: ${SPARK_HOME}" - # Spark 3.1.1 includes https://github.com/apache/spark/pull/31540 + # Spark 3.1.1 includes https://github.com/apache/spark/pull/31540 # which helps with spurious task failures as observed in our tests.
If you are running - # Spark versions before 3.1.1, this sets the spark.max.taskFailures to 4 to allow for - # more lineant configuration, else it will set them to 1 as spurious task failures are not expected - # for Spark 3.1.1+ - VERSION_STRING=`$SPARK_HOME/bin/pyspark --version 2>&1|grep -v Scala|awk '/version\ [0-9.]+/{print $NF}'` + # Spark versions before 3.1.1, this sets the spark.max.taskFailures to 4 to allow for + # more lineant configuration, else it will set them to 1 as spurious task failures are not expected + # for Spark 3.1.1+ + VERSION_STRING=`$SPARK_HOME/bin/pyspark --version 2>&1|grep -v Scala|awk '/version\ [0-9.]+/{print $NF}'` VERSION_STRING="${VERSION_STRING/-SNAPSHOT/}" - [[ -z $VERSION_STRING ]] && { echo "Unable to detect the Spark version at $SPARK_HOME"; exit 1; } + [[ -z $VERSION_STRING ]] && { echo "Unable to detect the Spark version at $SPARK_HOME"; exit 1; } [[ -z $SPARK_SHIM_VER ]] && { SPARK_SHIM_VER="spark${VERSION_STRING//./}"; } echo "Detected Spark version $VERSION_STRING (shim version: $SPARK_SHIM_VER)" @@ -58,7 +58,7 @@ else "$@") "$SPARK_HOME"/bin/spark-submit --jars "${ALL_JARS// /,}" \ - --master local[1] \ + --master local[1] \ "${RUN_TESTS_COMMAND[@]}" "${TEST_COMMON_OPTS[@]}" fi diff --git a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/runtests.py b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/runtests.py index c9ce62666..11d36a40d 100644 --- a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/runtests.py +++ b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/runtests.py @@ -1,4 +1,4 @@ -# Copyright (c) 2022, NVIDIA CORPORATION. +# Copyright (c) 2022-2025, NVIDIA CORPORATION. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -19,7 +19,7 @@ #import cProfile if __name__ == '__main__': - #cProfile.run('main(sys.argv[1:])', 'test_profile') + #cProfile.run('main(sys.argv[1:])', 'test_profile') # arguments are the same as for pytest https://docs.pytest.org/en/latest/usage.html # or run pytest -h - sys.exit(main(sys.argv[1:])) + sys.exit(main(sys.argv[1:])) diff --git a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/CMakeLists.txt b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/CMakeLists.txt index 01864b2bf..cae15253a 100755 --- a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/CMakeLists.txt +++ b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/CMakeLists.txt @@ -1,5 +1,5 @@ #============================================================================= -# Copyright (c) 2021-2025, NVIDIA CORPORATION. +# Copyright (c) 2021-2025, NVIDIA CORPORATION. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License.
@@ -47,11 +47,11 @@ set(DEFAULT_BUILD_TYPE "Release") # - compiler options ------------------------------------------------------------------------------ set(CMAKE_POSITION_INDEPENDENT_CODE ON) -set(CMAKE_CXX_STANDARD 17) +set(CMAKE_CXX_STANDARD 17) set(CMAKE_CXX_COMPILER $ENV{CXX}) set(CMAKE_CXX_STANDARD_REQUIRED ON) -set(CMAKE_CUDA_STANDARD 17) +set(CMAKE_CUDA_STANDARD 17) set(CMAKE_CUDA_STANDARD_REQUIRED ON) if(CMAKE_COMPILER_IS_GNUCXX) @@ -62,7 +62,7 @@ endif(CMAKE_COMPILER_IS_GNUCXX) if(CMAKE_CUDA_COMPILER_VERSION) # Compute the version. from CMAKE_CUDA_COMPILER_VERSION - string(REGEX REPLACE "([0-9]+)\\.([0-9]+).*" "\\1" CUDA_VERSION_MAJOR ${CMAKE_CUDA_COMPILER_VERSION}) + string(REGEX REPLACE "([0-9]+)\\.([0-9]+).*" "\\1" CUDA_VERSION_MAJOR ${CMAKE_CUDA_COMPILER_VERSION}) string(REGEX REPLACE "([0-9]+)\\.([0-9]+).*" "\\2" CUDA_VERSION_MINOR ${CMAKE_CUDA_COMPILER_VERSION}) set(CUDA_VERSION "${CUDA_VERSION_MAJOR}.${CUDA_VERSION_MINOR}" CACHE STRING "Version of CUDA as computed from nvcc.") mark_as_advanced(CUDA_VERSION) @@ -106,9 +106,9 @@ rapids_cpm_find(cudf 25.06.00 if(BUILD_UDF_BENCHMARKS) # Find or install GoogleBench CPMFindPackage(NAME benchmark - VERSION 1.5.2 + VERSION 1.5.2 GIT_REPOSITORY https://github.com/google/benchmark.git - GIT_TAG v1.5.2 + GIT_TAG v1.5.2 GIT_SHALLOW TRUE OPTIONS "BENCHMARK_ENABLE_TESTING OFF" "BENCHMARK_ENABLE_INSTALL OFF") diff --git a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/benchmarks/CMakeLists.txt b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/benchmarks/CMakeLists.txt index 74bdbbeba..d68192538 100644 --- a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/benchmarks/CMakeLists.txt +++ b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/benchmarks/CMakeLists.txt @@ -1,5 +1,5 @@ #============================================================================= -# Copyright (c) 2021-2022, NVIDIA CORPORATION. +# Copyright (c) 2021-2022, NVIDIA CORPORATION. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/benchmarks/cosine_similarity/cosine_similarity_benchmark.cpp b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/benchmarks/cosine_similarity/cosine_similarity_benchmark.cpp index a863f19f6..36a171f54 100644 --- a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/benchmarks/cosine_similarity/cosine_similarity_benchmark.cpp +++ b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/benchmarks/cosine_similarity/cosine_similarity_benchmark.cpp @@ -1,5 +1,5 @@ /* - * Copyright (c) 2021-2022, NVIDIA CORPORATION. + * Copyright (c) 2021-2022, NVIDIA CORPORATION. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License.
@@ -25,11 +25,11 @@ static void cosine_similarity_bench_args(benchmark::internal::Benchmark* b) { - int const min_rows = 1 << 12; - int const max_rows = 1 << 24; + int const min_rows = 1 << 12; + int const max_rows = 1 << 24; int const row_mult = 8; - int const min_rowlen = 1 << 0; - int const max_rowlen = 1 << 12; + int const min_rowlen = 1 << 0; + int const max_rowlen = 1 << 12; int const len_mult = 8; for (int row_count = min_rows; row_count <= max_rows; row_count *= row_mult) { for (int rowlen = min_rowlen; rowlen <= max_rowlen; rowlen *= len_mult) { @@ -45,24 +45,24 @@ static void cosine_similarity_bench_args(benchmark::internal::Benchmark* b) static void BM_cosine_similarity(benchmark::State& state) { cudf::size_type const n_rows{static_cast(state.range(0))}; - cudf::size_type const list_len{static_cast(state.range(1))}; + cudf::size_type const list_len{static_cast(state.range(1))}; - auto val_start = cudf::make_fixed_width_scalar(1.0f); - auto val_step = cudf::make_fixed_width_scalar(-1.0f); + auto val_start = cudf::make_fixed_width_scalar(1.0f); + auto val_step = cudf::make_fixed_width_scalar(-1.0f); auto child_rows = n_rows * list_len; - auto col1_child = cudf::sequence(child_rows, *val_start); + auto col1_child = cudf::sequence(child_rows, *val_start); auto col2_child = cudf::sequence(child_rows, *val_start, *val_step); auto offset_start = cudf::make_fixed_width_scalar(static_cast(0)); auto offset_step = cudf::make_fixed_width_scalar(list_len); - auto offsets = cudf::sequence(n_rows + 1, *offset_start, *offset_step); + auto offsets = cudf::sequence(n_rows + 1, *offset_start, *offset_step); - auto col1 = cudf::make_lists_column( + auto col1 = cudf::make_lists_column( n_rows, std::make_unique(*offsets), - std::move(col1_child), + std::move(col1_child), 0, cudf::create_null_mask(n_rows, cudf::mask_state::ALL_VALID)); - auto lcol1 = cudf::lists_column_view(*col1); + auto lcol1 = cudf::lists_column_view(*col1); auto col2 = cudf::make_lists_column( n_rows, std::move(offsets), @@ -73,7 +73,7 @@ static void BM_cosine_similarity(benchmark::State& state) for (auto _ : state) { cuda_event_timer raii(state, true, rmm::cuda_stream_default); - auto output = cosine_similarity(lcol1, lcol2); + auto output = cosine_similarity(lcol1, lcol2); } state.SetBytesProcessed(state.iterations() * child_rows * sizeof(float)); diff --git a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/benchmarks/fixture/benchmark_fixture.hpp b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/benchmarks/fixture/benchmark_fixture.hpp index e96aca784..f9a32e49f 100644 --- a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/benchmarks/fixture/benchmark_fixture.hpp +++ b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/benchmarks/fixture/benchmark_fixture.hpp @@ -1,5 +1,5 @@ /* - * Copyright (c) 2021-2022, NVIDIA CORPORATION. + * Copyright (c) 2021-2022, NVIDIA CORPORATION. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License.
@@ -61,7 +61,7 @@ inline auto make_pool() * } * } * - * BENCHMARK_REGISTER_F(my_benchmark, my_test_name)->Range(128, 512); + * BENCHMARK_REGISTER_F(my_benchmark, my_test_name)->Range(128, 512); */ class benchmark : public ::benchmark::Fixture { public: diff --git a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/benchmarks/synchronization/synchronization.cpp b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/benchmarks/synchronization/synchronization.cpp index 58b9c969e..ed45f0518 100644 --- a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/benchmarks/synchronization/synchronization.cpp +++ b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/benchmarks/synchronization/synchronization.cpp @@ -1,5 +1,5 @@ /* - * Copyright (c) 2021-2022, NVIDIA CORPORATION. + * Copyright (c) 2021-2022, NVIDIA CORPORATION. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. @@ -54,7 +54,7 @@ cuda_event_timer::~cuda_event_timer() float milliseconds = 0.0f; CUDA_TRY(cudaEventElapsedTime(&milliseconds, start, stop)); - p_state->SetIterationTime(milliseconds / (1000.0f)); + p_state->SetIterationTime(milliseconds / (1000.0f)); CUDA_TRY(cudaEventDestroy(start)); CUDA_TRY(cudaEventDestroy(stop)); } diff --git a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/benchmarks/synchronization/synchronization.hpp b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/benchmarks/synchronization/synchronization.hpp index 4384991d2..232f634e0 100644 --- a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/benchmarks/synchronization/synchronization.hpp +++ b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/benchmarks/synchronization/synchronization.hpp @@ -1,5 +1,5 @@ /* - * Copyright (c) 2021-2022, NVIDIA CORPORATION. + * Copyright (c) 2021-2022, NVIDIA CORPORATION. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. @@ -44,7 +44,7 @@ cuda_event_timer raii(state, true, stream); // flush_l2_cache = true // Now perform the operations that is to be benchmarked - sample_kernel<<<1, 256, 0, stream.value()>>>(); // Possibly launching a CUDA kernel + sample_kernel<<<1, 256, 0, stream.value()>>>(); // Possibly launching a CUDA kernel } } diff --git a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/src/CosineSimilarityJni.cpp b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/src/CosineSimilarityJni.cpp index a707d9a7e..01efe7914 100644 --- a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/src/CosineSimilarityJni.cpp +++ b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/src/CosineSimilarityJni.cpp @@ -1,5 +1,5 @@ /* - * Copyright (c) 2021-2022, NVIDIA CORPORATION. + * Copyright (c) 2021-2022, NVIDIA CORPORATION. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. @@ -52,29 +52,29 @@ extern "C" { * columnar result.
* * @param env The Java environment - * @param j_view1 The address of the cudf column view of the first LIST column + * @param j_view1 The address of the cudf column view of the first LIST column * @param j_view2 The address of the cudf column view of the second LIST column * @return The address of the cudf column containing the FLOAT32 results */ JNIEXPORT jlong JNICALL Java_com_nvidia_spark_rapids_udf_java_CosineSimilarity_cosineSimilarity(JNIEnv* env, jclass, - jlong j_view1, + jlong j_view1, jlong j_view2) { // Use a try block to translate C++ exceptions into Java exceptions to avoid // crashing the JVM if a C++ exception occurs. try { // turn the addresses into column_view pointers - auto v1 = reinterpret_cast(j_view1); + auto v1 = reinterpret_cast(j_view1); auto v2 = reinterpret_cast(j_view2); - if (v1->type().id() != v2->type().id() || v1->type().id() != cudf::type_id::LIST) { + if (v1->type().id() != v2->type().id() || v1->type().id() != cudf::type_id::LIST) { throw_java_exception(env, ILLEGAL_ARG_CLASS, "inputs not list columns"); return 0; } // run the GPU kernel to compute the cosine similarity - auto lv1 = cudf::lists_column_view(*v1); + auto lv1 = cudf::lists_column_view(*v1); auto lv2 = cudf::lists_column_view(*v2); - std::unique_ptr result = cosine_similarity(lv1, lv2); + std::unique_ptr result = cosine_similarity(lv1, lv2); // take ownership of the column and return the column address to Java and release the underlying resources. return reinterpret_cast(result.release()); diff --git a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/src/StringWordCountJni.cpp b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/src/StringWordCountJni.cpp index 2a74a1eba..132dcafe9 100644 --- a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/src/StringWordCountJni.cpp +++ b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/src/StringWordCountJni.cpp @@ -1,5 +1,5 @@ /* - * Copyright (c) 2021-2022, NVIDIA CORPORATION. + * Copyright (c) 2021-2022, NVIDIA CORPORATION. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. diff --git a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/src/cosine_similarity.cu b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/src/cosine_similarity.cu index 6433a81f6..49603be2b 100644 --- a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/src/cosine_similarity.cu +++ b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/src/cosine_similarity.cu @@ -1,5 +1,5 @@ /* - * Copyright (c) 2021-2022, NVIDIA CORPORATION. + * Copyright (c) 2021-2022, NVIDIA CORPORATION. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License.
@@ -40,32 +40,32 @@ namespace { * @brief Functor for computing the cosine similarity between two list of float columns */ struct cosine_similarity_functor { - float const* const v1; + float const* const v1; float const* const v2; - int32_t const* const v1_offsets; + int32_t const* const v1_offsets; int32_t const* const v2_offsets; // This kernel executes thread-per-row which should be fine for relatively short lists // but may need to be revisited for performance if operating on long lists. __device__ float operator()(cudf::size_type row_idx) { - auto const v1_start_idx = v1_offsets[row_idx]; - auto const v1_num_elems = v1_offsets[row_idx + 1] - v1_start_idx; + auto const v1_start_idx = v1_offsets[row_idx]; + auto const v1_num_elems = v1_offsets[row_idx + 1] - v1_start_idx; auto const v2_start_idx = v2_offsets[row_idx]; - auto const v2_num_elems = v2_offsets[row_idx + 1] - v2_start_idx; - auto const num_elems = std::min(v1_num_elems, v2_num_elems); - double mag1 = 0; + auto const v2_num_elems = v2_offsets[row_idx + 1] - v2_start_idx; + auto const num_elems = std::min(v1_num_elems, v2_num_elems); + double mag1 = 0; double mag2 = 0; double dot_product = 0; for (auto i = 0; i < num_elems; i++) { - float const f1 = v1[v1_start_idx + i]; - mag1 += f1 * f1; + float const f1 = v1[v1_start_idx + i]; + mag1 += f1 * f1; float const f2 = v2[v2_start_idx + i]; mag2 += f2 * f2; - dot_product += f1 * f2; + dot_product += f1 * f2; } - mag1 = std::sqrt(mag1); + mag1 = std::sqrt(mag1); mag2 = std::sqrt(mag2); - return static_cast(dot_product / (mag1 * mag2)); + return static_cast(dot_product / (mag1 * mag2)); } }; @@ -78,43 +78,43 @@ struct cosine_similarity_functor { * list elements per row. A null list row is supported, but null float entries within a * list are not supported.
* - * @param lv1 The first LIST of FLOAT32 column view + * @param lv25.06.25.06.1-SNAPSHOT The first LIST of FLOAT32 column view * @param lv2 The second LIST of FLOAT32 column view * @return A FLOAT32 column containing the cosine similarity corresponding to each input row */ -std::unique_ptr cosine_similarity(cudf::lists_column_view const& lv1, +std::unique_ptr cosine_similarity(cudf::lists_column_view const& lv25.06.25.06.1-SNAPSHOT, cudf::lists_column_view const& lv2) { // sanity-check the input types - if (lv1.child().type().id() != lv2.child().type().id() || - lv1.child().type().id() != cudf::type_id::FLOAT32) { + if (lv25.06.25.06.1-SNAPSHOT.child().type().id() != lv2.child().type().id() || + lv25.06.25.06.1-SNAPSHOT.child().type().id() != cudf::type_id::FLOAT32) { throw std::invalid_argument("inputs are not lists of floats"); } // sanity check the input shape - auto const row_count = lv1.size(); + auto const row_count = lv25.06.25.06.1-SNAPSHOT.size(); if (row_count != lv2.size()) { throw std::invalid_argument("input row counts do not match"); } if (row_count == 0) { return cudf::make_empty_column(cudf::data_type{cudf::type_id::FLOAT32}); } - if (lv1.child().null_count() != 0 || lv2.child().null_count() != 0) { + if (lv25.06.25.06.1-SNAPSHOT.child().null_count() != 0 || lv2.child().null_count() != 0) { throw std::invalid_argument("null floats are not supported"); } auto const stream = rmm::cuda_stream_default; - auto d_view1_ptr = cudf::column_device_view::create(lv1.parent()); - auto d_lists1 = cudf::detail::lists_column_device_view(*d_view1_ptr); + auto d_view25.06.25.06.1-SNAPSHOT_ptr = cudf::column_device_view::create(lv25.06.25.06.1-SNAPSHOT.parent()); + auto d_lists25.06.25.06.1-SNAPSHOT = cudf::detail::lists_column_device_view(*d_view25.06.25.06.1-SNAPSHOT_ptr); auto d_view2_ptr = cudf::column_device_view::create(lv2.parent()); auto d_lists2 = cudf::detail::lists_column_device_view(*d_view2_ptr); bool const are_offsets_equal = thrust::all_of(rmm::exec_policy(stream), thrust::make_counting_iterator(0), thrust::make_counting_iterator(row_count), - [d_lists1, d_lists2] __device__(cudf::size_type idx) { - auto ldv1 = cudf::list_device_view(d_lists1, idx); + [d_lists25.06.25.06.1-SNAPSHOT, d_lists2] __device__(cudf::size_type idx) { + auto ldv25.06.25.06.1-SNAPSHOT = cudf::list_device_view(d_lists25.06.25.06.1-SNAPSHOT, idx); auto ldv2 = cudf::list_device_view(d_lists2, idx); - return ldv1.is_null() || ldv2.is_null() || ldv1.size() == ldv2.size(); + return ldv25.06.25.06.1-SNAPSHOT.is_null() || ldv2.is_null() || ldv25.06.25.06.1-SNAPSHOT.size() == ldv2.size(); }); if (not are_offsets_equal) { throw std::invalid_argument("input list lengths do not match for every row"); @@ -124,18 +124,18 @@ std::unique_ptr cosine_similarity(cudf::lists_column_view const& l rmm::device_uvector float_results(row_count, stream); // compute the cosine similarity - auto const lv1_data = lv1.child().data(); + auto const lv25.06.25.06.1-SNAPSHOT_data = lv25.06.25.06.1-SNAPSHOT.child().data(); auto const lv2_data = lv2.child().data(); - auto const lv1_offsets = lv1.offsets().data(); + auto const lv25.06.25.06.1-SNAPSHOT_offsets = lv25.06.25.06.1-SNAPSHOT.offsets().data(); auto const lv2_offsets = lv2.offsets().data(); thrust::transform(rmm::exec_policy(stream), thrust::make_counting_iterator(0), thrust::make_counting_iterator(row_count), float_results.data(), - cosine_similarity_functor({lv1_data, lv2_data, lv1_offsets, lv2_offsets})); + cosine_similarity_functor({lv25.06.25.06.1-SNAPSHOT_data, lv2_data, 
lv25.06.25.06.1-SNAPSHOT_offsets, lv2_offsets})); // the validity of the output is the bitwise-and of the two input validity masks - auto [null_mask, null_count] = cudf::bitmask_and(cudf::table_view({lv1.parent(), lv2.parent()})); + auto [null_mask, null_count] = cudf::bitmask_and(cudf::table_view({lv25.06.25.06.1-SNAPSHOT.parent(), lv2.parent()})); return std::make_unique(cudf::data_type{cudf::type_id::FLOAT32}, row_count, diff --git a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/src/cosine_similarity.hpp b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/src/cosine_similarity.hpp index 187a9094d..1a955b8b5 100644 --- a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/src/cosine_similarity.hpp +++ b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/src/cosine_similarity.hpp @@ -1,5 +1,5 @@ /* - * Copyright (c) 2021-2022, NVIDIA CORPORATION. + * Copyright (c) 20225.06.25.06.1-SNAPSHOT-2022, NVIDIA CORPORATION. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. @@ -27,9 +27,9 @@ * list elements per row. A null list row is supported, but null float entries within a * list are not supported. * - * @param lv1 The first LIST of FLOAT32 column view + * @param lv25.06.25.06.1-SNAPSHOT The first LIST of FLOAT32 column view * @param lv2 The second LIST of FLOAT32 column view * @return A FLOAT32 column containing the cosine similarity corresponding to each input row */ -std::unique_ptr cosine_similarity(cudf::lists_column_view const& lv1, +std::unique_ptr cosine_similarity(cudf::lists_column_view const& lv25.06.25.06.1-SNAPSHOT, cudf::lists_column_view const& lv2); diff --git a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/src/string_word_count.cu b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/src/string_word_count.cu index 686c06646..2b8f9e27c 100644 --- a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/src/string_word_count.cu +++ b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/src/string_word_count.cu @@ -1,5 +1,5 @@ /* - * Copyright (c) 2021-2022, NVIDIA CORPORATION. + * Copyright (c) 20225.06.25.06.1-SNAPSHOT-2022, NVIDIA CORPORATION. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. diff --git a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/src/string_word_count.hpp b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/src/string_word_count.hpp index ea9aedd94..50c5fe061 100644 --- a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/src/string_word_count.hpp +++ b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/src/string_word_count.hpp @@ -1,5 +1,5 @@ /* - * Copyright (c) 2021-2022, NVIDIA CORPORATION. + * Copyright (c) 20225.06.25.06.1-SNAPSHOT-2022, NVIDIA CORPORATION. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. 
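For reference, the functor above computes, per row, dot(v1, v2) / (||v1|| * ||v2||), accumulating in double and casting the result back to float. A minimal CPU-side sketch of the same math (illustrative Python with a made-up helper name, not part of this change):

```python
import numpy as np

def cosine_similarity_rows(v1_rows, v2_rows):
    """CPU reference for the thread-per-row GPU functor above."""
    out = []
    for a, b in zip(v1_rows, v2_rows):
        # accumulate in float64, matching the double accumulators in the functor
        a = np.asarray(a, dtype=np.float64)
        b = np.asarray(b, dtype=np.float64)
        dot = float(np.dot(a, b))
        mag1 = float(np.sqrt(np.sum(a * a)))
        mag2 = float(np.sqrt(np.sum(b * b)))
        out.append(np.float32(dot / (mag1 * mag2)))
    return out

# identical vectors give similarity 1.0, orthogonal vectors give 0.0
print(cosine_similarity_rows([[1, 0], [1, 0]], [[1, 0], [0, 1]]))  # [1.0, 0.0]
```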
diff --git a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/hive/DecimalFraction.java b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/hive/DecimalFraction.java
index 501983147..0c47f515a 100644
--- a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/hive/DecimalFraction.java
+++ b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/hive/DecimalFraction.java
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2021-2022, NVIDIA CORPORATION.
+ * Copyright (c) 2021-2025, NVIDIA CORPORATION.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
@@ -47,7 +47,7 @@ public String getDisplayString(String[] strings) {
  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
    if (arguments.length != 1) {
      throw new UDFArgumentException("One argument is supported, found: " + arguments.length);
    }
    if (!(arguments[0] instanceof PrimitiveObjectInspector)) {
@@ -82,7 +82,7 @@ public Object evaluate(GenericUDF.DeferredObject[] arguments) throws HiveExcepti
  @Override
  public ColumnVector evaluateColumnar(int numRows, ColumnVector... args) {
    if (args.length != 1) {
      throw new IllegalArgumentException("Unexpected argument count: " + args.length);
    }
    ColumnVector input = args[0];

diff --git a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/hive/StringWordCount.java b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/hive/StringWordCount.java
index f3fb2e244..20d714007 100644
--- a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/hive/StringWordCount.java
+++ b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/hive/StringWordCount.java
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2021-2022, NVIDIA CORPORATION.
+ * Copyright (c) 2021-2025, NVIDIA CORPORATION.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
@@ -62,7 +62,7 @@ public Integer evaluate(String str) {
  public ColumnVector evaluateColumnar(int numRows, ColumnVector... args) {
    // The CPU implementation takes a single string argument, so similarly
    // there should only be one column argument of type STRING.
    if (args.length != 1) {
      throw new IllegalArgumentException("Unexpected argument count: " + args.length);
    }
    ColumnVector strs = args[0];

diff --git a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/hive/URLDecode.java b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/hive/URLDecode.java
index e97f0a21c..dd15b4869 100644
--- a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/hive/URLDecode.java
+++ b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/hive/URLDecode.java
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2020-2022, NVIDIA CORPORATION.
+ * Copyright (c) 2020-2025, NVIDIA CORPORATION.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
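The hunks above only touch argument validation; the underlying UDF counts words in a string. A tiny CPU-side sketch of that contract (a guess at whitespace-delimited counting, not the repo's exact implementation):

```python
def word_count(s):
    # null input -> null output, mirroring typical Hive UDF semantics
    if s is None:
        return None
    # split() with no arguments collapses runs of whitespace
    return len(s.split())

assert word_count("RAPIDS Accelerator for Apache Spark") == 5
assert word_count(None) is None
```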
@@ -54,7 +54,7 @@ public String evaluate(String s) {
  public ColumnVector evaluateColumnar(int numRows, ColumnVector... args) {
    // The CPU implementation takes a single string argument, so similarly
    // there should only be one column argument of type STRING.
    if (args.length != 1) {
      throw new IllegalArgumentException("Unexpected argument count: " + args.length);
    }
    ColumnVector input = args[0];

diff --git a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/hive/URLEncode.java b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/hive/URLEncode.java
index 98bcb73eb..66a7b01aa 100644
--- a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/hive/URLEncode.java
+++ b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/hive/URLEncode.java
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2020-2022, NVIDIA CORPORATION.
+ * Copyright (c) 2020-2025, NVIDIA CORPORATION.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
@@ -50,7 +50,7 @@ public String getDisplayString(String[] children) {
  /** Standard initialize method for implementing GenericUDF for a single string parameter */
  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
    if (arguments.length != 1) {
      throw new UDFArgumentException("One argument is supported, found: " + arguments.length);
    }
    if (!(arguments[0] instanceof PrimitiveObjectInspector)) {
@@ -96,7 +96,7 @@ public Object evaluate(GenericUDF.DeferredObject[] arguments) throws HiveExcepti
  public ColumnVector evaluateColumnar(int numRows, ColumnVector... args) {
    // The CPU implementation takes a single string argument, so similarly
    // there should only be one column argument of type STRING.
    if (args.length != 1) {
      throw new IllegalArgumentException("Unexpected argument count: " + args.length);
    }
    ColumnVector input = args[0];

diff --git a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/java/CosineSimilarity.java b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/java/CosineSimilarity.java
index b4df62b4f..ec34e529a 100644
--- a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/java/CosineSimilarity.java
+++ b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/java/CosineSimilarity.java
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2021-2022, NVIDIA CORPORATION.
+ * Copyright (c) 2021-2025, NVIDIA CORPORATION.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
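URLDecode and URLEncode mirror java.net.URLDecoder/URLEncoder, which encode spaces as '+'. Python's quote_plus/unquote_plus follow the same convention, so the expected round-trip behavior can be sketched like this (illustrative only):

```python
from urllib.parse import quote_plus, unquote_plus

# Java's URLEncoder/URLDecoder encode spaces as '+', which quote_plus/unquote_plus mirror
original = "param=hello world & more"
encoded = quote_plus(original)   # 'param%3Dhello+world+%26+more'
assert unquote_plus(encoded) == original
```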
@@ -31,22 +31,22 @@ public class CosineSimilarity
  /** Row-by-row implementation that executes on the CPU */
  @Override
  public Float call(WrappedArray<Float> v1, WrappedArray<Float> v2) {
    if (v1 == null || v2 == null) {
      return null;
    }
    if (v1.length() != v2.length()) {
      throw new IllegalArgumentException("Array lengths must match: " +
          v1.length() + " != " + v2.length());
    }
    double dotProduct = 0;
    for (int i = 0; i < v1.length(); i++) {
      float f1 = v1.apply(i);
      float f2 = v2.apply(i);
      dotProduct += f1 * f2;
    }
    double magProduct = magnitude(v1) * magnitude(v2);
    return (float) (dotProduct / magProduct);
  }
@@ -74,9 +74,9 @@ public ColumnVector evaluateColumnar(int numRows, ColumnVector... args) {
    // We need to go into the native code as quickly as possible
    // because it is easier to write the code safely.
    // Then wrap returns in a column vector and own that resource.
    return new ColumnVector(cosineSimilarity(args[0].getNativeView(), args[1].getNativeView()));
  }

  /** Native implementation that computes on the GPU */
  private static native long cosineSimilarity(long vectorView1, long vectorView2);
}

diff --git a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/java/DecimalFraction.java b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/java/DecimalFraction.java
index 6c3126f29..b263c0640 100644
--- a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/java/DecimalFraction.java
+++ b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/java/DecimalFraction.java
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2021-2022, NVIDIA CORPORATION.
+ * Copyright (c) 2021-2025, NVIDIA CORPORATION.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
@@ -19,7 +19,7 @@
import ai.rapids.cudf.ColumnVector;
import ai.rapids.cudf.Scalar;
import com.nvidia.spark.RapidsUDF;
import org.apache.spark.sql.api.java.UDF1;

import java.math.BigDecimal;
@@ -28,7 +28,7 @@
 * fraction part of the input Decimal data. So, the output data has the
 * same precision and scale as the input one.
 */
public class DecimalFraction implements UDF1<BigDecimal, BigDecimal>, RapidsUDF {

  @Override
  public BigDecimal call(BigDecimal dec) throws Exception {
@@ -41,7 +41,7 @@ public BigDecimal call(BigDecimal dec) throws Exception {
  @Override
  public ColumnVector evaluateColumnar(int numRows, ColumnVector... args) {
    if (args.length != 1) {
      throw new IllegalArgumentException("Unexpected argument count: " + args.length);
    }
    ColumnVector input = args[0];

diff --git a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/java/NativeUDFExamplesLoader.java b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/java/NativeUDFExamplesLoader.java
index 6a521c039..df9aa9c04 100644
--- a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/java/NativeUDFExamplesLoader.java
+++ b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/java/NativeUDFExamplesLoader.java
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2021-2022, NVIDIA CORPORATION.
+ * Copyright (c) 2021-2025, NVIDIA CORPORATION.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.

diff --git a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/java/URLDecode.java b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/java/URLDecode.java
index 96d07384f..006249539 100644
--- a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/java/URLDecode.java
+++ b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/java/URLDecode.java
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2021-2022, NVIDIA CORPORATION.
+ * Copyright (c) 2021-2025, NVIDIA CORPORATION.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
@@ -20,7 +20,7 @@
import ai.rapids.cudf.DType;
import ai.rapids.cudf.Scalar;
import com.nvidia.spark.RapidsUDF;
import org.apache.spark.sql.api.java.UDF1;

import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
@@ -31,7 +31,7 @@
 * provides a RAPIDS implementation that can run on the GPU when the query
 * is executed with the RAPIDS Accelerator for Apache Spark.
 */
public class URLDecode implements UDF1<String, String>, RapidsUDF {
  /** Row-by-row implementation that executes on the CPU */
  @Override
  public String call(String s) {
@@ -54,7 +54,7 @@ public String call(String s) {
  public ColumnVector evaluateColumnar(int numRows, ColumnVector... args) {
    // The CPU implementation takes a single string argument, so similarly
    // there should only be one column argument of type STRING.
    if (args.length != 1) {
      throw new IllegalArgumentException("Unexpected argument count: " + args.length);
    }
    ColumnVector input = args[0];

diff --git a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/java/URLEncode.java b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/java/URLEncode.java
index 13bdfff55..debc33787 100644
--- a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/java/URLEncode.java
+++ b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/java/URLEncode.java
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2021-2022, NVIDIA CORPORATION.
+ * Copyright (c) 2021-2025, NVIDIA CORPORATION.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
@@ -19,7 +19,7 @@
import ai.rapids.cudf.ColumnVector;
import ai.rapids.cudf.DType;
import com.nvidia.spark.RapidsUDF;
import org.apache.spark.sql.api.java.UDF1;

import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
@@ -30,7 +30,7 @@
 * provides a RAPIDS implementation that can run on the GPU when the query
 * is executed with the RAPIDS Accelerator for Apache Spark.
 */
public class URLEncode implements UDF1<String, String>, RapidsUDF {
  /** Row-by-row implementation that executes on the CPU */
  @Override
  public String call(String s) {
@@ -53,7 +53,7 @@ public String call(String s) {
  public ColumnVector evaluateColumnar(int numRows, ColumnVector... args) {
    // The CPU implementation takes a single string argument, so similarly
    // there should only be one column argument of type STRING.
    if (args.length != 1) {
      throw new IllegalArgumentException("Unexpected argument count: " + args.length);
    }
    ColumnVector input = args[0];

diff --git a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/python/asserts.py b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/python/asserts.py
index 5aa3485e9..008df1c2c 100644
--- a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/python/asserts.py
+++ b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/python/asserts.py
@@ -1,4 +1,4 @@
-# Copyright (c) 2020-2022, NVIDIA CORPORATION.
+# Copyright (c) 2020-2025, NVIDIA CORPORATION.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -66,7 +66,7 @@ def _assert_equal(cpu, gpu, float_check, path):
        else:
            _assert_equal(sub_cpu, sub_gpu, float_check, path + [index])
        index = index + 1
    elif (t is dict):
        # The order of key/values is not guaranteed in python dicts, nor are they guaranteed by Spark
        # so sort the items to do our best with ignoring the order of dicts
@@ -136,21 +136,21 @@ def cmp(self, other):
            if (self.wrapped is None and other.wrapped is None):
                return 0
            elif (self.wrapped is None):
                return -1
            elif (other.wrapped is None):
                return 1
            elif self.is_nan and other.is_nan:
                return 0
            elif self.is_nan:
                return -1
            elif other.is_nan:
                return 1
            elif self.wrapped == other.wrapped:
                return 0
            elif self.wrapped < other.wrapped:
                return -1
            else:
                return 1
        except TypeError as te:
            print("ERROR TRYING TO COMPARE {} to {} {}".format(self.wrapped, other.wrapped, te))
            raise te
@@ -292,7 +292,7 @@ def assert_gpu_fallback_write(write_func,
    gpu_path = base_path + '/GPU'
    with_gpu_session(lambda spark: write_func(spark, gpu_path), conf=conf)
    gpu_end = time.time()
    jvm.org.apache.spark.sql.rapids.ExecutionPlanCaptureCallback.assertCapturedAndGpuFellBack(cpu_fallback_class_name, 10000)
    print('### WRITE: GPU TOOK {} CPU TOOK {} ###'.format(
        gpu_end - gpu_start, cpu_end - cpu_start))

diff --git a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/python/conftest.py b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/python/conftest.py
index 6517b705d..f8fd2944c 100644
--- a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/python/conftest.py
+++ b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/python/conftest.py
@@ -1,4 +1,4 @@
-# Copyright (c) 2020-2022, NVIDIA CORPORATION.
+# Copyright (c) 2020-2025, NVIDIA CORPORATION.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
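The cmp method above defines a total order in which nulls sort before NaNs, which in turn sort before ordinary values. A standalone sketch of that ordering (illustrative, with a made-up function name):

```python
import math

def cmp_with_nulls_and_nans(a, b):
    """Total order used by the asserts above: None < NaN < ordinary values."""
    if a is None and b is None:
        return 0
    if a is None:
        return -1
    if b is None:
        return 1
    a_nan = isinstance(a, float) and math.isnan(a)
    b_nan = isinstance(b, float) and math.isnan(b)
    if a_nan and b_nan:
        return 0
    if a_nan:
        return -1
    if b_nan:
        return 1
    return 0 if a == b else (-1 if a < b else 1)

assert cmp_with_nulls_and_nans(None, float('nan')) == -1   # null sorts first
assert cmp_with_nulls_and_nans(float('nan'), 1.0) == -1    # NaN before real values
```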
@@ -91,7 +91,7 @@ def skip_unless_precommit_tests(description):
    else:
        pytest.skip(description)

_limit = -1

def get_limit():
    return _limit
@@ -100,7 +100,7 @@ def _get_limit_from_mark(mark):
    if mark.args:
        return mark.args[0]
    else:
        return mark.kwargs.get('num_rows', 100000)

def pytest_runtest_setup(item):
    global _sort_on_spark
@@ -177,7 +177,7 @@ def pytest_runtest_setup(item):
    if limit_mrk:
        _limit = _get_limit_from_mark(limit_mrk)
    else:
        _limit = -1

def pytest_configure(config):
    global _runtime_env
@@ -238,7 +238,7 @@ def spark_tmp_path(request):
    ret = request.config.getoption('tmp_path')
    if ret is None:
        ret = '/tmp/pyspark_tests/'
    ret = ret + '/' + str(random.randint(0, 1000000)) + '/'
    # Make sure it is there and accessible
    sc = get_spark_i_know_what_i_am_doing().sparkContext
    config = sc._jsc.hadoopConfiguration()
@@ -256,12 +256,12 @@ def __init__(self, base_id):
    def get(self):
        ret = '{}_{}'.format(self.base_id, self.running_id)
        self.running_id = self.running_id + 1
        return ret

@pytest.fixture
def spark_tmp_table_factory(request):
    base_id = 'tmp_table_{}'.format(random.randint(0, 1000000))
    yield TmpTableFactory(base_id)
    sp = get_spark_i_know_what_i_am_doing()
    tables = sp.sql("SHOW TABLES".format(base_id)).collect()

diff --git a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/python/data_gen.py b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/python/data_gen.py
index 6267058e7..057a59148 100644
--- a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/python/data_gen.py
+++ b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/python/data_gen.py
@@ -1,4 +1,4 @@
-# Copyright (c) 2020-2022, NVIDIA CORPORATION.
+# Copyright (c) 2020-2025, NVIDIA CORPORATION.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -50,7 +50,7 @@ def __init__(self, data_type, nullable=True, special_cases=[]):
        self._special_cases = []
        if isinstance(nullable, tuple):
            self.nullable = nullable[0]
            weight = nullable[1]
        else:
            self.nullable = nullable
            weight = 5.0
@@ -59,26 +59,26 @@ def __init__(self, data_type, nullable=True, special_cases=[]):
        # Special cases can be a value or a tuple of (value, weight). If the
        # special_case itself is a tuple as in the case of StructGen, it MUST be added with a
        # weight like: ((special_case_tuple_v1, special_case_tuple_v2), weight).
        for element in special_cases:
            if isinstance(element, tuple):
                self.with_special_case(element[0], element[1])
            else:
                self.with_special_case(element)

    def copy_special_case(self, special_case, weight=1.0):
        # it would be good to do a deepcopy, but sre_yield is not happy with that.
        c = copy.copy(self)
        c._special_cases = copy.deepcopy(self._special_cases)
        return c.with_special_case(special_case, weight=weight)

    def with_special_case(self, special_case, weight=1.0):
        """
        Add in a special case with a given weight. A special case can either be
        a function that takes an instance of Random and returns the generated data
        or it can be a constant. By default the weight is 1.0, and the default
        number generation's weight is 100.0. The number of lines generated in
        the data set should be proportional to its weight / the sum of all weights.
        """
        if callable(special_case):
@@ -101,7 +101,7 @@ def _start(self, rand, gen_func):
        if not self._special_cases:
            self._gen_func = gen_func
        else:
            weighted_choices = [(100.0, lambda rand: gen_func())]
            weighted_choices.extend(self._special_cases)
            total = float(sum(weight for weight, gen in weighted_choices))
            normalized_choices = [(weight/total, gen) for weight, gen in weighted_choices]
@@ -149,14 +149,14 @@ def modify():
        self._start(rand, modify)

_MAX_CHOICES = 1 << 64
class StringGen(DataGen):
    """Generate strings that match a pattern"""
    def __init__(self, pattern="(.|\n){1,30}", flags=0, charset=sre_yield.CHARSET, nullable=True):
        super().__init__(StringType(), nullable=nullable)
        self.base_strs = sre_yield.AllStrings(pattern, flags=flags, charset=charset, max_count=_MAX_CHOICES)

    def with_special_pattern(self, pattern, flags=0, charset=sre_yield.CHARSET, weight=1.0):
        """
        Like with_special_case but you can provide a regexp pattern instead of a hard coded string value.
@@ -176,8 +176,8 @@ def start(self, rand):
            length = _MAX_CHOICES
        self._start(rand, lambda: strs[rand.randrange(0, length)])

BYTE_MIN = -(1 << 7)
BYTE_MAX = (1 << 7) - 1
class ByteGen(DataGen):
    """Generate Bytes"""
    def __init__(self, nullable=True, min_val=BYTE_MIN, max_val=BYTE_MAX, special_cases=[]):
@@ -188,12 +188,12 @@ def __init__(self, nullable=True, min_val=BYTE_MIN, max_val=BYTE_MAX, specia
    def start(self, rand):
        self._start(rand, lambda: rand.randint(self._min_val, self._max_val))

SHORT_MIN = -(1 << 15)
SHORT_MAX = (1 << 15) - 1
class ShortGen(DataGen):
    """Generate Shorts, with some built-in corner cases."""
    def __init__(self, nullable=True, min_val=SHORT_MIN, max_val=SHORT_MAX,
                 special_cases=[SHORT_MIN, SHORT_MAX, 0, 1, -1]):
        super().__init__(ShortType(), nullable=nullable, special_cases=special_cases)
        self._min_val = min_val
        self._max_val = max_val
@@ -201,12 +201,12 @@ def __init__(self, nullable=True, min_val=SHORT_MIN, max_val=SHORT_MAX,
    def start(self, rand):
        self._start(rand, lambda: rand.randint(self._min_val, self._max_val))

INT_MIN = -(1 << 31)
INT_MAX = (1 << 31) - 1
class IntegerGen(DataGen):
    """Generate Ints, with some built-in corner cases."""
    def __init__(self, nullable=True, min_val=INT_MIN, max_val=INT_MAX,
                 special_cases=[INT_MIN, INT_MAX, 0, 1, -1]):
        super().__init__(IntegerType(), nullable=nullable, special_cases=special_cases)
        self._min_val = min_val
        self._max_val = max_val
@@ -218,15 +218,15 @@ class DecimalGen(DataGen):
    """Generate Decimals, with some built-in corner cases."""
    def __init__(self, precision=None, scale=None, nullable=True, special_cases=[]):
        if precision is None:
            # Maximum number of decimal digits a Long can represent is 18
            precision = 18
            scale = 0
        DECIMAL_MIN = Decimal('-' + ('9' * precision) + 'e' + str(-scale))
        DECIMAL_MAX = Decimal(('9' * precision) + 'e' + str(-scale))
        super().__init__(DecimalType(precision, scale), nullable=nullable, special_cases=special_cases)
        self.scale = scale
        self.precision = precision
        pattern = "[0-9]{1," + str(precision) + "}e" + str(-scale)
        self.base_strs = sre_yield.AllStrings(pattern, flags=0, charset=sre_yield.CHARSET, max_count=_MAX_CHOICES)

    def __repr__(self):
@@ -240,12 +240,12 @@ def start(self, rand):
            length = _MAX_CHOICES
        self._start(rand, lambda: Decimal(strs[rand.randrange(0, length)]))

LONG_MIN = -(1 << 63)
LONG_MAX = (1 << 63) - 1
class LongGen(DataGen):
    """Generate Longs, with some built-in corner cases."""
    def __init__(self, nullable=True, min_val=LONG_MIN, max_val=LONG_MAX, special_cases=[]):
        _special_cases = [min_val, max_val, 0, 1, -1] if not special_cases else special_cases
        super().__init__(LongType(), nullable=nullable, special_cases=_special_cases)
        self._min_val = min_val
        self._max_val = max_val
@@ -262,13 +262,13 @@ def __init__(self, nullable=False, start_val=0, direction="inc"):
        if (direction == "dec"):
            def dec_it():
                tmp = self._current_val
                self._current_val -= 1
                return tmp
            self._do_it = dec_it
        else:
            def inc_it():
                tmp = self._current_val
                self._current_val += 1
                return tmp
            self._do_it = inc_it
@@ -291,7 +291,7 @@ def __repr__(self):
    def _loop_values(self):
        ret = self._vals[self._index]
        self._index = (self._index + 1) % self._length
        return ret

    def start(self, rand):
@@ -318,8 +318,8 @@ def start(self, rand):
FLOAT_MIN = -3.4028235E38
FLOAT_MAX = 3.4028235E38
NEG_FLOAT_NAN_MIN_VALUE = struct.unpack('f', struct.pack('I', 0xffffffff))[0]
NEG_FLOAT_NAN_MAX_VALUE = struct.unpack('f', struct.pack('I', 0xff800001))[0]
POS_FLOAT_NAN_MIN_VALUE = struct.unpack('f', struct.pack('I', 0x7f800001))[0]
POS_FLOAT_NAN_MAX_VALUE = struct.unpack('f', struct.pack('I', 0x7fffffff))[0]
class FloatGen(DataGen):
    """Generate floats, with some built-in corner cases."""
@@ -327,7 +327,7 @@ def __init__(self, nullable=True, no_nans=False, special_cases=None):
        self._no_nans = no_nans
        if special_cases is None:
            special_cases = [FLOAT_MIN, FLOAT_MAX, 0.0, -0.0, 1.0, -1.0]
            if not no_nans:
                special_cases.append(float('inf'))
                special_cases.append(float('-inf'))
@@ -347,14 +347,14 @@ def gen_float():
            return self._fixup_nans(struct.unpack('f', p)[0])
        self._start(rand, gen_float)

DOUBLE_MIN_EXP = -1022
DOUBLE_MAX_EXP = 1023
DOUBLE_MAX_FRACTION = int('1'*52, 2)
DOUBLE_MIN = -1.7976931348623157E308
DOUBLE_MAX = 1.7976931348623157E308
NEG_DOUBLE_NAN_MIN_VALUE = struct.unpack('d', struct.pack('L', 0xffffffffffffffff))[0]
NEG_DOUBLE_NAN_MAX_VALUE = struct.unpack('d', struct.pack('L', 0xfff0000000000001))[0]
POS_DOUBLE_NAN_MIN_VALUE = struct.unpack('d', struct.pack('L', 0x7ff0000000000001))[0]
POS_DOUBLE_NAN_MAX_VALUE = struct.unpack('d', struct.pack('L', 0x7fffffffffffffff))[0]
class DoubleGen(DataGen):
    """Generate doubles, with some built-in corner cases."""
@@ -366,17 +366,17 @@ def __init__(self, min_exp=DOUBLE_MIN_EXP, max_exp=DOUBLE_MAX_EXP, no_nans=False
        self._use_full_range = (self._min_exp == DOUBLE_MIN_EXP) and (self._max_exp == DOUBLE_MAX_EXP)
        if special_cases is None:
            special_cases = [
                self.make_from(1, self._max_exp, DOUBLE_MAX_FRACTION),
                self.make_from(0, self._max_exp, DOUBLE_MAX_FRACTION),
                self.make_from(1, self._min_exp, DOUBLE_MAX_FRACTION),
                self.make_from(0, self._min_exp, DOUBLE_MAX_FRACTION)
            ]
            if self._min_exp <= 0 and self._max_exp >= 0:
                special_cases.append(0.0)
                special_cases.append(-0.0)
            if self._min_exp <= 3 and self._max_exp >= 3:
                special_cases.append(1.0)
                special_cases.append(-1.0)
            if not no_nans:
                special_cases.append(float('inf'))
                special_cases.append(float('-inf'))
@@ -386,8 +386,8 @@ def __init__(self, min_exp=DOUBLE_MIN_EXP, max_exp=DOUBLE_MAX_EXP, no_nans=False
    @staticmethod
    def make_from(sign, exp, fraction):
        sign = sign & 1  # 1 bit
        exp = (exp + 1023) & 0x7FF  # add bias and keep 11 bits
        fraction = fraction & DOUBLE_MAX_FRACTION
        i = (sign << 63) | (exp << 52) | fraction
        p = struct.pack('L', i)
@@ -408,7 +408,7 @@ def gen_double():
            self._start(rand, gen_double)
        else:
            def gen_part_double():
                sign = rand.getrandbits(1)
                exp = rand.randint(self._min_exp, self._max_exp)
                fraction = rand.getrandbits(52)
                return self._fixup_nans(self.make_from(sign, exp, fraction))
@@ -420,7 +420,7 @@ def __init__(self, nullable=True):
        super().__init__(BooleanType(), nullable=nullable)

    def start(self, rand):
        self._start(rand, lambda: bool(rand.getrandbits(1)))

class StructGen(DataGen):
    """Generate a Struct"""
@@ -447,7 +447,7 @@ def make_tuple():
        self._start(rand, make_tuple)

    def contains_ts(self):
        return any(child[1].contains_ts() for child in self.children)

class DateGen(DataGen):
    """Generate Dates in a given range"""
@@ -455,15 +455,15 @@ def __init__(self, start=None, end=None, nullable=True):
        super().__init__(DateType(), nullable=nullable)
        if start is None:
            # Spark supports times starting at
            # "0001-01-01 00:00:00.000000"
            start = date(1, 1, 1)
        elif not isinstance(start, date):
            raise RuntimeError('Unsupported type passed in for start {}'.format(start))
        if end is None:
            # Spark supports time through
            # "9999-12-31 23:59:59.999999"
            end = date(9999, 12, 31)
        elif isinstance(end, timedelta):
            end = start + end
        elif not isinstance(start, date):
@@ -483,21 +483,21 @@ def __init__(self, start=None, end=None, nullable=True):
            leap_day = date(y, 2, 29)
            if (leap_day > start and leap_day < end):
                self.with_special_case(leap_day)
            next_day = date(y, 3, 1)
            if (next_day > start and next_day < end):
                self.with_special_case(next_day)

    @staticmethod
    def _guess_leap_year(t):
        y = int(math.ceil(t/4.0)) * 4
        if ((y % 100) == 0) and ((y % 400) != 0):
            y = y + 4
        if (y == 10000):
            y = y - 4
        return y

    _epoch = date(1970, 1, 1)
    _days = timedelta(days=1)

    def _to_days_since_epoch(self, val):
        return int((val - self._epoch)/self._days)
@@ -515,18 +515,18 @@ def __init__(self, start=None, end=None, nullable=True):
        super().__init__(TimestampType(), nullable=nullable)
        if start is None:
            # Spark supports times starting at
            # "0001-01-01 00:00:00.000000"
            # but it has issues if you get really close to that because it tries to do things
            # in a different format which causes roundoff, so we have to add a few days,
            # just to be sure
            start = datetime(1, 1, 3, tzinfo=timezone.utc)
        elif not isinstance(start, datetime):
            raise RuntimeError('Unsupported type passed in for start {}'.format(start))
        if end is None:
            # Spark supports time through
            # "9999-12-31 23:59:59.999999"
            end = datetime(9999, 12, 31, 23, 59, 59, 999999, tzinfo=timezone.utc)
        elif isinstance(end, timedelta):
            end = start + end
        elif not isinstance(start, date):
@@ -537,8 +537,8 @@ def __init__(self, start=None, end=None, nullable=True):
        if (self._epoch >= start and self._epoch <= end):
            self.with_special_case(self._epoch)

    _epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    _ms = timedelta(milliseconds=1)

    def _to_ms_since_epoch(self, val):
        return int((val - self._epoch)/self._ms)
@@ -656,9 +656,9 @@ def _mark_as_lit(data, data_type):
        children = zip(data, data_type.fields)
        return f.struct([_mark_as_lit(x, fd.dataType).alias(fd.name) for x, fd in children])
    elif isinstance(data_type, DateType):
        # Due to https://bugs.python.org/issue13305 we need to zero pad for years prior to 1000,
        # but this works for all of them
        dateString = data.strftime("%Y-%m-%d").zfill(10)
        return f.lit(dateString).cast(data_type)
    elif isinstance(data_type, MapType):
        assert isinstance(data, dict)
@@ -696,7 +696,7 @@ def gen_scalars(data_gen, count, seed=0, force_no_nulls=False):

def gen_scalar(data_gen, seed=0, force_no_nulls=False):
    """Generate a single scalar value."""
    v = list(gen_scalars(data_gen, 1, seed=seed, force_no_nulls=force_no_nulls))
    return v[0]

def gen_scalar_values(data_gen, count, seed=0, force_no_nulls=False):
@@ -706,10 +706,10 @@ def gen_scalar_values(data_gen, count, seed=0, force_no_nulls=False):

def gen_scalar_value(data_gen, seed=0, force_no_nulls=False):
    """Generate a single scalar value."""
    v = list(gen_scalar_values(data_gen, 1, seed=seed, force_no_nulls=force_no_nulls))
    return v[0]

def debug_df(df, path=None, file_format='json', num_parts=1):
    """Print out or save the contents and the schema of a dataframe for debugging."""
    if path is not None:
@@ -805,7 +805,7 @@ def _convert_to_sql(spark_type, data):
    elif isinstance(data, datetime):
        d = "'" + data.strftime('%Y-%m-%d T%H:%M:%S.%f').zfill(26) + "'"
    elif isinstance(data, date):
        d = "'" + data.strftime('%Y-%m-%d').zfill(10) + "'"
    elif isinstance(data, list):
        assert isinstance(spark_type, ArrayType)
        d = "array({})".format(",".join([_convert_to_sql(spark_type.elementType, x) for x in data]))
@@ -849,17 +849,17 @@ def gen_scalars_for_sql(data_gen, count, seed=0, force_no_nulls=False):
decimal_gen_neg_scale = DecimalGen(precision=7, scale=-3)
decimal_gen_scale_precision = DecimalGen(precision=7, scale=3)
decimal_gen_same_scale_precision = DecimalGen(precision=7, scale=7)
decimal_gen_64bit = DecimalGen(precision=12, scale=2)
decimal_gen_12_2 = DecimalGen(precision=12, scale=2)
decimal_gen_18_3 = DecimalGen(precision=18, scale=3)
decimal_gen_128bit = DecimalGen(precision=20, scale=2)
decimal_gen_20_2 = DecimalGen(precision=20, scale=2)
decimal_gen_30_2 = DecimalGen(precision=30, scale=2)
decimal_gen_36_5 = DecimalGen(precision=36, scale=5)
decimal_gen_36_neg5 = DecimalGen(precision=36, scale=-5)
decimal_gen_38_0 = DecimalGen(precision=38, scale=0)
decimal_gen_38_10 = DecimalGen(precision=38, scale=10)
decimal_gen_38_neg10 = DecimalGen(precision=38, scale=-10)

null_gen = NullGen()
@@ -876,10 +876,10 @@ def gen_scalars_for_sql(data_gen, count, seed=0, force_no_nulls=False):
decimal_gens = [decimal_gen_neg_scale] + decimal_gens_no_neg

decimal_128_gens_no_neg = [decimal_gen_20_2, decimal_gen_30_2, decimal_gen_36_5,
                           decimal_gen_38_0, decimal_gen_38_10]

decimal_128_gens = decimal_128_gens_no_neg + [decimal_gen_36_neg5, decimal_gen_38_neg10]

# all of the basic gens
all_basic_gens_no_null = [byte_gen, short_gen, int_gen, long_gen, float_gen, double_gen,
@@ -908,7 +908,7 @@ def gen_scalars_for_sql(data_gen, count, seed=0, force_no_nulls=False):
boolean_gens = [boolean_gen]
single_level_array_gens = [ArrayGen(sub_gen) for sub_gen in all_basic_gens + decimal_gens]
single_array_gens_sample_with_decimal128 = [ArrayGen(sub_gen) for sub_gen in decimal_128_gens]
single_level_array_gens_no_null = [ArrayGen(sub_gen) for sub_gen in all_basic_gens_no_null + decimal_gens_no_neg]
@@ -920,40 +920,40 @@ def gen_scalars_for_sql(data_gen, count, seed=0, force_no_nulls=False):
# Be careful to not make these too large or data generation takes forever
# This is only a few nested array gens, because nesting can be very deep
nested_array_gens_sample = [ArrayGen(ArrayGen(short_gen, max_length=10), max_length=10),
                            ArrayGen(ArrayGen(string_gen, max_length=10), max_length=10),
                            ArrayGen(StructGen([['child0', byte_gen], ['child1', string_gen], ['child2', float_gen]]))]
# Some array gens, but not all because of nesting
array_gens_sample = single_level_array_gens + nested_array_gens_sample
array_gens_sample_with_decimal128 = single_level_array_gens + nested_array_gens_sample + single_array_gens_sample_with_decimal128

# all of the basic types in a single struct
all_basic_struct_gen = StructGen([['child'+str(ind), sub_gen] for ind, sub_gen in enumerate(all_basic_gens)])

# Some struct gens, but not all because of nesting
nonempty_struct_gens_sample = [all_basic_struct_gen,
                               StructGen([['child0', byte_gen], ['child1', all_basic_struct_gen]]),
                               StructGen([['child0', ArrayGen(short_gen)], ['child1', double_gen]])]
struct_gens_sample = nonempty_struct_gens_sample + [StructGen([])]
struct_gen_decimal128 = StructGen(
    [['child' + str(ind), sub_gen] for ind, sub_gen in enumerate(decimal_128_gens)])
struct_gens_sample_with_decimal128 = struct_gens_sample + [
    struct_gen_decimal128]

simple_string_to_string_map_gen = MapGen(StringGen(pattern='key_[0-9]', nullable=False),
                                         StringGen(), max_length=10)
all_basic_map_gens = [MapGen(f(nullable=False), f()) for f in [BooleanGen, ByteGen, ShortGen, IntegerGen, LongGen, FloatGen, DoubleGen, DateGen, TimestampGen]] + [simple_string_to_string_map_gen]
decimal_64_map_gens = [MapGen(key_gen=gen, value_gen=gen, nullable=False) for gen in [DecimalGen(7, 3, nullable=False), DecimalGen(12, 2, nullable=False), DecimalGen(18, -3, nullable=False)]]
decimal_128_map_gens = [MapGen(key_gen=gen, value_gen=gen, nullable=False) for gen in [DecimalGen(20, 2, nullable=False), DecimalGen(36, 5, nullable=False), DecimalGen(38, 38, nullable=False),
                                                                                       DecimalGen(36, -5, nullable=False)]]
decimal_128_no_neg_map_gens = [MapGen(key_gen=gen, value_gen=gen, nullable=False) for gen in [DecimalGen(20, 2, nullable=False), DecimalGen(36, 5, nullable=False), DecimalGen(38, 38, nullable=False)]]

# Some map gens, but not all because of nesting
map_gens_sample = all_basic_map_gens + [MapGen(StringGen(pattern='key_[0-9]', nullable=False), ArrayGen(string_gen), max_length=10),
                                        MapGen(RepeatSeqGen(IntegerGen(nullable=False), 10), long_gen, max_length=10),
                                        MapGen(StringGen(pattern='key_[0-9]', nullable=False), simple_string_to_string_map_gen)]

allow_negative_scale_of_decimal_conf = {'spark.sql.legacy.allowNegativeScaleOfDecimal': 'true'}
@@ -967,20 +967,20 @@ def copy_and_update(conf, *more_confs):
all_gen = [StringGen(), ByteGen(), ShortGen(), IntegerGen(), LongGen(),
           FloatGen(), DoubleGen(), BooleanGen(), DateGen(), TimestampGen(),
           decimal_gen_default, decimal_gen_scale_precision, decimal_gen_same_scale_precision,
           decimal_gen_64bit, decimal_gen_128bit, decimal_gen_36_5, decimal_gen_38_10]

# Pyarrow will complain with the error below if the timestamp is out of range for both CPU and GPU,
# so narrow down the time range to avoid exceptions causing test failures.
#
# "pyarrow.lib.ArrowInvalid: Casting from timestamp[us, tz=UTC] to timestamp[ns]
#  would result in out of bounds timestamp: 51496791452587000"
#
# This issue has been fixed in pyarrow by the PR https://github.com/apache/arrow/pull/7169
# However it still requires PySpark to specify the new argument "timestamp_as_object".
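The DataGen machinery earlier in this file selects between the default generator (weight 100.0) and special cases (default weight 1.0) with probability proportional to weight / sum of weights, as described in the with_special_case docstring. A self-contained sketch of that selection (illustrative, not the repo's code):

```python
import random

def weighted_gen(rand, default_gen, special_cases):
    """Pick a value with probability weight / sum(weights).

    special_cases is a list of (weight, gen) pairs where gen is either a
    callable taking a Random instance or a constant, matching DataGen above.
    """
    choices = [(100.0, default_gen)] + special_cases
    total = sum(w for w, _ in choices)
    r = rand.uniform(0.0, total)
    upper = 0.0
    for weight, gen in choices:
        upper += weight
        if r <= upper:
            return gen(rand) if callable(gen) else gen
    return default_gen(rand)

rand = random.Random(0)
# ~100/102 of draws come from the default generator, ~1/102 from each special case
samples = [weighted_gen(rand, lambda r: r.randint(0, 9),
                        [(1.0, lambda r: float('nan')), (1.0, -1)])
           for _ in range(1000)]
```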
arrow_common_gen = [byte_gen, short_gen, int_gen, long_gen, float_gen, double_gen,
                    string_gen, boolean_gen, date_gen,
                    TimestampGen(start=datetime(1970, 1, 1, tzinfo=timezone.utc),
                                 end=datetime(2262, 1, 1, tzinfo=timezone.utc))]

arrow_array_gens = [ArrayGen(subGen) for subGen in arrow_common_gen] + nested_array_gens_sample
@@ -988,11 +988,11 @@ def copy_and_update(conf, *more_confs):
    ['child'+str(i), sub_gen] for i, sub_gen in enumerate(arrow_common_gen)])

arrow_struct_gens = [arrow_one_level_struct_gen,
                     StructGen([['child0', ArrayGen(short_gen)], ['child1', arrow_one_level_struct_gen]])]

# This function adds a new column named uniq_int where each row
# has a new unique integer value. It just starts at 0 and
# increments by 1 for each row.
# This can be used to add a column to a dataframe if you need to
# sort on a column with unique values.
# This collects the data to the driver though, so it can be expensive.

diff --git a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/python/rapids_udf_test.py b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/python/rapids_udf_test.py
index ec45afee9..5b7e6ffc0 100644
--- a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/python/rapids_udf_test.py
+++ b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/python/rapids_udf_test.py
@@ -1,4 +1,4 @@
-# Copyright (c) 2020-2022, NVIDIA CORPORATION.
+# Copyright (c) 2020-2025, NVIDIA CORPORATION.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
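DoubleGen.make_from above packs an IEEE-754 double from a sign bit, an 11-bit biased exponent, and a 52-bit fraction. The same packing as a standalone sketch (using an explicit '<Q' struct format instead of the platform-dependent 'L'):

```python
import struct

DOUBLE_MAX_FRACTION = int('1' * 52, 2)

def make_double(sign, exp, fraction):
    """Pack IEEE-754 double bits: 1 sign bit, 11 biased exponent bits, 52 fraction bits."""
    sign = sign & 1
    exp = (exp + 1023) & 0x7FF              # add the exponent bias
    fraction = fraction & DOUBLE_MAX_FRACTION
    bits = (sign << 63) | (exp << 52) | fraction
    return struct.unpack('<d', struct.pack('<Q', bits))[0]

assert make_double(0, 0, 0) == 1.0          # 2^0 with implicit leading 1
assert make_double(1, 1, 0) == -2.0         # sign bit set, exponent 1
```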
@@ -19,7 +19,7 @@ from spark_session import with_spark_session
from pyspark.sql.utils import AnalysisException

encoded_url_gen = StringGen('([^%]{0,1}(%[0-9A-F][0-9A-F]){0,1}){0,30}')

def drop_udf(spark, udfname):
    spark.sql("DROP TEMPORARY FUNCTION IF EXISTS {}".format(udfname))
@@ -55,7 +55,7 @@ def evalfn(spark):
def evalfn_decimal(spark):
    load_hive_udf_or_skip_test(spark, "fraction", "com.nvidia.spark.rapids.udf.hive.DecimalFraction")
    return gen_df(spark, [["dec", DecimalGen(38, 18)]])
assert_gpu_and_cpu_are_equal_sql(
    evalfn_decimal, "hive_generic_udf_test_table",
@@ -94,14 +94,14 @@ def evalfn(spark):
    from pyspark.sql.types import DecimalType
    load_java_udf_or_skip_test(spark, 'fraction', 'com.nvidia.spark.rapids.udf.java.DecimalFraction')
    load_java_udf_or_skip_test(spark, 'fraction_dec64_s10',
                               'com.nvidia.spark.rapids.udf.java.DecimalFraction',
                               DecimalType(18, 10))
    load_java_udf_or_skip_test(spark, 'fraction_dec32_s3',
                               'com.nvidia.spark.rapids.udf.java.DecimalFraction',
                               DecimalType(8, 3))
    return three_col_df(spark, DecimalGen(38, 18), DecimalGen(18, 10), DecimalGen(8, 3)
                        ).selectExpr("fraction(a)", "fraction_dec64_s10(b)", "fraction_dec32_s3(c)")
assert_gpu_and_cpu_are_equal_collect(evalfn)

@pytest.mark.rapids_udf_example_native
@@ -109,7 +109,7 @@ def test_java_cosine_similarity_reasonable_range():
    def evalfn(spark):
        class RangeFloatGen(FloatGen):
            def start(self, rand):
                self._start(rand, lambda: rand.uniform(-1000.0, 1000.0))
        load_java_udf_or_skip_test(spark, "cosine_similarity", "com.nvidia.spark.rapids.udf.java.CosineSimilarity")
        arraygen = ArrayGen(RangeFloatGen(nullable=False, no_nans=True, special_cases=[]), min_length=8, max_length=8)
        df = binary_op_df(spark, arraygen)

diff --git a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/python/spark_init_internal.py b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/python/spark_init_internal.py
index c557c1be0..39a1d4683 100644
--- a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/python/spark_init_internal.py
+++ b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/python/spark_init_internal.py
@@ -1,4 +1,4 @@
-# Copyright (c) 2020-2021, NVIDIA CORPORATION.
+# Copyright (c) 2020-2025, NVIDIA CORPORATION.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
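encoded_url_gen above relies on sre_yield to enumerate the language of a regular expression, the same way StringGen.start does in data_gen.py. A minimal sketch of drawing samples from that pattern (illustrative only):

```python
import random
import sre_yield

# the generator space for encoded_url_gen: optional literal chars and %XX escapes
pattern = '([^%]{0,1}(%[0-9A-F][0-9A-F]){0,1}){0,30}'
strs = sre_yield.AllStrings(pattern, max_count=1 << 64)

rand = random.Random(0)
length = min(len(strs), 1 << 64)  # the language can be astronomically large
sample = [strs[rand.randrange(0, length)] for _ in range(3)]
print(sample)  # three strings drawn from the pattern's language
```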
@@ -35,7 +35,7 @@ def _spark__init(): for key, value in os.environ.items(): if key.startswith('PYSP_TEST_') and key != _DRIVER_ENV: - _sb.config(key[10:].replace('_', '.'), value) + _sb.config(key[25.06.25.06.1-SNAPSHOT0:].replace('_', '.'), value) driver_opts = os.environ.get(_DRIVER_ENV, "") diff --git a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/python/spark_session.py b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/python/spark_session.py index 2d9a9f2bc..213a6d89e 100644 --- a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/python/spark_session.py +++ b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/python/spark_session.py @@ -1,4 +1,4 @@ -# Copyright (c) 2020-2022, NVIDIA CORPORATION. +# Copyright (c) 2020-2025, NVIDIA CORPORATION. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -102,8 +102,8 @@ def with_gpu_session(func, conf={}): copy['spark.rapids.sql.test.validateExecsInGpuPlan'] = ','.join(get_validate_execs_in_gpu_plan()) return with_spark_session(func, conf=copy) -def is_before_spark_311(): - return spark_version() < "3.1.0" +def is_before_spark_325.06.25.06.1-SNAPSHOT25.06.25.06.1-SNAPSHOT(): + return spark_version() < "3.25.06.25.06.1-SNAPSHOT.0" def is_before_spark_320(): return spark_version() < "3.2.0" @@ -111,6 +111,6 @@ def is_before_spark_320(): def is_before_spark_330(): return spark_version() < "3.3.0" -def is_databricks91_or_later(): +def is_databricks925.06.25.06.1-SNAPSHOT_or_later(): spark = get_spark_i_know_what_i_am_doing() - return spark.conf.get("spark.databricks.clusterUsageTags.sparkVersion", "") >= "9.1" + return spark.conf.get("spark.databricks.clusterUsageTags.sparkVersion", "") >= "9.25.06.25.06.1-SNAPSHOT" diff --git a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/scala/com/nvidia/spark/rapids/udf/scala/URLDecode.scala b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/scala/com/nvidia/spark/rapids/udf/scala/URLDecode.scala index 5cf8da123..53123c0f6 100644 --- a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/scala/com/nvidia/spark/rapids/udf/scala/URLDecode.scala +++ b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/scala/com/nvidia/spark/rapids/udf/scala/URLDecode.scala @@ -1,5 +1,5 @@ /* - * Copyright (c) 2021-2022, NVIDIA CORPORATION. + * Copyright (c) 20225.06.25.06.1-SNAPSHOT-2022, NVIDIA CORPORATION. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. @@ -43,7 +43,7 @@ class URLDecode extends Function[String, String] with RapidsUDF with Serializabl override def evaluateColumnar(numRows: Int, args: ColumnVector*): ColumnVector = { // The CPU implementation takes a single string argument, so similarly // there should only be one column argument of type STRING. 
-    require(args.length == 1, s"Unexpected argument count: ${args.length}")
+    require(args.length == 1, s"Unexpected argument count: ${args.length}")
     val input = args.head
     require(numRows == input.getRowCount, s"Expected $numRows rows, received ${input.getRowCount}")
     require(input.getType == DType.STRING, s"Argument type is not a string: ${input.getType}")
diff --git a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/scala/com/nvidia/spark/rapids/udf/scala/URLEncode.scala b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/scala/com/nvidia/spark/rapids/udf/scala/URLEncode.scala
index 6facee3b9..9a7b43516 100644
--- a/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/scala/com/nvidia/spark/rapids/udf/scala/URLEncode.scala
+++ b/examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/scala/com/nvidia/spark/rapids/udf/scala/URLEncode.scala
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2021-2022, NVIDIA CORPORATION.
+ * Copyright (c) 2021-2022, NVIDIA CORPORATION.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -42,7 +42,7 @@ class URLEncode extends Function[String, String] with RapidsUDF with Serializabl
   override def evaluateColumnar(numRows: Int, args: ColumnVector*): ColumnVector = {
     // The CPU implementation takes a single string argument, so similarly
     // there should only be one column argument of type STRING.
-    require(args.length == 1, s"Unexpected argument count: ${args.length}")
+    require(args.length == 1, s"Unexpected argument count: ${args.length}")
     val input = args.head
     require(numRows == input.getRowCount, s"Expected $numRows rows, received ${input.getRowCount}")
     require(input.getType == DType.STRING, s"Argument type is not a string: ${input.getType}")
diff --git a/examples/XGBoost-Examples/README.md b/examples/XGBoost-Examples/README.md
index 7ec7dda9e..6d887dcf1 100644
--- a/examples/XGBoost-Examples/README.md
+++ b/examples/XGBoost-Examples/README.md
@@ -1,24 +1,24 @@
 # Spark XGBoost Examples
 
 Spark XGBoost examples here showcase the need for ETL+Training pipeline GPU acceleration.
-The Scala based XGBoost examples here use [DMLC’s version](https://repo1.maven.org/maven2/ml/dmlc/xgboost4j-spark_2.12/).
+The Scala-based XGBoost examples here use [DMLC’s version](https://repo1.maven.org/maven2/ml/dmlc/xgboost4j-spark_2.12/).
 The PySpark-based XGBoost examples require [installing RAPIDS via pip](https://rapids.ai/pip.html#install).
 Most data scientists spend a lot of time not only on training models but also on processing the large amounts of data needed to train these models.
-As you can see below, Pyspark+XGBoost training on GPUs can be up to 13X and data processing using
-RAPIDS Accelerator can also be accelerated with an end-to-end speed-up of 11X on GPU compared to CPU.
+As you can see below, PySpark+XGBoost training on GPUs can be up to 13X faster, and data processing using
+RAPIDS Accelerator can also be accelerated, with an end-to-end speed-up of 11X on GPU compared to CPU.
 In the public cloud, better performance can lead to significantly lower costs as demonstrated in this [blog](https://developer.nvidia.com/blog/gpu-accelerated-spark-xgboost/).
 ![mortgage-speedup](/docs/img/guides/mortgage-perf.png)
 
 Note that the training test result is based on 4 years of [Fannie Mae Single-Family Loan Performance Data](https://capitalmarkets.fanniemae.com/credit-risk-transfer/single-family-credit-risk-transfer/fannie-mae-single-family-loan-performance-data)
-with a 8 A100 GPU and 1024 CPU vcores cluster, the performance is affected by many aspects,
+with an 8 A100 GPU and 1024 CPU vcore cluster; the performance is affected by many factors,
 including data size and type of GPU.
 
 In this folder, there are three blueprints for users to learn about using
 Spark XGBoost and RAPIDS Accelerator on GPUs:
 
-1. Mortgage Prediction
+1. Mortgage Prediction
 2. Agaricus Classification
 3. Taxi Fare Prediction
@@ -37,9 +37,9 @@ In the last section, we provide basic “Getting Started Guides” for setting u
 Spark-XGBoost on different environments based on the Apache Spark scheduler
 such as YARN, Standalone or Kubernetes.
 
-## SECTION 1: SPARK-XGBOOST EXAMPLE NOTEBOOKS
+## SECTION 1: SPARK-XGBOOST EXAMPLE NOTEBOOKS
 
-1. Mortgage Notebooks
+1. Mortgage Notebooks
    - Python
      - [Mortgage ETL](mortgage/notebooks/python/MortgageETL.ipynb)
      - [Mortgage Training Prediction](mortgage/notebooks/python/mortgage-gpu.ipynb)
@@ -97,14 +97,14 @@ Note:
 Updating the default value of `spark.sql.execution.arrow.maxRecordsPerBatch` to a larger number (such as 200000) will
 significantly improve performance by accelerating data transfer between the JVM and the Python process.
 
-For the CrossValidator job, we need to set `spark.task.resource.gpu.amount=1` to allow only 1 training task running on 1 GPU(executor),
-otherwise the customized CrossValidator may schedule more than 1 xgboost training tasks into one executor simultaneously and trigger
-[issue-131](https://github.com/NVIDIA/spark-rapids-examples/issues/131).
+For the CrossValidator job, we need to set `spark.task.resource.gpu.amount=1` to allow only 1 training task to run on each GPU (executor);
+otherwise the customized CrossValidator may schedule more than 1 XGBoost training task into one executor simultaneously and trigger
+[issue-131](https://github.com/NVIDIA/spark-rapids-examples/issues/131).
 
 For the XGBoost job, if the number of shuffle-stage tasks before training is less than num_worker, the training tasks will be
 scheduled to run on only some of the nodes instead of all of them, due to Spark's data locality feature. The workaround is to
 increase the number of partitions in the shuffle stage by setting `spark.sql.files.maxPartitionBytes=RightNum`.
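To make the three settings above concrete, here is a minimal sketch (not part of this repo) of how they might be applied when building a PySpark session; the application name and the `64MB` value for `spark.sql.files.maxPartitionBytes` are illustrative assumptions, not values taken from these examples:

```python
from pyspark.sql import SparkSession

# A sketch of the configs discussed above; tune the values for your cluster.
spark = (
    SparkSession.builder
    .appName("xgboost-example")  # hypothetical app name
    # One training task per GPU (executor), as the CrossValidator note requires.
    .config("spark.task.resource.gpu.amount", "1")
    # Larger Arrow batches speed up JVM <-> Python data transfer.
    .config("spark.sql.execution.arrow.maxRecordsPerBatch", "200000")
    # A smaller maxPartitionBytes (default 128MB) yields more input partitions,
    # which can raise the shuffle-stage task count above num_worker if needed.
    .config("spark.sql.files.maxPartitionBytes", "64MB")
    .getOrCreate()
)
```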
 If you are running the XGBoost Scala notebooks on Dataproc, please make sure to update the configs below to avoid job failures:
 ```
 spark.dynamicAllocation.enabled=false
-spark.task.resource.gpu.amount=1
+spark.task.resource.gpu.amount=1
 ```
\ No newline at end of file
diff --git a/examples/XGBoost-Examples/agaricus/pom.xml b/examples/XGBoost-Examples/agaricus/pom.xml
index de6f20ec8..696feba37 100644
--- a/examples/XGBoost-Examples/agaricus/pom.xml
+++ b/examples/XGBoost-Examples/agaricus/pom.xml
@@ -1,6 +1,6 @@
-
+
 sample_xgboost_examples
diff --git a/examples/XGBoost-Examples/agaricus/python/com/nvidia/spark/examples/agaricus/main.py b/examples/XGBoost-Examples/agaricus/python/com/nvidia/spark/examples/agaricus/main.py
index 03a41e91d..41bee69f5 100644
--- a/examples/XGBoost-Examples/agaricus/python/com/nvidia/spark/examples/agaricus/main.py
+++ b/examples/XGBoost-Examples/agaricus/python/com/nvidia/spark/examples/agaricus/main.py
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -21,7 +21,7 @@ from xgboost.spark import SparkXGBClassifier, SparkXGBClassifierModel
 
 label = 'label'
-feature_names = ['feature_' + str(i) for i in range(0, 126)]
+feature_names = ['feature_' + str(i) for i in range(0, 126)]
 schema = StructType([StructField(x, FloatType()) for x in [label] + feature_names])
@@ -38,7 +38,7 @@ def main(args, xgboost_args):
         print('-' * 80)
         print('Usage: train data path required when mode is all or train')
         print('-' * 80)
-        exit(1)
+        exit(1)
     train_data, features = transform_data(train_data, label, args.use_gpu)
     xgboost_args['features_col'] = features
@@ -62,7 +62,7 @@ def main(args, xgboost_args):
         print('-' * 80)
         print('Usage: trans data path required when mode is all or transform')
         print('-' * 80)
-        exit(1)
+        exit(1)
     trans_data, _ = transform_data(trans_data, label, args.use_gpu)
diff --git a/examples/XGBoost-Examples/agaricus/scala/src/com/nvidia/spark/examples/agaricus/Main.scala b/examples/XGBoost-Examples/agaricus/scala/src/com/nvidia/spark/examples/agaricus/Main.scala
index b9baa8548..0ae255e26 100644
--- a/examples/XGBoost-Examples/agaricus/scala/src/com/nvidia/spark/examples/agaricus/Main.scala
+++ b/examples/XGBoost-Examples/agaricus/scala/src/com/nvidia/spark/examples/agaricus/Main.scala
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -32,14 +32,14 @@ object Main {
   def schema(length: Int): StructType =
     StructType(featureNames(length).map(n => StructField(n, FloatType)))
 
-  val dataSchema = schema(126)
+  val dataSchema = schema(126)
   val xgboostArgs = XGBoostArgs.parse(args)
   val processor = this.getClass.getSimpleName.stripSuffix("$").substring(0, 3)
   val appInfo = Seq("Agaricus", processor, xgboostArgs.format)
 
   // build spark session
   val spark = SparkSetup(args, appInfo.mkString("-"))
-  val benchmark = Benchmark(appInfo(0), appInfo(1), appInfo(2))
+  val benchmark = Benchmark(appInfo(0), appInfo(1), appInfo(2))
   // build data reader
   val dataReader = spark.read
@@ -70,7 +70,7 @@ object Main {
       // === diff ===
       .setFeaturesCol(featureCols)
 
-    datasets(1).foreach(_ => xgbClassifier.setEvalDataset(_))
+    datasets(1).foreach(_ => xgbClassifier.setEvalDataset(_))
 
     println("\n------ Training ------")
     val (model, _) = benchmark.time("train") {
diff --git a/examples/XGBoost-Examples/aggregator/assembly/assembly-no-scala.xml b/examples/XGBoost-Examples/aggregator/assembly/assembly-no-scala.xml
index 6fe53faaf..1a010ddbe 100644
--- a/examples/XGBoost-Examples/aggregator/assembly/assembly-no-scala.xml
+++ b/examples/XGBoost-Examples/aggregator/assembly/assembly-no-scala.xml
@@ -1,5 +1,5 @@
 jar-with-dependencies
diff --git a/examples/XGBoost-Examples/aggregator/pom.xml b/examples/XGBoost-Examples/aggregator/pom.xml
index 338eb9fa0..ed6cc6253 100644
--- a/examples/XGBoost-Examples/aggregator/pom.xml
+++ b/examples/XGBoost-Examples/aggregator/pom.xml
@@ -1,6 +1,6 @@
-
+
 sample_xgboost_examples
diff --git a/examples/XGBoost-Examples/app-parameters/supported_xgboost_parameters_python.md b/examples/XGBoost-Examples/app-parameters/supported_xgboost_parameters_python.md
index ad9190ff4..4f2c7a65f 100644
--- a/examples/XGBoost-Examples/app-parameters/supported_xgboost_parameters_python.md
+++ b/examples/XGBoost-Examples/app-parameters/supported_xgboost_parameters_python.md
@@ -3,7 +3,7 @@ Supported Parameters
 
 This is a description of all the parameters available when you are running examples in this repo:
 
-1. All [xgboost parameters](https://xgboost.readthedocs.io/en/latest/parameter.html) are supported.
+1. All [xgboost parameters](https://xgboost.readthedocs.io/en/latest/parameter.html) are supported.
    * Please use the `camelCase`, e.g., `--treeMethod=gpu_hist`.
    * `lambda` is replaced with `lambda_`, because `lambda` is a keyword in Python.
 2. `--mainClass=[app class]`: The entry class of the application to be started. Available value is one of the below classes.
@@ -35,5 +35,5 @@ This is a description of all the parameters available when you are running examp
 7. `--overwrite=[true|false]`: Whether to overwrite the current model data under 'modelPath'. Default is false. You may need to set it to true to avoid an IOException when saving the model to a path that already exists.
 8. `--hasHeader=[true|false]`: Indicates whether the csv file has a header.
 9. `--numRows=[int value]`: The number of rows to be shown after the transform is done. Default is 5.
-10. `--showFeatures=[true|false]`: Whether to show the features columns after transforming done. Default is true.
-11. `--dataRatios=[trainRatio:transformRatio]`: The ratios of data for train and transform, then the ratio for evaluation is (100-train-test). Default is 80:20, no evaluation. This is only used by taxi/ETLMain now to generate the output data.
+10. `--showFeatures=[true|false]`: Whether to show the feature columns after the transform is done. Default is true.
+11. `--dataRatios=[trainRatio:transformRatio]`: The ratios of data for train and transform; the ratio for evaluation is then (100-train-test). Default is 80:20, no evaluation. This is only used by taxi/ETLMain now to generate the output data.
diff --git a/examples/XGBoost-Examples/app-parameters/supported_xgboost_parameters_scala.md b/examples/XGBoost-Examples/app-parameters/supported_xgboost_parameters_scala.md
index 838404342..0bbde0783 100644
--- a/examples/XGBoost-Examples/app-parameters/supported_xgboost_parameters_scala.md
+++ b/examples/XGBoost-Examples/app-parameters/supported_xgboost_parameters_scala.md
@@ -3,7 +3,7 @@ Supported Parameters
 
 This is a description of all the parameters available when you are running examples in this repo:
 
-1. All [xgboost parameters](https://xgboost.readthedocs.io/en/latest/parameter.html) are supported.
+1. All [xgboost parameters](https://xgboost.readthedocs.io/en/latest/parameter.html) are supported.
 2. `-format=[csv|parquet|orc]`: The format of the data for training/transforming, now only supports 'csv', 'parquet' and 'orc'. *Required*.
 3. `-mode=[all|train|transform]`. The behavior of the XGBoost application (meaning CPUMain and GPUMain), default is 'all' if not specified.
    * all: Do both training and transforming, will save model to 'modelPath' if specified
@@ -25,4 +25,4 @@ This is a description of all the parameters available when you are running examp
 7. `-hasHeader=[true|false]`: Indicates whether the csv file has a header.
 8. `-numRows=[int value]`: The number of rows to be shown after the transform is done. Default is 5.
 9. `-showFeatures=[true|false]`: Whether to show the feature columns after the transform is done. Default is true.
-10. `-dataRatios=[trainRatio:transformRatio]`: The ratios of data for train and transform, then the ratio for evaluation is (100-train-test). Default is 80:20, no evaluation. This is only used by taxi/ETLMain now to generate the output data.
+10. `-dataRatios=[trainRatio:transformRatio]`: The ratios of data for train and transform; the ratio for evaluation is then (100-train-test). Default is 80:20, no evaluation. This is only used by taxi/ETLMain now to generate the output data.
diff --git a/examples/XGBoost-Examples/assembly/assembly-no-scala.xml b/examples/XGBoost-Examples/assembly/assembly-no-scala.xml
index 035b9e33a..f41d9ad7f 100644
--- a/examples/XGBoost-Examples/assembly/assembly-no-scala.xml
+++ b/examples/XGBoost-Examples/assembly/assembly-no-scala.xml
@@ -1,5 +1,5 @@
 jar-with-dependencies_${scala.binary.version}
diff --git a/examples/XGBoost-Examples/main.py b/examples/XGBoost-Examples/main.py
index e9c2975d1..82bcce220 100644
--- a/examples/XGBoost-Examples/main.py
+++ b/examples/XGBoost-Examples/main.py
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
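As a worked illustration of the Scala-side parameters documented above, a hypothetical launch of the mortgage ETL job might look like the following sketch. The jar name and all paths are placeholders, not values from this repo; only the flags and `-dataPath` prefixes quoted in the docs and error messages above (`-format`, `data::`, `out::`) are used, and the job may expect additional paths depending on its arguments:

```
# placeholder jar name and paths; flags are those documented above
spark-submit \
  --class com.nvidia.spark.examples.mortgage.ETLMain \
  sample_xgboost_examples.jar \
  -format=csv \
  -dataPath=data::your_data_path \
  -dataPath=out::your_out_path
```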
diff --git a/examples/XGBoost-Examples/mortgage/pom.xml b/examples/XGBoost-Examples/mortgage/pom.xml
index 333bc4d22..553dd2082 100644
--- a/examples/XGBoost-Examples/mortgage/pom.xml
+++ b/examples/XGBoost-Examples/mortgage/pom.xml
@@ -1,6 +1,6 @@
-
+
 sample_xgboost_examples
diff --git a/examples/XGBoost-Examples/mortgage/python/com/nvidia/spark/examples/mortgage/consts.py b/examples/XGBoost-Examples/mortgage/python/com/nvidia/spark/examples/mortgage/consts.py
index eefa7358c..0533bd2aa 100644
--- a/examples/XGBoost-Examples/mortgage/python/com/nvidia/spark/examples/mortgage/consts.py
+++ b/examples/XGBoost-Examples/mortgage/python/com/nvidia/spark/examples/mortgage/consts.py
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -16,7 +16,7 @@
 
 from pyspark.sql.types import *
 
-label = 'delinquency_12'
+label = 'delinquency_12'
 
 schema = StructType([
     StructField('orig_channel', FloatType()),
@@ -52,7 +52,7 @@ name_mapping = {
     'WITMER FUNDING, LLC': 'Witmer',
-    'WELLS FARGO CREDIT RISK TRANSFER SECURITIES TRUST 2015': 'Wells Fargo',
+    'WELLS FARGO CREDIT RISK TRANSFER SECURITIES TRUST 2015': 'Wells Fargo',
     'WELLS FARGO BANK, NA': 'Wells Fargo',
     'WELLS FARGO BANK, N.A.': 'Wells Fargo',
     'WELLS FARGO BANK, NA': 'Wells Fargo',
@@ -72,7 +72,7 @@
     'PROSPECT MORTGAGE, LLC': 'Prospect Mortgage',
     'PRINCIPAL RESIDENTIAL MORTGAGE CAPITAL RESOURCES, LLC': 'Principal Residential',
     'PNC BANK, N.A.': 'PNC',
-    'PMT CREDIT RISK TRANSFER TRUST 2015-2': 'PennyMac',
+    'PMT CREDIT RISK TRANSFER TRUST 2015-2': 'PennyMac',
     'PHH MORTGAGE CORPORATION': 'PHH Mortgage',
     'PENNYMAC CORP.': 'PennyMac',
     'PACIFIC UNION FINANCIAL, LLC': 'Other',
@@ -83,8 +83,8 @@
     'NATIONSTAR MORTGAGE, LLC': 'Nationstar Mortgage',
     'METLIFE BANK, NA': 'Metlife',
     'LOANDEPOT.COM, LLC': 'LoanDepot.com',
-    'J.P. MORGAN MADISON AVENUE SECURITIES TRUST, SERIES 2015-1': 'JP Morgan Chase',
-    'J.P. MORGAN MADISON AVENUE SECURITIES TRUST, SERIES 2014-1': 'JP Morgan Chase',
+    'J.P. MORGAN MADISON AVENUE SECURITIES TRUST, SERIES 2015-1': 'JP Morgan Chase',
+    'J.P. MORGAN MADISON AVENUE SECURITIES TRUST, SERIES 2014-1': 'JP Morgan Chase',
     'JPMORGAN CHASE BANK, NATIONAL ASSOCIATION': 'JP Morgan Chase',
     'JPMORGAN CHASE BANK, NA': 'JP Morgan Chase',
     'JP MORGAN CHASE BANK, NA': 'JP Morgan Chase',
@@ -116,7 +116,7 @@
     'CHICAGO MORTGAGE SOLUTIONS DBA INTERBANK MORTGAGE COMPANY': 'Chicago Mortgage',
     'CHASE HOME FINANCE, LLC': 'JP Morgan Chase',
     'CHASE HOME FINANCE FRANKLIN AMERICAN MORTGAGE COMPANY': 'JP Morgan Chase',
-    'CHASE HOME FINANCE (CIE 1)': 'JP Morgan Chase',
+    'CHASE HOME FINANCE (CIE 1)': 'JP Morgan Chase',
     'CHASE HOME FINANCE': 'JP Morgan Chase',
     'CASHCALL, INC.': 'CashCall',
     'CAPITAL ONE, NATIONAL ASSOCIATION': 'Capital One',
@@ -276,5 +276,5 @@
     'loan_age',
     'msa',
     'non_interest_bearing_upb',
-    'delinquency_12',
+    'delinquency_12',
 ]
diff --git a/examples/XGBoost-Examples/mortgage/python/com/nvidia/spark/examples/mortgage/cross_validator_main.py b/examples/XGBoost-Examples/mortgage/python/com/nvidia/spark/examples/mortgage/cross_validator_main.py
index b6305a893..b9bef73ee 100644
--- a/examples/XGBoost-Examples/mortgage/python/com/nvidia/spark/examples/mortgage/cross_validator_main.py
+++ b/examples/XGBoost-Examples/mortgage/python/com/nvidia/spark/examples/mortgage/cross_validator_main.py
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -34,7 +34,7 @@ def main(args, xgboost_args):
     if train_data is None:
         print('-' * 80)
         print('Usage: training data path required when mode is all or train')
-        exit(1)
+        exit(1)
     train_data, features = transform_data(train_data, label, args.use_gpu)
     xgboost_args['features_col'] = features
@@ -57,7 +57,7 @@ def main(args, xgboost_args):
     if not train_data:
         print('-' * 80)
         print('Usage: training data path required when mode is all or train')
-        exit(1)
+        exit(1)
     model = with_benchmark('Training', lambda: cross_validator.fit(train_data))
     # get the best model to do transform
@@ -72,7 +72,7 @@ def main(args, xgboost_args):
     if not trans_data:
         print('-' * 80)
         print('Usage: trans data path required when mode is all or transform')
-        exit(1)
+        exit(1)
     trans_data, _ = transform_data(trans_data, label, args.use_gpu)
diff --git a/examples/XGBoost-Examples/mortgage/python/com/nvidia/spark/examples/mortgage/etl.py b/examples/XGBoost-Examples/mortgage/python/com/nvidia/spark/examples/mortgage/etl.py
index d59279d67..0a479432b 100644
--- a/examples/XGBoost-Examples/mortgage/python/com/nvidia/spark/examples/mortgage/etl.py
+++ b/examples/XGBoost-Examples/mortgage/python/com/nvidia/spark/examples/mortgage/etl.py
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -19,7 +19,7 @@ from pyspark.sql.window import Window
 from sys import exit
 
-get_quarter = udf(lambda path: path.split(r'.')[0].split('/')[-1], StringType())
+get_quarter = udf(lambda path: path.split(r'.')[0].split('/')[-1], StringType())
 standardize_name = udf(lambda name: name_mapping.get(name), StringType())
 
 def load_data(spark, paths, schema, args, extra_csv_opts={}):
@@ -96,80 +96,80 @@ def prepare_performance(spark, args, rawDf):
             'quarter',
             'loan_id',
             'current_loan_delinquency_status',
-            when(col('current_loan_delinquency_status') >= 1, col('timestamp'))
+            when(col('current_loan_delinquency_status') >= 1, col('timestamp'))
                 .alias('delinquency_30'),
             when(col('current_loan_delinquency_status') >= 3, col('timestamp'))
                 .alias('delinquency_90'),
             when(col('current_loan_delinquency_status') >= 6, col('timestamp'))
-                .alias('delinquency_180'))
+                .alias('delinquency_180'))
         .groupBy('quarter', 'loan_id')
         .agg(
-            max('current_loan_delinquency_status').alias('delinquency_12'),
+            max('current_loan_delinquency_status').alias('delinquency_12'),
             min('delinquency_30').alias('delinquency_30'),
             min('delinquency_90').alias('delinquency_90'),
-            min('delinquency_180').alias('delinquency_180'))
+            min('delinquency_180').alias('delinquency_180'))
         .select(
             'quarter',
             'loan_id',
-            (col('delinquency_12') >= 1).alias('ever_30'),
-            (col('delinquency_12') >= 3).alias('ever_90'),
-            (col('delinquency_12') >= 6).alias('ever_180'),
+            (col('delinquency_12') >= 1).alias('ever_30'),
+            (col('delinquency_12') >= 3).alias('ever_90'),
+            (col('delinquency_12') >= 6).alias('ever_180'),
             'delinquency_30',
             'delinquency_90',
-            'delinquency_180'))
+            'delinquency_180'))
 
-    months = spark.createDataFrame(range(12), IntegerType()).withColumnRenamed('value', 'month_y')
+    months = spark.createDataFrame(range(12), IntegerType()).withColumnRenamed('value', 'month_y')
     to_join = (performance
         .select(
             'quarter',
             'loan_id',
             'timestamp_year',
             'timestamp_month',
-            col('current_loan_delinquency_status').alias('delinquency_12'),
-            col('current_actual_upb').alias('upb_12'))
+            col('current_loan_delinquency_status').alias('delinquency_12'),
+            col('current_actual_upb').alias('upb_12'))
         .join(aggregation, ['loan_id', 'quarter'], 'left_outer')
         .crossJoin(months)
         .select(
             'quarter',
             floor(
-                (col('timestamp_year') * 12 + col('timestamp_month') - 24000 - col('month_y')) / 12
+                (col('timestamp_year') * 12 + col('timestamp_month') - 24000 - col('month_y')) / 12
             ).alias('josh_mody_n'),
             'ever_30',
             'ever_90',
-            'ever_180',
+            'ever_180',
             'delinquency_30',
             'delinquency_90',
-            'delinquency_180',
+            'delinquency_180',
             'loan_id',
             'month_y',
-            'delinquency_12',
-            'upb_12')
+            'delinquency_12',
+            'upb_12')
         .groupBy(
             'quarter',
             'loan_id',
             'josh_mody_n',
             'ever_30',
             'ever_90',
-            'ever_180',
+            'ever_180',
             'delinquency_30',
             'delinquency_90',
-            'delinquency_180',
+            'delinquency_180',
             'month_y')
         .agg(
-            max('delinquency_12').alias('delinquency_12'),
-            min('upb_12').alias('upb_12'))
+            max('delinquency_12').alias('delinquency_12'),
+            min('upb_12').alias('upb_12'))
         .withColumn(
             'timestamp_year',
-            floor((24000 + (col('josh_mody_n') * 12) + (col('month_y') - 1)) / 12))
+            floor((24000 + (col('josh_mody_n') * 12) + (col('month_y') - 1)) / 12))
         .withColumn(
             'timestamp_month_tmp',
-            (24000 + (col('josh_mody_n') * 12) + col('month_y')) % 12)
+            (24000 + (col('josh_mody_n') * 12) + col('month_y')) % 12)
         .withColumn(
             'timestamp_month',
-            when(col('timestamp_month_tmp') == 0, 12).otherwise(col('timestamp_month_tmp')))
+            when(col('timestamp_month_tmp') == 0, 12).otherwise(col('timestamp_month_tmp')))
         .withColumn(
-            'delinquency_12',
-            ((col('delinquency_12') > 3).cast('int') + (col('upb_12') == 0).cast('int')))
+            'delinquency_12',
+            ((col('delinquency_12') > 3).cast('int') + (col('upb_12') == 0).cast('int')))
         .drop('timestamp_month_tmp', 'josh_mody_n', 'month_y'))
 
     return (performance
@@ -206,7 +206,7 @@ def extract_acq_columns(rawDf):
         dense_rank().over(Window.partitionBy("loan_id").orderBy(to_date(col("monthly_reporting_period"),"MMyyyy"))).alias("rank")
     )
 
-    return acqDf.select("*").filter(col("rank")==1)
+    return acqDf.select("*").filter(col("rank")==1)
 
@@ -220,7 +220,7 @@ def extract_paths(paths, prefix):
     if not results:
         print('-' * 80)
         print('Usage: {} data path required'.format(prefix))
-        exit(1)
+        exit(1)
     return results
 
 def etl(spark, args):
@@ -233,9 +233,9 @@ def etl(spark, args):
     return (performance
         .join(acquisition, ['loan_id', 'quarter'], 'left_outer')
         .select(
-            [(md5(col(x)) % 100).alias(x) for x in categorical_columns]
+            [(md5(col(x)) % 100).alias(x) for x in categorical_columns]
             + [col(x) for x in numeric_columns])
-        .withColumn('delinquency_12', when(col('delinquency_12') > 0, 1).otherwise(0))
+        .withColumn('delinquency_12', when(col('delinquency_12') > 0, 1).otherwise(0))
         .na
         .fill(0))
diff --git a/examples/XGBoost-Examples/mortgage/python/com/nvidia/spark/examples/mortgage/etl_main.py b/examples/XGBoost-Examples/mortgage/python/com/nvidia/spark/examples/mortgage/etl_main.py
index ee09604ba..ed3eb3856 100644
--- a/examples/XGBoost-Examples/mortgage/python/com/nvidia/spark/examples/mortgage/etl_main.py
+++ b/examples/XGBoost-Examples/mortgage/python/com/nvidia/spark/examples/mortgage/etl_main.py
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2019-2021, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2019-2021, NVIDIA CORPORATION. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
diff --git a/examples/XGBoost-Examples/mortgage/python/com/nvidia/spark/examples/mortgage/main.py b/examples/XGBoost-Examples/mortgage/python/com/nvidia/spark/examples/mortgage/main.py
index 021887e4f..b26ced7e9 100644
--- a/examples/XGBoost-Examples/mortgage/python/com/nvidia/spark/examples/mortgage/main.py
+++ b/examples/XGBoost-Examples/mortgage/python/com/nvidia/spark/examples/mortgage/main.py
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -33,7 +33,7 @@ def main(args, xgboost_args):
     if train_data is None:
         print('-' * 80)
         print('Usage: training data path required when mode is all or train')
-        exit(1)
+        exit(1)
     train_data, features = transform_data(train_data, label, args.use_gpu)
     xgboost_args['features_col'] = features
@@ -63,7 +63,7 @@ def transform():
     if not trans_data:
         print('-' * 80)
         print('Usage: trans data path required when mode is all or transform')
-        exit(1)
+        exit(1)
     result = with_benchmark('Transformation', transform)
     show_sample(args, result, label)
diff --git a/examples/XGBoost-Examples/mortgage/scala/src/com/nvidia/spark/examples/mortgage/CrossValidationMain.scala b/examples/XGBoost-Examples/mortgage/scala/src/com/nvidia/spark/examples/mortgage/CrossValidationMain.scala
index 409f456a4..2ccf3cff5 100644
--- a/examples/XGBoost-Examples/mortgage/scala/src/com/nvidia/spark/examples/mortgage/CrossValidationMain.scala
+++ b/examples/XGBoost-Examples/mortgage/scala/src/com/nvidia/spark/examples/mortgage/CrossValidationMain.scala
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -28,7 +28,7 @@ object CrossValidationMain extends Mortgage {
   val appArgs = XGBoostArgs(args)
   val processor = this.getClass.getSimpleName.stripSuffix("$").substring(0, 3)
   val appInfo = Seq(appName, processor, appArgs.format)
-  val benchmark = Benchmark(appInfo(0), appInfo(1), appInfo(2))
+  val benchmark = Benchmark(appInfo(0), appInfo(1), appInfo(2))
   // build spark session
   val spark = SparkSession.builder().appName(appInfo.mkString("-")).getOrCreate()
   // build data reader
@@ -37,7 +37,7 @@ object CrossValidationMain extends Mortgage {
   try {
     // loaded XGBoost ETLed data
     val pathsArray = appArgs.getDataPaths
-    // 0: train 1: eval 2:transform
+    // 0: train, 1: eval, 2: transform
     val datasets = pathsArray.map { paths =>
       if (paths.nonEmpty) {
         appArgs.format match {
@@ -60,7 +60,7 @@ object CrossValidationMain extends Mortgage {
 
     // Tune model using cross validation
     val paramGrid = new ParamGridBuilder()
-      .addGrid(xgbClassifier.maxDepth, Array(3, 10))
+      .addGrid(xgbClassifier.maxDepth, Array(3, 10))
       .addGrid(xgbClassifier.eta, Array(0.2, 0.6))
       .build()
     val evaluator = new MulticlassClassificationEvaluator().setLabelCol(labelColName)
diff --git a/examples/XGBoost-Examples/mortgage/scala/src/com/nvidia/spark/examples/mortgage/ETLMain.scala b/examples/XGBoost-Examples/mortgage/scala/src/com/nvidia/spark/examples/mortgage/ETLMain.scala
index d6b5db30a..c4f9c0070 100644
--- a/examples/XGBoost-Examples/mortgage/scala/src/com/nvidia/spark/examples/mortgage/ETLMain.scala
+++ b/examples/XGBoost-Examples/mortgage/scala/src/com/nvidia/spark/examples/mortgage/ETLMain.scala
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2019-2021, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2019-2021, NVIDIA CORPORATION. All rights reserved.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -26,7 +26,7 @@ object ETLMain extends Mortgage {
   val xgbArgs = XGBoostArgs(args)
   val subTitle = getClass.getSimpleName.stripSuffix("$").substring(0, 3)
   val appInfo = Seq(appName, subTitle, xgbArgs.format)
-  val benchmark = Benchmark(appInfo(0), appInfo(1), appInfo(2))
+  val benchmark = Benchmark(appInfo(0), appInfo(1), appInfo(2))
   // build spark session
   val spark = SparkSession.builder().appName(appInfo.mkString("-")).getOrCreate()
@@ -62,7 +62,7 @@ object ETLMain extends Mortgage {
       s" Please specify it by '-dataPath=data::your_data_path'")
 
     // get and check out path
-    val outPath = validPaths.filter(_.startsWith(prefixes(1)))
+    val outPath = validPaths.filter(_.startsWith(prefixes(1)))
     require(outPath.nonEmpty, s"$appName ETL requires a path to save the ETLed data file. Please specify it" +
       " by '-dataPath=out::your_out_path', only the first path is used if multiple paths are found.")
@@ -77,7 +77,7 @@ object ETLMain extends Mortgage {
       " the type for each data path by adding the prefix 'data::' or 'out::'.")
 
     (dataPaths.map(_.stripPrefix(prefixes.head)),
-      outPath.head.stripPrefix(prefixes(1)),
+      outPath.head.stripPrefix(prefixes(1)),
       tmpPath.head.stripPrefix(prefixes(2)))
   }
 }
diff --git a/examples/XGBoost-Examples/mortgage/scala/src/com/nvidia/spark/examples/mortgage/Main.scala b/examples/XGBoost-Examples/mortgage/scala/src/com/nvidia/spark/examples/mortgage/Main.scala
index edd273aa6..5bc22c719 100644
--- a/examples/XGBoost-Examples/mortgage/scala/src/com/nvidia/spark/examples/mortgage/Main.scala
+++ b/examples/XGBoost-Examples/mortgage/scala/src/com/nvidia/spark/examples/mortgage/Main.scala
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -27,7 +27,7 @@ object Main extends Mortgage {
   val appArgs = XGBoostArgs(args)
   val processor = this.getClass.getSimpleName.stripSuffix("$").substring(0, 3)
   val appInfo = Seq(appName, processor, appArgs.format)
-  val benchmark = Benchmark(appInfo(0), appInfo(1), appInfo(2))
+  val benchmark = Benchmark(appInfo(0), appInfo(1), appInfo(2))
   // build spark session
   val spark = SparkSession.builder().appName(appInfo.mkString("-")).getOrCreate()
   // build data reader
@@ -36,7 +36,7 @@ object Main extends Mortgage {
   try {
     // loaded XGBoost ETLed data
     val pathsArray = appArgs.getDataPaths
-    // 0: train 1: eval 2:transform
+    // 0: train, 1: eval, 2: transform
     val datasets = pathsArray.map { paths =>
       if (paths.nonEmpty) {
         appArgs.format match {
@@ -57,7 +57,7 @@ object Main extends Mortgage {
       .setLabelCol(labelColName)
       .setFeaturesCol(featureNames)
 
-    datasets(1).foreach(_ => xgbClassifier.setEvalDataset(_))
+    datasets(1).foreach(_ => xgbClassifier.setEvalDataset(_))
 
     // Start training
     println("\n------ Training ------")
diff --git a/examples/XGBoost-Examples/mortgage/scala/src/com/nvidia/spark/examples/mortgage/Mortgage.scala b/examples/XGBoost-Examples/mortgage/scala/src/com/nvidia/spark/examples/mortgage/Mortgage.scala
index 89b32f76c..5d2eb5232 100644
--- a/examples/XGBoost-Examples/mortgage/scala/src/com/nvidia/spark/examples/mortgage/Mortgage.scala
+++ b/examples/XGBoost-Examples/mortgage/scala/src/com/nvidia/spark/examples/mortgage/Mortgage.scala
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -20,7 +20,7 @@ import org.apache.spark.sql.types.{FloatType, IntegerType, StructField, StructTy
 
 private[mortgage] trait Mortgage {
   val appName = "Mortgage"
-  val labelColName = "delinquency_12"
+  val labelColName = "delinquency_12"
 
   protected val categaryCols = List(
     ("orig_channel", FloatType),
@@ -56,10 +56,10 @@
     (labelColName, IntegerType)
   )
 
-  lazy val schema = StructType((categaryCols ++ numericCols).map(col => StructField(col._1, col._2)))
+  lazy val schema = StructType((categaryCols ++ numericCols).map(col => StructField(col._1, col._2)))
   lazy val featureNames = schema.filter(_.name != labelColName).map(_.name).toArray
 
   val commParamMap = Map(
     "objective" -> "binary:logistic",
-    "num_round" -> 100)
+    "num_round" -> 100)
 }
diff --git a/examples/XGBoost-Examples/mortgage/scala/src/com/nvidia/spark/examples/mortgage/XGBoostETL.scala b/examples/XGBoost-Examples/mortgage/scala/src/com/nvidia/spark/examples/mortgage/XGBoostETL.scala
index 7c21b9dbe..dfb4ca4f2 100644
--- a/examples/XGBoost-Examples/mortgage/scala/src/com/nvidia/spark/examples/mortgage/XGBoostETL.scala
+++ b/examples/XGBoost-Examples/mortgage/scala/src/com/nvidia/spark/examples/mortgage/XGBoostETL.scala
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2019-2021, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2019-2021, NVIDIA CORPORATION. All rights reserved.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -23,11 +23,11 @@ import org.apache.spark.sql.{Column, DataFrame, SparkSession}
 
 object GetQuarterFromCsvFileName {
   // The format is path/TYPE_yyyy\QQ.txt followed by a (_index)* where index is a single digit number [0-9]
-  // i.e. mortgage/perf/Performance_2003Q4.txt_0_1
+  // i.e. mortgage/perf/Performance_2003Q4.txt_0_1
   // So we strip off the .txt and everything after it
   // and then take everything after the last remaining _
   def apply(): Column = substring_index(
-    substring_index(input_file_name(), ".", 1), "/", -1)
+    substring_index(input_file_name(), ".", 1), "/", -1)
 }
 
 private object CsvReader {
@@ -229,7 +229,7 @@ object extractAcqColumns{
       dense_rank().over(Window.partitionBy("loan_id").orderBy(to_date(col("monthly_reporting_period"),"MMyyyy"))).as("rank")
     )
 
-    acqDf.select("*").filter(col("rank") === 1).drop("rank")
+    acqDf.select("*").filter(col("rank") === 1).drop("rank")
   }
 }
 
@@ -244,7 +244,7 @@ object NameMapping {
   import spark.sqlContext.implicits._
   broadcast(Seq(
     ("WITMER FUNDING, LLC", "Witmer"),
-    ("WELLS FARGO CREDIT RISK TRANSFER SECURITIES TRUST 2015", "Wells Fargo"),
+    ("WELLS FARGO CREDIT RISK TRANSFER SECURITIES TRUST 2015", "Wells Fargo"),
     ("WELLS FARGO BANK, NA" , "Wells Fargo"),
     ("WELLS FARGO BANK, N.A." , "Wells Fargo"),
     ("WELLS FARGO BANK, NA" , "Wells Fargo"),
@@ -264,7 +264,7 @@ object NameMapping {
     ("PROSPECT MORTGAGE, LLC" , "Prospect Mortgage"),
     ("PRINCIPAL RESIDENTIAL MORTGAGE CAPITAL RESOURCES, LLC" , "Principal Residential"),
     ("PNC BANK, N.A." , "PNC"),
-    ("PMT CREDIT RISK TRANSFER TRUST 2015-2" , "PennyMac"),
+    ("PMT CREDIT RISK TRANSFER TRUST 2015-2" , "PennyMac"),
     ("PHH MORTGAGE CORPORATION" , "PHH Mortgage"),
     ("PENNYMAC CORP." , "PennyMac"),
     ("PACIFIC UNION FINANCIAL, LLC" , "Other"),
@@ -275,8 +275,8 @@ object NameMapping {
     ("NATIONSTAR MORTGAGE, LLC" , "Nationstar Mortgage"),
     ("METLIFE BANK, NA" , "Metlife"),
     ("LOANDEPOT.COM, LLC" , "LoanDepot.com"),
-    ("J.P. MORGAN MADISON AVENUE SECURITIES TRUST, SERIES 2015-1" , "JP Morgan Chase"),
-    ("J.P. MORGAN MADISON AVENUE SECURITIES TRUST, SERIES 2014-1" , "JP Morgan Chase"),
+    ("J.P. MORGAN MADISON AVENUE SECURITIES TRUST, SERIES 2015-1" , "JP Morgan Chase"),
+    ("J.P. MORGAN MADISON AVENUE SECURITIES TRUST, SERIES 2014-1" , "JP Morgan Chase"),
     ("JPMORGAN CHASE BANK, NATIONAL ASSOCIATION" , "JP Morgan Chase"),
     ("JPMORGAN CHASE BANK, NA" , "JP Morgan Chase"),
     ("JP MORGAN CHASE BANK, NA" , "JP Morgan Chase"),
@@ -308,7 +308,7 @@ object NameMapping {
     ("CHICAGO MORTGAGE SOLUTIONS DBA INTERBANK MORTGAGE COMPANY" , "Chicago Mortgage"),
     ("CHASE HOME FINANCE, LLC" , "JP Morgan Chase"),
     ("CHASE HOME FINANCE FRANKLIN AMERICAN MORTGAGE COMPANY" , "JP Morgan Chase"),
-    ("CHASE HOME FINANCE (CIE 1)" , "JP Morgan Chase"),
+    ("CHASE HOME FINANCE (CIE 1)" , "JP Morgan Chase"),
     ("CHASE HOME FINANCE" , "JP Morgan Chase"),
     ("CASHCALL, INC." , "CashCall"),
, "CashCall"), ("CAPITAL ONE, NATIONAL ASSOCIATION" , "Capital One"), @@ -360,64 +360,64 @@ private object PerformanceETL extends MortgageETL { col("quarter"), col("loan_id"), col("current_loan_delinquency_status"), - when(col("current_loan_delinquency_status") >= 1, col("monthly_reporting_period")).alias("delinquency_30"), + when(col("current_loan_delinquency_status") >= 25.06.25.06.1-SNAPSHOT, col("monthly_reporting_period")).alias("delinquency_30"), when(col("current_loan_delinquency_status") >= 3, col("monthly_reporting_period")).alias("delinquency_90"), - when(col("current_loan_delinquency_status") >= 6, col("monthly_reporting_period")).alias("delinquency_180") + when(col("current_loan_delinquency_status") >= 6, col("monthly_reporting_period")).alias("delinquency_25.06.25.06.1-SNAPSHOT80") ) .groupBy("quarter", "loan_id") .agg( - max("current_loan_delinquency_status").alias("delinquency_12"), + max("current_loan_delinquency_status").alias("delinquency_25.06.25.06.1-SNAPSHOT2"), min("delinquency_30").alias("delinquency_30"), min("delinquency_90").alias("delinquency_90"), - min("delinquency_180").alias("delinquency_180") + min("delinquency_25.06.25.06.1-SNAPSHOT80").alias("delinquency_25.06.25.06.1-SNAPSHOT80") ) .select( col("quarter"), col("loan_id"), - (col("delinquency_12") >= 1).alias("ever_30"), - (col("delinquency_12") >= 3).alias("ever_90"), - (col("delinquency_12") >= 6).alias("ever_180"), + (col("delinquency_25.06.25.06.1-SNAPSHOT2") >= 25.06.25.06.1-SNAPSHOT).alias("ever_30"), + (col("delinquency_25.06.25.06.1-SNAPSHOT2") >= 3).alias("ever_90"), + (col("delinquency_25.06.25.06.1-SNAPSHOT2") >= 6).alias("ever_25.06.25.06.1-SNAPSHOT80"), col("delinquency_30"), col("delinquency_90"), - col("delinquency_180") + col("delinquency_25.06.25.06.1-SNAPSHOT80") ) val joinedDf = dataFrame .withColumnRenamed("monthly_reporting_period", "timestamp") .withColumnRenamed("monthly_reporting_period_month", "timestamp_month") .withColumnRenamed("monthly_reporting_period_year", "timestamp_year") - .withColumnRenamed("current_loan_delinquency_status", "delinquency_12") - .withColumnRenamed("current_actual_upb", "upb_12") - .select("quarter", "loan_id", "timestamp", "delinquency_12", "upb_12", "timestamp_month", "timestamp_year") + .withColumnRenamed("current_loan_delinquency_status", "delinquency_25.06.25.06.1-SNAPSHOT2") + .withColumnRenamed("current_actual_upb", "upb_25.06.25.06.1-SNAPSHOT2") + .select("quarter", "loan_id", "timestamp", "delinquency_25.06.25.06.1-SNAPSHOT2", "upb_25.06.25.06.1-SNAPSHOT2", "timestamp_month", "timestamp_year") .join(aggDF, Seq("loan_id", "quarter"), "left_outer") - // calculate the 12 month delinquency and upb values - val months = 12 + // calculate the 25.06.25.06.1-SNAPSHOT2 month delinquency and upb values + val months = 25.06.25.06.1-SNAPSHOT2 val monthArray = 0.until(months).toArray val testDf = joinedDf // explode on a small amount of data is actually slightly more efficient than a cross join .withColumn("month_y", explode(lit(monthArray))) .select( col("quarter"), - floor(((col("timestamp_year") * 12 + col("timestamp_month")) - 24000) / months).alias("josh_mody"), - floor(((col("timestamp_year") * 12 + col("timestamp_month")) - 24000 - col("month_y")) / months).alias("josh_mody_n"), + floor(((col("timestamp_year") * 25.06.25.06.1-SNAPSHOT2 + col("timestamp_month")) - 24000) / months).alias("josh_mody"), + floor(((col("timestamp_year") * 25.06.25.06.1-SNAPSHOT2 + col("timestamp_month")) - 24000 - col("month_y")) / months).alias("josh_mody_n"), 
col("ever_30"), col("ever_90"), - col("ever_180"), + col("ever_25.06.25.06.1-SNAPSHOT80"), col("delinquency_30"), col("delinquency_90"), - col("delinquency_180"), + col("delinquency_25.06.25.06.1-SNAPSHOT80"), col("loan_id"), col("month_y"), - col("delinquency_12"), - col("upb_12") + col("delinquency_25.06.25.06.1-SNAPSHOT2"), + col("upb_25.06.25.06.1-SNAPSHOT2") ) - .groupBy("quarter", "loan_id", "josh_mody_n", "ever_30", "ever_90", "ever_180", "delinquency_30", "delinquency_90", "delinquency_180", "month_y") - .agg(max("delinquency_12").alias("delinquency_12"), min("upb_12").alias("upb_12")) - .withColumn("timestamp_year", floor((lit(24000) + (col("josh_mody_n") * lit(months)) + (col("month_y") - 1)) / lit(12))) - .withColumn("timestamp_month_tmp", pmod(lit(24000) + (col("josh_mody_n") * lit(months)) + col("month_y"), lit(12))) - .withColumn("timestamp_month", when(col("timestamp_month_tmp") === lit(0), lit(12)).otherwise(col("timestamp_month_tmp"))) - .withColumn("delinquency_12", ((col("delinquency_12") > 3).cast("int") + (col("upb_12") === 0).cast("int")).alias("delinquency_12")) + .groupBy("quarter", "loan_id", "josh_mody_n", "ever_30", "ever_90", "ever_25.06.25.06.1-SNAPSHOT80", "delinquency_30", "delinquency_90", "delinquency_25.06.25.06.1-SNAPSHOT80", "month_y") + .agg(max("delinquency_25.06.25.06.1-SNAPSHOT2").alias("delinquency_25.06.25.06.1-SNAPSHOT2"), min("upb_25.06.25.06.1-SNAPSHOT2").alias("upb_25.06.25.06.1-SNAPSHOT2")) + .withColumn("timestamp_year", floor((lit(24000) + (col("josh_mody_n") * lit(months)) + (col("month_y") - 25.06.25.06.1-SNAPSHOT)) / lit(25.06.25.06.1-SNAPSHOT2))) + .withColumn("timestamp_month_tmp", pmod(lit(24000) + (col("josh_mody_n") * lit(months)) + col("month_y"), lit(25.06.25.06.1-SNAPSHOT2))) + .withColumn("timestamp_month", when(col("timestamp_month_tmp") === lit(0), lit(25.06.25.06.1-SNAPSHOT2)).otherwise(col("timestamp_month_tmp"))) + .withColumn("delinquency_25.06.25.06.1-SNAPSHOT2", ((col("delinquency_25.06.25.06.1-SNAPSHOT2") > 3).cast("int") + (col("upb_25.06.25.06.1-SNAPSHOT2") === 0).cast("int")).alias("delinquency_25.06.25.06.1-SNAPSHOT2")) .drop("timestamp_month_tmp", "josh_mody_n", "month_y") dataFrame = dataFrame @@ -454,7 +454,7 @@ private object AcquisitionETL extends MortgageETL { object XGBoostETL extends Mortgage { - private lazy val allCols = (categaryCols ++ numericCols).map(c => col(c._1)) + private lazy val allCols = (categaryCols ++ numericCols).map(c => col(c._25.06.25.06.1-SNAPSHOT)) private var cachedDictDF: DataFrame = _ /** @@ -481,7 +481,7 @@ object XGBoostETL extends Mortgage { * Then it is suitable for XGBoost training/transforming */ private def castStringColumnsToNumeric(inputDF: DataFrame, spark: SparkSession): DataFrame = { - val cateColNames = categaryCols.map(_._1) + val cateColNames = categaryCols.map(_._25.06.25.06.1-SNAPSHOT) cachedDictDF = genDictionary(inputDF, cateColNames).cache() // Generate the final table with all columns being numeric. @@ -511,7 +511,7 @@ object XGBoostETL extends Mortgage { // Convert to xgb required Dataset castStringColumnsToNumeric(cleanDF, spark) .select(allCols: _*) - .withColumn(labelColName, when(col(labelColName) > 0, 1).otherwise(0)) + .withColumn(labelColName, when(col(labelColName) > 0, 25.06.25.06.1-SNAPSHOT).otherwise(0)) .na.fill(0.0f) } @@ -526,7 +526,7 @@ object XGBoostETL extends Mortgage { if (cachedDictDF != null) { // The dict data is small, so merge it into one file. 
       cachedDictDF
-        .repartition(1)
+        .repartition(1)
         .write
         .mode("overwrite")
         .parquet(outPath)
diff --git a/examples/XGBoost-Examples/pom.xml b/examples/XGBoost-Examples/pom.xml
index d23c57e49..bb033f2ef 100644
--- a/examples/XGBoost-Examples/pom.xml
+++ b/examples/XGBoost-Examples/pom.xml
@@ -1,6 +1,6 @@
-
+
+ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://maven.apache.org/POM/4.0.0">
 4.0.0
 com.nvidia
@@ -40,8 +40,8 @@
 UTF-8
 2.2.0-SNAPSHOT
 3.5.0
-2.12.8
-2.12
+2.12.8
+2.12
@@ -71,7 +71,7 @@
 org.scalatest
 scalatest_${scala.binary.version}
-3.2.15
+3.2.15
 test
@@ -81,7 +81,7 @@
 org.scala-tools
 maven-scala-plugin
-2.15.2
+2.15.2
@@ -94,7 +94,7 @@
 org.scalatest
 scalatest-maven-plugin
-1.0
+1.0
@@ -129,12 +129,12 @@
-scala-2.13
+scala-2.13
-2.1.0-SNAPSHOT
+2.1.0-SNAPSHOT
 3.5.0
-2.13.11
-2.13
+2.13.11
+2.13
diff --git a/examples/XGBoost-Examples/taxi/pom.xml b/examples/XGBoost-Examples/taxi/pom.xml
index 7dcfa922e..17f8f3c29 100644
--- a/examples/XGBoost-Examples/taxi/pom.xml
+++ b/examples/XGBoost-Examples/taxi/pom.xml
@@ -1,6 +1,6 @@
-
+
 sample_xgboost_examples
diff --git a/examples/XGBoost-Examples/taxi/python/com/nvidia/spark/examples/taxi/consts.py b/examples/XGBoost-Examples/taxi/python/com/nvidia/spark/examples/taxi/consts.py
index 578d23183..544645612 100644
--- a/examples/XGBoost-Examples/taxi/python/com/nvidia/spark/examples/taxi/consts.py
+++ b/examples/XGBoost-Examples/taxi/python/com/nvidia/spark/examples/taxi/consts.py
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
diff --git a/examples/XGBoost-Examples/taxi/python/com/nvidia/spark/examples/taxi/cross_validator_main.py b/examples/XGBoost-Examples/taxi/python/com/nvidia/spark/examples/taxi/cross_validator_main.py
index 956c8d2ce..f24c800d0 100644
--- a/examples/XGBoost-Examples/taxi/python/com/nvidia/spark/examples/taxi/cross_validator_main.py
+++ b/examples/XGBoost-Examples/taxi/python/com/nvidia/spark/examples/taxi/cross_validator_main.py
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2019-2021, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2019-2021, NVIDIA CORPORATION. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -34,7 +34,7 @@ def main(args, xgboost_args):
         print('-' * 80)
         print('Usage: training data path required when mode is all or train')
         print('-' * 80)
-        exit(1)
+        exit(1)
     train_data, features = transform_data(train_data, label, args.use_gpu)
     xgboost_args['features_col'] = features
@@ -70,7 +70,7 @@ def main(args, xgboost_args):
         print('-' * 80)
         print('Usage: trans data path required when mode is all or transform')
         print('-' * 80)
-        exit(1)
+        exit(1)
     trans_data, _ = transform_data(trans_data, label, args.use_gpu)
diff --git a/examples/XGBoost-Examples/taxi/python/com/nvidia/spark/examples/taxi/etl_main.py b/examples/XGBoost-Examples/taxi/python/com/nvidia/spark/examples/taxi/etl_main.py
index 18d12faf7..06be0419e 100644
--- a/examples/XGBoost-Examples/taxi/python/com/nvidia/spark/examples/taxi/etl_main.py
+++ b/examples/XGBoost-Examples/taxi/python/com/nvidia/spark/examples/taxi/etl_main.py
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -29,11 +29,11 @@ def main(args, xgboost_args):
     if not raw_data_path:
         print('-' * 80)
         print('Usage: raw data path required when ETL')
-        exit(1)
+        exit(1)
     if not output_path:
         print('-' * 80)
         print('Usage: output data path required when ETL')
-        exit(1)
+        exit(1)
     raw_data = prepare_data(spark, args, raw_schema, raw_data_path)
     etled_train, etled_eval, etled_trans = pre_process(raw_data).randomSplit(list(map(float, args.splitRatios)))
     etled_train.write.mode("overwrite").parquet(output_path + '/train')
diff --git a/examples/XGBoost-Examples/taxi/python/com/nvidia/spark/examples/taxi/main.py b/examples/XGBoost-Examples/taxi/python/com/nvidia/spark/examples/taxi/main.py
index 2281e3e95..f7ea73177 100644
--- a/examples/XGBoost-Examples/taxi/python/com/nvidia/spark/examples/taxi/main.py
+++ b/examples/XGBoost-Examples/taxi/python/com/nvidia/spark/examples/taxi/main.py
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -33,7 +33,7 @@ def main(args, xgboost_args):
         print('-' * 80)
         print('Usage: training data path required when mode is all or train')
         print('-' * 80)
-        exit(1)
+        exit(1)
     train_data, features = transform_data(train_data, label, args.use_gpu)
     xgboost_args['features_col'] = features
@@ -57,7 +57,7 @@ def main(args, xgboost_args):
         print('-' * 80)
         print('Usage: trans data path required when mode is all or transform')
         print('-' * 80)
-        exit(1)
+        exit(1)
     trans_data, _ = transform_data(trans_data, label, args.use_gpu)
diff --git a/examples/XGBoost-Examples/taxi/python/com/nvidia/spark/examples/taxi/pre_process.py b/examples/XGBoost-Examples/taxi/python/com/nvidia/spark/examples/taxi/pre_process.py
index 175d74941..7a6f8d883 100644
--- a/examples/XGBoost-Examples/taxi/python/com/nvidia/spark/examples/taxi/pre_process.py
+++ b/examples/XGBoost-Examples/taxi/python/com/nvidia/spark/examples/taxi/pre_process.py
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2019-2021, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2019-2021, NVIDIA CORPORATION. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -50,7 +50,7 @@ def encode_categories(data_frame):
     return data_frame.withColumnRenamed("store_and_fwd_flag", "store_and_fwd")
 
 def fill_na(data_frame):
-    return data_frame.fillna(-1)
+    return data_frame.fillna(-1)
 
 def remove_invalid(data_frame):
     conditions = [
@@ -75,18 +75,18 @@ def convert_datetime(data_frame):
         .withColumn('day_of_week', dayofweek(datetime))
         .withColumn(
             'is_weekend',
-            col('day_of_week').isin(1, 7).cast(IntegerType()))  # 1: Sunday, 7: Saturday
+            col('day_of_week').isin(1, 7).cast(IntegerType()))  # 1: Sunday, 7: Saturday
         .withColumn('hour', hour(datetime))
         .drop('pickup_datetime'))
 
 def add_h_distance(data_frame):
-    p = math.pi / 180
-    lat1 = col('pickup_latitude')
-    lon1 = col('pickup_longitude')
+    p = math.pi / 180
+    lat1 = col('pickup_latitude')
+    lon1 = col('pickup_longitude')
     lat2 = col('dropoff_latitude')
     lon2 = col('dropoff_longitude')
     internal_value = (0.5
-        - cos((lat2 - lat1) * p) / 2
-        + cos(lat1 * p) * cos(lat2 * p) * (1 - cos((lon2 - lon1) * p)) / 2)
-    h_distance = 12734 * asin(sqrt(internal_value))
+        - cos((lat2 - lat1) * p) / 2
+        + cos(lat1 * p) * cos(lat2 * p) * (1 - cos((lon2 - lon1) * p)) / 2)
+    h_distance = 12734 * asin(sqrt(internal_value))
     return data_frame.withColumn('h_distance', h_distance)
diff --git a/examples/XGBoost-Examples/taxi/scala/src/com/nvidia/spark/examples/taxi/CrossValidationMain.scala b/examples/XGBoost-Examples/taxi/scala/src/com/nvidia/spark/examples/taxi/CrossValidationMain.scala
index d1a6de0d6..8b773af95 100644
--- a/examples/XGBoost-Examples/taxi/scala/src/com/nvidia/spark/examples/taxi/CrossValidationMain.scala
+++ b/examples/XGBoost-Examples/taxi/scala/src/com/nvidia/spark/examples/taxi/CrossValidationMain.scala
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -34,14 +34,14 @@ object CrossValidationMain extends Taxi {
       .appName(appInfo.mkString("-"))
       .getOrCreate()
 
-    val benchmark = Benchmark(appInfo(0), appInfo(1), appInfo(2))
+    val benchmark = Benchmark(appInfo(0), appInfo(1), appInfo(2))
     // build data reader
     val dataReader = spark.read
 
     val (pathsArray, dataReadSchema, needEtl) = getDataPaths(xgboostArgs.dataPaths,
       xgboostArgs.isToTrain, xgboostArgs.isToTransform)
 
-    // 0: train 1: eval 2:transform
+    // 0: train, 1: eval, 2: transform
     var datasets = pathsArray.map { paths =>
       if (paths.nonEmpty) {
         xgboostArgs.format match {
@@ -66,7 +66,7 @@ object CrossValidationMain extends Taxi {
 
     // Tune model using cross validation
     val paramGrid = new ParamGridBuilder()
-      .addGrid(xgbRegressor.maxDepth, Array(3, 10))
+      .addGrid(xgbRegressor.maxDepth, Array(3, 10))
       .addGrid(xgbRegressor.eta, Array(0.2, 0.6))
       .build()
diff --git a/examples/XGBoost-Examples/taxi/scala/src/com/nvidia/spark/examples/taxi/ETLMain.scala b/examples/XGBoost-Examples/taxi/scala/src/com/nvidia/spark/examples/taxi/ETLMain.scala
index 0b0f84959..b92d070c8 100644
--- a/examples/XGBoost-Examples/taxi/scala/src/com/nvidia/spark/examples/taxi/ETLMain.scala
+++ b/examples/XGBoost-Examples/taxi/scala/src/com/nvidia/spark/examples/taxi/ETLMain.scala
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2019-2021, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2019-2021, NVIDIA CORPORATION. All rights reserved.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -31,7 +31,7 @@ object ETLMain extends Taxi {
       .appName(appInfo.mkString("-"))
       .getOrCreate()
 
-    val benchmark = Benchmark(appInfo(0), appInfo(1), appInfo(2))
+    val benchmark = Benchmark(appInfo(0), appInfo(1), appInfo(2))
     // build data reader
     val dataReader = spark.read
@@ -67,7 +67,7 @@ object ETLMain extends Taxi {
       s" Please specify it by '-dataPath=raw::your_taxi_data_path'")
 
     // get and check out path
-    val outPath = validPaths.filter(_.startsWith(prefixes(1)))
+    val outPath = validPaths.filter(_.startsWith(prefixes(1)))
     require(outPath.nonEmpty, s"$appName ETL requires a path to save the ETLed data file. Please specify it" +
       " by '-dataPath=out::your_out_path', only the first path is used if multiple paths are found.")
@@ -77,6 +77,6 @@ object ETLMain extends Taxi {
       " the type for each data path by adding the prefix 'raw::' or 'out::'")
 
     (rawPaths.map(_.stripPrefix(prefixes.head)),
-      outPath.head.stripPrefix(prefixes(1)))
+      outPath.head.stripPrefix(prefixes(1)))
   }
 }
diff --git a/examples/XGBoost-Examples/taxi/scala/src/com/nvidia/spark/examples/taxi/Main.scala b/examples/XGBoost-Examples/taxi/scala/src/com/nvidia/spark/examples/taxi/Main.scala
index e05017f79..bb177ddcf 100644
--- a/examples/XGBoost-Examples/taxi/scala/src/com/nvidia/spark/examples/taxi/Main.scala
+++ b/examples/XGBoost-Examples/taxi/scala/src/com/nvidia/spark/examples/taxi/Main.scala
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -33,14 +33,14 @@ object Main extends Taxi {
       .appName(appInfo.mkString("-"))
       .getOrCreate()
 
-    val benchmark = Benchmark(appInfo(0), appInfo(1), appInfo(2))
+    val benchmark = Benchmark(appInfo(0), appInfo(1), appInfo(2))
 
     // build data reader
     val dataReader = spark.read
 
     val (pathsArray, dataReadSchema, needEtl) = getDataPaths(xgboostArgs.dataPaths,
       xgboostArgs.isToTrain, xgboostArgs.isToTransform)
-    // 0: train 1: eval 2:transform
+    // 0: train 1: eval 2:transform
     var datasets = pathsArray.map { paths =>
       if (paths.nonEmpty) {
         xgboostArgs.format match {
@@ -63,7 +63,7 @@ object Main extends Taxi {
       .setLabelCol(labelColName)
       .setFeaturesCol(featureNames)
 
-    datasets(1).foreach(_ => xgbRegressor.setEvalDataset(_))
+    datasets(1).foreach(_ => xgbRegressor.setEvalDataset(_))
 
     println("\n------ Training ------")
     // Shall we not log the time if it is abnormal, which is usually caused by training failure
diff --git a/examples/XGBoost-Examples/taxi/scala/src/com/nvidia/spark/examples/taxi/Taxi.scala b/examples/XGBoost-Examples/taxi/scala/src/com/nvidia/spark/examples/taxi/Taxi.scala
index 2de25acbd..6122bd8ea 100644
--- a/examples/XGBoost-Examples/taxi/scala/src/com/nvidia/spark/examples/taxi/Taxi.scala
+++ b/examples/XGBoost-Examples/taxi/scala/src/com/nvidia/spark/examples/taxi/Taxi.scala
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -27,7 +27,7 @@ private[taxi] trait Taxi {
   lazy val featureNames = etledSchema.filter(_.name != labelColName).map(_.name).toArray
 
   lazy val commParamMap = Map(
-    "num_round" -> 100
+    "num_round" -> 100
   )
 
   val rawSchema = StructType(Seq(
@@ -121,7 +121,7 @@ private[taxi] trait Taxi {
   }
 
   def fillNa(dataFrame: DataFrame): DataFrame = {
-    dataFrame.na.fill(-1)
+    dataFrame.na.fill(-1)
   }
 
   def removeInvalid(dataFrame: DataFrame): DataFrame = {
@@ -150,21 +150,21 @@ private[taxi] trait Taxi {
       .withColumn("day_of_week", dayofweek(datetime))
       .withColumn(
         "is_weekend",
-        col("day_of_week").isin(1, 7).cast(IntegerType))  // 1: Sunday, 7: Saturday
+        col("day_of_week").isin(1, 7).cast(IntegerType))  // 1: Sunday, 7: Saturday
       .withColumn("hour", hour(datetime))
       .drop(datetime.toString)
   }
 
   def addHDistance(dataFrame: DataFrame): DataFrame = {
-    val P = math.Pi / 180
-    val lat1 = col("pickup_latitude")
-    val lon1 = col("pickup_longitude")
+    val P = math.Pi / 180
+    val lat1 = col("pickup_latitude")
+    val lon1 = col("pickup_longitude")
     val lat2 = col("dropoff_latitude")
     val lon2 = col("dropoff_longitude")
     val internalValue = (lit(0.5)
-      - cos((lat2 - lat1) * P) / 2
-      + cos(lat1 * P) * cos(lat2 * P) * (lit(1) - cos((lon2 - lon1) * P)) / 2)
-    val hDistance = lit(12734) * asin(sqrt(internalValue))
+      - cos((lat2 - lat1) * P) / 2
+      + cos(lat1 * P) * cos(lat2 * P) * (lit(1) - cos((lon2 - lon1) * P)) / 2)
+    val hDistance = lit(12734) * asin(sqrt(internalValue))
     dataFrame.withColumn("h_distance", hDistance)
   }
 
@@ -180,16 +180,16 @@ private[taxi] trait Taxi {
     val etledPrefixes = Array("train::", "eval::", "trans::")
     val rawPrefixes = Array("rawTrain::", "rawEval::", "rawTrans::")
     val validPaths = paths.filter(_.nonEmpty).map(_.trim)
-    val p1 = validPaths.filter(p => etledPrefixes.exists(p.startsWith(_)))
+    val p1 = validPaths.filter(p => etledPrefixes.exists(p.startsWith(_)))
     val p2 = validPaths.filter(p => rawPrefixes.exists(p.startsWith(_)))
-    require(p1.isEmpty || p2.isEmpty, s"requires directly train by '-dataPath=${etledPrefixes(0)}train_data_path" +
-      s" -dataPath=${etledPrefixes(1)}eval_data_path -dataPath=${etledPrefixes(2)}transform_data_path' Or " +
-      s"E2E train by '-dataPath=${rawPrefixes(0)}train_data_path -dataPath=${rawPrefixes(1)}eval_data_path" +
+    require(p1.isEmpty || p2.isEmpty, s"requires directly train by '-dataPath=${etledPrefixes(0)}train_data_path" +
+      s" -dataPath=${etledPrefixes(1)}eval_data_path -dataPath=${etledPrefixes(2)}transform_data_path' Or " +
+      s"E2E train by '-dataPath=${rawPrefixes(0)}train_data_path -dataPath=${rawPrefixes(1)}eval_data_path" +
       s" -dataPath=${rawPrefixes(2)}transform_data_path'")
 
     val (prefixes, schema, needEtl) =
-      if (p1.nonEmpty) (etledPrefixes, etledSchema, false)
+      if (p1.nonEmpty) (etledPrefixes, etledSchema, false)
       else (rawPrefixes, rawSchema, true)
 
     // get train data paths
@@ -200,7 +200,7 @@ private[taxi] trait Taxi {
     }
 
     // get eval path
-    val evalPaths = validPaths.filter(_.startsWith(prefixes(1)))
+    val evalPaths = validPaths.filter(_.startsWith(prefixes(1)))
 
     // get and check train data paths
     val transformPaths = validPaths.filter(_.startsWith(prefixes(2)))
@@ -212,10 +212,10 @@ private[taxi] trait Taxi {
     // check data paths not specified type
     val unknownPaths = validPaths.filterNot(p => prefixes.exists(p.startsWith(_)))
     require(unknownPaths.isEmpty, s"Unknown type for data path: ${unknownPaths.head}, requires to specify" +
-      s" the type for each data path by adding the prefix '${prefixes(0)}' or '${prefixes(1)}' or '${prefixes(2)}'.")
+      s" the type for each data path by adding the prefix '${prefixes(0)}' or '${prefixes(1)}' or '${prefixes(2)}'.")
 
     (Array(trainPaths.map(_.stripPrefix(prefixes.head)),
-      evalPaths.map(_.stripPrefix(prefixes(1))),
+      evalPaths.map(_.stripPrefix(prefixes(1))),
       transformPaths.map(_.stripPrefix(prefixes(2)))), schema, needEtl)
   }
 }
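For readers skimming the `getDataPaths` hunks above: each `-dataPath` argument is routed into a train/eval/transform bucket by its prefix, and the prefix is stripped before use. A small Python sketch of that routing, using the `rawTrain::`/`rawEval::`/`rawTrans::` prefixes visible above (the sample paths are invented):

```python
def route_by_prefix(paths, prefixes=('rawTrain::', 'rawEval::', 'rawTrans::')):
    """Group data paths by prefix and strip it, as Taxi.getDataPaths does."""
    paths = [p.strip() for p in paths if p.strip()]
    unknown = [p for p in paths if not any(p.startswith(pre) for pre in prefixes)]
    if unknown:
        raise ValueError('Unknown type for data path: ' + unknown[0])
    return [[p[len(pre):] for p in paths if p.startswith(pre)] for pre in prefixes]

# Example with invented paths:
train, evaluate, transform = route_by_prefix(
    ['rawTrain::/data/train', 'rawEval::/data/eval', 'rawTrans::/data/trans'])
```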
diff --git a/examples/XGBoost-Examples/utility/python/com/nvidia/spark/examples/utility/args.py b/examples/XGBoost-Examples/utility/python/com/nvidia/spark/examples/utility/args.py index 6318a1c2d..92053d730 100644 --- a/examples/XGBoost-Examples/utility/python/com/nvidia/spark/examples/utility/args.py +++ b/examples/XGBoost-Examples/utility/python/com/nvidia/spark/examples/utility/args.py @@ -1,5 +1,5 @@ # -# Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved. +# Copyright (c) 2025.06.25.06.1-SNAPSHOT9-2022, NVIDIA CORPORATION. All rights reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -28,7 +28,7 @@ def _to_ratio_pair(literal): # e.g., '80:20' return match(r'^\d+:\d+$', literal) and [int(x) for x in literal.split(':')] -MAX_CHUNK_SIZE = 2 ** 31 - 1 +MAX_CHUNK_SIZE = 2 ** 325.06.25.06.1-SNAPSHOT - 25.06.25.06.1-SNAPSHOT _examples = [ 'com.nvidia.spark.examples.agaricus.main', @@ -46,23 +46,23 @@ def _validate_args(args): if not args.dataPaths: usage += ' --dataPaths is required.\n' if not (args.dataRatios - and 0 <= args.dataRatios[0] <= 100 - and 0 <= args.dataRatios[1] <= 100 - and args.dataRatios[0] + args.dataRatios[1] <= 100): + and 0 <= args.dataRatios[0] <= 25.06.25.06.1-SNAPSHOT00 + and 0 <= args.dataRatios[25.06.25.06.1-SNAPSHOT] <= 25.06.25.06.1-SNAPSHOT00 + and args.dataRatios[0] + args.dataRatios[25.06.25.06.1-SNAPSHOT] <= 25.06.25.06.1-SNAPSHOT00): usage += ' --dataRatios should be in format \'Int:Int\', these two ints should be' \ - ' in range [0, 100] and the sum should be less than or equal to 100.\n' - if not (1 <= args.maxRowsPerChunk <= MAX_CHUNK_SIZE): - usage += ' --maxRowsPerChunk should be in range [1, {}].\n'.format(MAX_CHUNK_SIZE) + ' in range [0, 25.06.25.06.1-SNAPSHOT00] and the sum should be less than or equal to 25.06.25.06.1-SNAPSHOT00.\n' + if not (25.06.25.06.1-SNAPSHOT <= args.maxRowsPerChunk <= MAX_CHUNK_SIZE): + usage += ' --maxRowsPerChunk should be in range [25.06.25.06.1-SNAPSHOT, {}].\n'.format(MAX_CHUNK_SIZE) if usage: print('-' * 80) print('Usage:\n' + usage) - exit(1) + exit(25.06.25.06.1-SNAPSHOT) def _attach_derived_args(args): args.trainRatio = args.dataRatios[0] - args.evalRatio = args.dataRatios[1] - args.trainEvalRatio = 100 - args.trainRatio - args.evalRatio + args.evalRatio = args.dataRatios[25.06.25.06.1-SNAPSHOT] + args.trainEvalRatio = 25.06.25.06.1-SNAPSHOT00 - args.trainRatio - args.evalRatio args.splitRatios = [args.trainRatio, args.trainEvalRatio, args.evalRatio] diff --git a/examples/XGBoost-Examples/utility/python/com/nvidia/spark/examples/utility/utils.py b/examples/XGBoost-Examples/utility/python/com/nvidia/spark/examples/utility/utils.py index 4b4037869..eaac6ce74 100644 --- a/examples/XGBoost-Examples/utility/python/com/nvidia/spark/examples/utility/utils.py +++ b/examples/XGBoost-Examples/utility/python/com/nvidia/spark/examples/utility/utils.py @@ -1,5 +1,5 @@ # -# Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved. +# Copyright (c) 2025.06.25.06.1-SNAPSHOT9-2022, NVIDIA CORPORATION. All rights reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/examples/XGBoost-Examples/utility/python/com/nvidia/spark/examples/utility/utils.py b/examples/XGBoost-Examples/utility/python/com/nvidia/spark/examples/utility/utils.py
index 4b4037869..eaac6ce74 100644
--- a/examples/XGBoost-Examples/utility/python/com/nvidia/spark/examples/utility/utils.py
+++ b/examples/XGBoost-Examples/utility/python/com/nvidia/spark/examples/utility/utils.py
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -53,7 +53,7 @@ def with_benchmark(phrase, action):
     start = time()
     result = action()
     end = time()
-    print('-' * 100)
+    print('-' * 100)
     print('{} takes {} seconds'.format(phrase, round(end - start, 2)))
     return result
 
@@ -62,7 +62,7 @@ def check_classification_accuracy(data_frame, label):
     accuracy = (MulticlassClassificationEvaluator()
                 .setLabelCol(label)
                 .evaluate(data_frame))
-    print('-' * 100)
+    print('-' * 100)
     print('Accuracy is ' + str(accuracy))
 
@@ -70,7 +70,7 @@ def check_regression_accuracy(data_frame, label):
     accuracy = (RegressionEvaluator()
                 .setLabelCol(label)
                 .evaluate(data_frame))
-    print('-' * 100)
+    print('-' * 100)
     print('RMSE is ' + str(accuracy))
 
diff --git a/examples/XGBoost-Examples/utility/scala/src/com/nvidia/spark/examples/utility/Benchmark.scala b/examples/XGBoost-Examples/utility/scala/src/com/nvidia/spark/examples/utility/Benchmark.scala
index 35654dd24..1fd2c21e2 100644
--- a/examples/XGBoost-Examples/utility/scala/src/com/nvidia/spark/examples/utility/Benchmark.scala
+++ b/examples/XGBoost-Examples/utility/scala/src/com/nvidia/spark/examples/utility/Benchmark.scala
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2019-2021, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2019-2021, NVIDIA CORPORATION. All rights reserved.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -27,7 +27,7 @@ class Benchmark(
       (block: => R): (R, Float) = {
     val t0 = System.currentTimeMillis
     val result = block // call-by-name
-    val elapsedTimeSec = (System.currentTimeMillis - t0).toFloat / 1000
+    val elapsedTimeSec = (System.currentTimeMillis - t0).toFloat / 1000
     logging(elapsedTimeSec, phase, "Elapsed time for", "s", silent(result, elapsedTimeSec))
     (result, elapsedTimeSec)
   }
diff --git a/examples/XGBoost-Examples/utility/scala/src/com/nvidia/spark/examples/utility/SparkSetup.scala b/examples/XGBoost-Examples/utility/scala/src/com/nvidia/spark/examples/utility/SparkSetup.scala
index 7eb7906c4..a1d2ae103 100644
--- a/examples/XGBoost-Examples/utility/scala/src/com/nvidia/spark/examples/utility/SparkSetup.scala
+++ b/examples/XGBoost-Examples/utility/scala/src/com/nvidia/spark/examples/utility/SparkSetup.scala
@@ -1,6 +1,6 @@
 /*
- * Copyright (c) 2019-2021, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2019-2021, NVIDIA CORPORATION. All rights reserved.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
diff --git a/examples/XGBoost-Examples/utility/scala/src/com/nvidia/spark/examples/utility/Vectorize.scala b/examples/XGBoost-Examples/utility/scala/src/com/nvidia/spark/examples/utility/Vectorize.scala
index 69447dda7..838fb6d6d 100644
--- a/examples/XGBoost-Examples/utility/scala/src/com/nvidia/spark/examples/utility/Vectorize.scala
+++ b/examples/XGBoost-Examples/utility/scala/src/com/nvidia/spark/examples/utility/Vectorize.scala
@@ -1,6 +1,6 @@
 /*
- * Copyright (c) 2019-2021, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2019-2021, NVIDIA CORPORATION. All rights reserved.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
diff --git a/examples/XGBoost-Examples/utility/scala/src/com/nvidia/spark/examples/utility/XGBoostArgs.scala b/examples/XGBoost-Examples/utility/scala/src/com/nvidia/spark/examples/utility/XGBoostArgs.scala
index 2e8cf0fc8..254f48ad5 100644
--- a/examples/XGBoost-Examples/utility/scala/src/com/nvidia/spark/examples/utility/XGBoostArgs.scala
+++ b/examples/XGBoost-Examples/utility/scala/src/com/nvidia/spark/examples/utility/XGBoostArgs.scala
@@ -1,6 +1,6 @@
 /*
- * Copyright (c) 2019-2024, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2019-2024, NVIDIA CORPORATION. All rights reserved.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -33,15 +33,15 @@ object XGBoostArgs {
   private val stringToBool = Map(
     "true" -> true,
     "false" -> false,
-    "1" -> true,
+    "1" -> true,
     "0" -> false
   )
-  private val booleanMessage = "Expect 'true' or '1' for true, 'false' or '0' for false."
+  private val booleanMessage = "Expect 'true' or '1' for true, 'false' or '0' for false."
 
   private def parseDataRatios(value: String): (Int, Int) = {
     val ratios = value.split(":").filter(_.nonEmpty).map(_.toInt)
-    require(ratios.length == 2 && ratios(0) + ratios(1) <= 100)
-    (ratios(0), ratios(1))
+    require(ratios.length == 2 && ratios(0) + ratios(1) <= 100)
+    (ratios(0), ratios(1))
   }
 
   private val supportedArgs = Map(
@@ -54,7 +54,7 @@ object XGBoostArgs {
     "dataPath" -> XGBoostArg(true),
     "dataRatios" -> XGBoostArg(
       parse = parseDataRatios,
-      message = "Expect as :, both train and transform require Int, and total value <= 100"),
+      message = "Expect as :, both train and transform require Int, and total value <= 100"),
     "modelPath" -> XGBoostArg(),
     "numRows" -> XGBoostArg(parse = _.toInt, message = "Require an Int."),
     "numFold" -> XGBoostArg(parse = _.toInt, message = "Require an Int."),
@@ -68,7 +68,7 @@ object XGBoostArgs {
     println("\n\nSupported arguments:")
     println("  -dataPath=path: String, Required\n" +
       "    The path of data file(s). Use multiple '-dataPath=path#' to specify multiple paths. Such as" +
-      " '-dataPath=path1 -dataPath=path2'.\n")
+      " '-dataPath=path1 -dataPath=path2'.\n")
     println("  -format=: String, Required\n" +
       "    The format of the data, now only supports 'csv', 'parquet' and 'orc'.\n")
     println("  -mode=: String\n" +
@@ -81,7 +81,7 @@ object XGBoostArgs {
     println("  -overwrite=value: Boolean\n" +
       "    Whether to overwrite the current model data under 'modelPath'. Default is false\n")
     println("  -dataRatios=train:transform\n" +
-      "    The ratios of data used for train and transform, then the ratio for evaluation is (100-train-test)." +
+      "    The ratios of data used for train and transform, then the ratio for evaluation is (100-train-test)." +
       "    default is 80:20, no evaluation\n")
     println("  -hasHeader=value: Boolean\n" +
       "    Whether the csv file has header. Default is true.\n")
@@ -101,7 +101,7 @@ object XGBoostArgs {
     println("For XGBoost arguments:")
     println("  Now we pass all XGBoost parameters transparently to XGBoost, no longer to verify them.")
     println("  Both of the formats are supported, such as 'numWorkers'. You can pass as either one below:")
-    println("  -numWorkers=10 or -num_workers=10 ")
+    println("  -numWorkers=10 or -num_workers=10 ")
     println()
   }
 
@@ -119,7 +119,7 @@ object XGBoostArgs {
     val parts = argString.stripPrefix("-").split('=').filter(_.nonEmpty)
     require(parts.length == 2, s"Invalid argument: $argString, expect '-name=value'")
 
-    val (key, value) = (parts(0), parts(1))
+    val (key, value) = (parts(0), parts(1))
     if (supportedArgs.contains(key)) {
       // App arguments
       val parseTry = Try(supportedArgs(key).parse(value))
@@ -176,7 +176,7 @@ class XGBoostArgs private[utility] (
 
   def dataRatios: (Int, Int, Int) = {
     val ratios = appArgsMap.get("dataRatios").asInstanceOf[Option[(Int, Int)]].getOrElse((80, 20))
-    (ratios._1, ratios._2, 100 - ratios._1 - ratios._2)
+    (ratios._1, ratios._2, 100 - ratios._1 - ratios._2)
   }
 
   def isShowFeatures: Boolean = appArgsMap.get("showFeatures").forall(_.asInstanceOf[Boolean])
@@ -221,7 +221,7 @@ class XGBoostArgs private[utility] (
     }
 
     // get eval path
-    val evalPaths = validPaths.filter(_.startsWith(prefixes(1)))
+    val evalPaths = validPaths.filter(_.startsWith(prefixes(1)))
 
     // get and check train data paths
     val transformPaths = validPaths.filter(_.startsWith(prefixes(2)))
@@ -236,7 +236,7 @@ class XGBoostArgs private[utility] (
       " the type for each data path by adding the prefix 'train::' or 'eval::' or 'trans::'.")
 
     Array(trainPaths.map(_.stripPrefix(prefixes.head)),
-      evalPaths.map(_.stripPrefix(prefixes(1))),
+      evalPaths.map(_.stripPrefix(prefixes(1))),
       transformPaths.map(_.stripPrefix(prefixes(2))))
   }
 }
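As the parsing code above shows, every flag — whether an app argument or an XGBoost parameter — arrives as a single `-name=value` token, and unrecognized keys are passed through to XGBoost untouched. A rough Python equivalent of the split-and-validate step (`supported` below is an invented stand-in for `supportedArgs`):

```python
def parse_kv(arg_string):
    # '-numWorkers=10' -> ('numWorkers', '10')
    parts = [p for p in arg_string.lstrip('-').split('=') if p]
    if len(parts) != 2:
        raise ValueError("Invalid argument: {}, expect '-name=value'".format(arg_string))
    return parts[0], parts[1]

supported = {'format', 'mode', 'modelPath'}  # invented subset for illustration
key, value = parse_kv('-numWorkers=10')
is_app_arg = key in supported  # False here: 'numWorkers' goes straight to XGBoost
```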
diff --git a/scripts/README.md b/scripts/README.md
index bf8354fa6..d46b8435d 100644
--- a/scripts/README.md
+++ b/scripts/README.md
@@ -2,11 +2,11 @@ This tool converts values from categorical type to numerical type in certain columns. Currently we support `mean encoding` and `one-hot encoding`.
 
 ### Main Procedure
-1. User should firstly use our tool to profile the raw data source to get a "dictinary"(We call this dictionary `model`) that maps categorical values to certain numerical values. We call this method `train`. Each column will have its own `model`
-2. User will use the `model` they got from step 1 to replace those categorical values with numerical values.
+1. The user should first use our tool to profile the raw data source to get a "dictionary" (we call this dictionary the `model`) that maps categorical values to certain numerical values. We call this method `train`. Each column has its own `model`.
+2. The user then applies the `model` from step 1 to replace those categorical values with numerical values.
 
 ### Usage
-1. `cd encoding/python`
+1. `cd encoding/python`
 2. `zip -r sample.zip com` to get a python encoding tool library
 3. submit the encoding job to your Spark host
 
@@ -27,7 +27,7 @@ You can find full use cases in `encoding-sample/run.sh`
 - modelPaths:
     - for `train` mode, it points to the path where the user wants to save the encoding model
     - for `transform` mode, it points to the model that the encoding conversion needs.
-    - it is 1-1 mapped to `columns`. If user wants to encode 2 columns, he must provide 2 `modelPaths`. e.g. `model_34,model_35`
+    - it is mapped 1-1 to `columns`. If a user wants to encode 2 columns, they must provide 2 `modelPaths`, e.g. `model_34,model_35`
 - inputPaths:
     - raw data the user wants to get the encoding model from, or to convert
 - outputPaths:
@@ -40,7 +40,7 @@ You can find full use cases in `encoding-sample/run.sh`
     - required in `target encoding`. Set the label column of the raw data.
 
 ### Optimization
-1. Due to default behaviors from some Spark methods, Some value may contain useless precison which causes the large size of `model`.e.g. 0.000000 and 1.000000 are identical to 0 and 1 in value perspective, but the csv model file that contains those values costs more disk space. We provide `truncate-model.py` in `encoding-sample` to remove the extra useless precisions.
+1. Due to the default behavior of some Spark methods, some values may carry useless precision, which inflates the size of the `model`. E.g., 0.000000 and 1.000000 are identical to 0 and 1 in value, but a csv model file containing them costs more disk space. We provide `truncate-model.py` in `encoding-sample` to remove the extra useless precision.
 2. We provide a repartition kit `repartition.py` to repartition your output data. The usage can also be found in `encoding-sample/run.sh`
\ No newline at end of file
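To make the README's `train`/`transform` split concrete: mean (target) encoding first builds a per-category mean of the label (the `model` dictionary), then joins it back in place of the raw column. A hedged PySpark sketch with invented toy data (the real flow, including model persistence, is in `target_cpu_main.py` further down):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [('a', 1.0), ('a', 0.0), ('b', 1.0)], ['category', 'label'])

# 'train': one model (category -> mean label) per encoded column
model = df.groupBy('category').agg(F.mean('label').alias('category_mean'))

# 'transform': swap the categorical column for its learned mean
encoded = (df.join(model, on='category', how='left')
             .drop('category')
             .na.fill(-1, ['category_mean']))  # unseen categories fall back to -1
encoded.show()
```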
diff --git a/scripts/encoding-sample/repartition.py b/scripts/encoding-sample/repartition.py
index af53380db..455676758 100644
--- a/scripts/encoding-sample/repartition.py
+++ b/scripts/encoding-sample/repartition.py
@@ -1,4 +1,4 @@
-# Copyright (c) 2024, NVIDIA CORPORATION.
+# Copyright (c) 2024-2025, NVIDIA CORPORATION.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -23,7 +23,7 @@
     .getOrCreate()
     .read
     .option('sep', '\t')
-    .csv(sys.argv[1])
+    .csv(sys.argv[1])
     .repartition(int(sys.argv[3]))
     .write
     .option('sep', '\t')
diff --git a/scripts/encoding-sample/run.sh b/scripts/encoding-sample/run.sh
index 18127692e..8568a4560 100644
--- a/scripts/encoding-sample/run.sh
+++ b/scripts/encoding-sample/run.sh
@@ -1,4 +1,4 @@
-# Copyright (c) 2024, NVIDIA CORPORATION.
+# Copyright (c) 2024-2025, NVIDIA CORPORATION.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -18,7 +18,7 @@ rm -f raw-*.csv
 rm -rf model target-* onehot-* final-*
 
 # prepare data
-head -n 500 ../../datasets/clicklog.csv > raw-1.csv
+head -n 500 ../../datasets/clicklog.csv > raw-1.csv
 head -n 750 ../../datasets/clicklog.csv | tail -n 250 > raw-2.csv
 tail -n 250 ../../datasets/clicklog.csv > raw-3.csv
 
@@ -31,7 +31,7 @@ popd
 # train target models/dicts
 spark-submit --py-files encoding.zip main.py \
   --mainClass=com.nvidia.spark.encoding.criteo.target_cpu_main --mode=train \
-  --format=csv --inputPaths=raw-1.csv,raw-2.csv \
+  --format=csv --inputPaths=raw-1.csv,raw-2.csv \
   --labelColumn=_c0 --columns=_c34,_c35 --modelPaths=model/c34.dict,model/c35.dict
 spark-submit truncate-model.py model/c34.dict model/c34_truncated.dict
 spark-submit truncate-model.py model/c35.dict model/c35_truncated.dict
@@ -39,14 +39,14 @@ spark-submit truncate-model.py model/c35.dict model/c35_truncated.dict
 # train onehot models/indexers
 spark-submit --py-files encoding.zip main.py \
   --mainClass=com.nvidia.spark.encoding.criteo.one_hot_cpu_main --mode=train \
-  --format=csv --inputPaths=raw-1.csv,raw-2.csv \
-  --columns=_c19,_c26 --modelPaths=model/_c19,model/_c26
+  --format=csv --inputPaths=raw-1.csv,raw-2.csv \
+  --columns=_c19,_c26 --modelPaths=model/_c19,model/_c26
 
 # target encoding
 spark-submit --py-files encoding.zip main.py \
   --mainClass=com.nvidia.spark.encoding.criteo.target_cpu_main --mode=transform \
   --columns=_c34,_c35 --modelPaths=model/c34_truncated.dict,model/c35_truncated.dict \
-  --format=csv --inputPaths=raw-1.csv,raw-2.csv,raw-3.csv --outputPaths=target-1,target-2,target-3
+  --format=csv --inputPaths=raw-1.csv,raw-2.csv,raw-3.csv --outputPaths=target-1,target-2,target-3
 
 # onehot encoding
 # NOTE: If the column index changed after target encoding, you should change the metadata of all
@@ -55,17 +55,17 @@
 # This is verified on Spark 2.x.
 spark-submit --py-files encoding.zip main.py \
   --mainClass=com.nvidia.spark.encoding.criteo.one_hot_cpu_main --mode=transform \
-  --columns=_c19,_c26 --modelPaths=model/_c19,model/_c26 \
-  --format=csv --inputPaths=target-1,target-2,target-3 --outputPaths=onehot-1,onehot-2,onehot-3
+  --columns=_c19,_c26 --modelPaths=model/_c19,model/_c26 \
+  --format=csv --inputPaths=target-1,target-2,target-3 --outputPaths=onehot-1,onehot-2,onehot-3
 
 # NOTE: As an example, not all categorical columns are encoded here.
 # But please encode all categorical columns in production environment.
 
 # repartition
-spark-submit repartition.py onehot-1 final-1 5
+spark-submit repartition.py onehot-1 final-1 5
 spark-submit repartition.py onehot-2 final-2 5
 spark-submit repartition.py onehot-3 final-3 5
 
 # known issues:
 # - Issue: "org.apache.spark.shuffle.FetchFailedException: Too large frame: ...":
-#   Solution: Add "--conf spark.maxRemoteBlockSizeFetchToMem=1G"
+#   Solution: Add "--conf spark.maxRemoteBlockSizeFetchToMem=1G"
diff --git a/scripts/encoding-sample/truncate-model.py b/scripts/encoding-sample/truncate-model.py
index 0cde5026d..623a0332b 100644
--- a/scripts/encoding-sample/truncate-model.py
+++ b/scripts/encoding-sample/truncate-model.py
@@ -1,4 +1,4 @@
-# Copyright (c) 2024, NVIDIA CORPORATION.
+# Copyright (c) 2024-2025, NVIDIA CORPORATION.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -21,11 +21,11 @@
     .builder
     .getOrCreate()
     .read
-    .csv(sys.argv[1])
-    .withColumn('_c1', format_string('%.6f', col('_c1').cast('float')))
-    .withColumn('_c1', when(col('_c1') == '0.000000', lit('0.0')).otherwise(col('_c1')))
-    .withColumn('_c1', when(col('_c1') == '1.000000', lit('1.0')).otherwise(col('_c1')))
-    .repartition(1)
+    .csv(sys.argv[1])
+    .withColumn('_c1', format_string('%.6f', col('_c1').cast('float')))
+    .withColumn('_c1', when(col('_c1') == '0.000000', lit('0.0')).otherwise(col('_c1')))
+    .withColumn('_c1', when(col('_c1') == '1.000000', lit('1.0')).otherwise(col('_c1')))
+    .repartition(1)
     .write
     .option('nullValue', None)
     .csv(sys.argv[2]))
diff --git a/scripts/encoding/python/com/nvidia/spark/encoding/criteo/common.py b/scripts/encoding/python/com/nvidia/spark/encoding/criteo/common.py
index 5fffa219b..21e7b6c5b 100644
--- a/scripts/encoding/python/com/nvidia/spark/encoding/criteo/common.py
+++ b/scripts/encoding/python/com/nvidia/spark/encoding/criteo/common.py
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
diff --git a/scripts/encoding/python/com/nvidia/spark/encoding/criteo/one_hot_cpu_main.py b/scripts/encoding/python/com/nvidia/spark/encoding/criteo/one_hot_cpu_main.py
index 7e8b5b7eb..427cac172 100644
--- a/scripts/encoding/python/com/nvidia/spark/encoding/criteo/one_hot_cpu_main.py
+++ b/scripts/encoding/python/com/nvidia/spark/encoding/criteo/one_hot_cpu_main.py
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
diff --git a/scripts/encoding/python/com/nvidia/spark/encoding/criteo/target_cpu_main.py b/scripts/encoding/python/com/nvidia/spark/encoding/criteo/target_cpu_main.py
index 89fb984ca..9d5352aa1 100644
--- a/scripts/encoding/python/com/nvidia/spark/encoding/criteo/target_cpu_main.py
+++ b/scripts/encoding/python/com/nvidia/spark/encoding/criteo/target_cpu_main.py
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -30,9 +30,9 @@ def get_dict_df(train_df, target_col, label_col):
     return col_target_df
 
 def encode_df(original_df, dict_df, col_name):
-    dict_df_rename = dict_df.withColumnRenamed('_c0', 'hash').withColumnRenamed('_c1', col_name+'_mean')
+    dict_df_rename = dict_df.withColumnRenamed('_c0', 'hash').withColumnRenamed('_c1', col_name+'_mean')
     df_mean = (original_df.join(dict_df_rename, original_df[col_name] == dict_df_rename['hash'], how='left')
                .drop('hash').drop(col_name)
-               .na.fill(-1, [col_name + '_mean']))
+               .na.fill(-1, [col_name + '_mean']))
     return df_mean
 
@@ -45,11 +45,11 @@ def main(args):
     for col_name, model_path in zip(args.columns, args.modelPaths):
         df = load_data(spark, args.inputPaths, args, customize_reader).cache()
         dict_df = get_dict_df(df, col_name, args.labelColumn)
-        dict_df.repartition(1).write.csv(model_path)
+        dict_df.repartition(1).write.csv(model_path)
 
     if args.mode == 'transform':
         dict_dfs = [
-            load_dict_df(spark, path).withColumn('_c1', F.col('_c1').cast(DoubleType())).cache()
+            load_dict_df(spark, path).withColumn('_c1', F.col('_c1').cast(DoubleType())).cache()
             for path in args.modelPaths
         ]
         for input_path, output_path in zip(args.inputPaths, args.outputPaths):
diff --git a/scripts/encoding/python/com/nvidia/spark/encoding/main.py b/scripts/encoding/python/com/nvidia/spark/encoding/main.py
index 6953c49f4..efe8a24a0 100644
--- a/scripts/encoding/python/com/nvidia/spark/encoding/main.py
+++ b/scripts/encoding/python/com/nvidia/spark/encoding/main.py
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
diff --git a/scripts/encoding/python/com/nvidia/spark/encoding/utility/args.py b/scripts/encoding/python/com/nvidia/spark/encoding/utility/args.py
index 9f272c0ba..77b37b58c 100644
--- a/scripts/encoding/python/com/nvidia/spark/encoding/utility/args.py
+++ b/scripts/encoding/python/com/nvidia/spark/encoding/utility/args.py
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -43,7 +43,7 @@ def _validate_args(args):
     if usage:
         print('-' * 80)
         print('Usage:\n' + usage)
-        sys.exit(1)
+        sys.exit(1)
 
 def parse_arguments():
     parser = ArgumentParser()
diff --git a/scripts/encoding/python/com/nvidia/spark/encoding/utility/utils.py b/scripts/encoding/python/com/nvidia/spark/encoding/utility/utils.py
index c7858f2a5..0186fc719 100644
--- a/scripts/encoding/python/com/nvidia/spark/encoding/utility/utils.py
+++ b/scripts/encoding/python/com/nvidia/spark/encoding/utility/utils.py
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
diff --git a/scripts/encoding/python/main.py b/scripts/encoding/python/main.py
index c0cb2cc50..ce0026c9a 100644
--- a/scripts/encoding/python/main.py
+++ b/scripts/encoding/python/main.py
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
diff --git a/tools/databricks/README.md b/tools/databricks/README.md
index 4bd8721b3..70d203837 100644
--- a/tools/databricks/README.md
+++ b/tools/databricks/README.md
@@ -11,12 +11,12 @@ cluster. Once the notebook is activated, you can enter in the log path location
 top of the notebook. After that, select *Run all* to execute the tools for the specific logs in the log path.
 
 ## Limitations
-1. Currently local, S3 or DBFS event log paths are supported.
-1. S3 path is only supported on Databricks AWS using [instance profiles](https://docs.databricks.com/en/connect/storage/tutorial-s3-instance-profile.html).
-1. Eventlog path must follow the formats `/dbfs/path/to/eventlog` or `dbfs:/path/to/eventlog` for logs stored in DBFS.
-1. Use wildcards for nested lookup of eventlogs.
+1. Currently local, S3 or DBFS event log paths are supported.
+1. S3 path is only supported on Databricks AWS using [instance profiles](https://docs.databricks.com/en/connect/storage/tutorial-s3-instance-profile.html).
+1. Eventlog path must follow the formats `/dbfs/path/to/eventlog` or `dbfs:/path/to/eventlog` for logs stored in DBFS.
+1. Use wildcards for nested lookup of eventlogs.
    - For example: `/dbfs/path/to/clusterlogs/*/*`
-1. Multiple event logs must be comma-separated.
-   - For example: `/dbfs/path/to/eventlog1,/dbfs/path/to/eventlog2`
+1. Multiple event logs must be comma-separated.
+   - For example: `/dbfs/path/to/eventlog1,/dbfs/path/to/eventlog2`
 
 **Latest Tools Version Supported** 25.06.0
\ No newline at end of file
diff --git a/tools/emr/README.md b/tools/emr/README.md
index 896c6cc70..bf6e67a26 100644
--- a/tools/emr/README.md
+++ b/tools/emr/README.md
@@ -8,23 +8,23 @@ CPU (qualification) or GPU (profiling) application runs.
 
 ## Usage
 ### Pre-requisites: Setup EMR Studio and Workspace
-1. Ensure that you have an **EMR cluster** running.
+1. Ensure that you have an **EMR cluster** running.
 2. Set up **EMR Studio** and **Workspace** by following the instructions in the [AWS Documentation](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-studio-create-studio.html):
    - Select **Custom Settings** while creating the Studio.
    - Choose the **VPC** and **Subnet** where the EMR cluster is running.
 3. Attach the Workspace to the running EMR cluster. For more details, refer to the [AWS Documentation](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-studio-create-use-clusters.html).
 
 ### Running the Notebook
-1. Import the notebook into the EMR Workspace by dragging and dropping the notebook file.
+1. Import the notebook into the EMR Workspace by dragging and dropping the notebook file.
 2. In the **User Input** section of the notebook, enter the path to event log files.
 3. Click the **fast-forward** icon labeled *Restart the kernel, then re-run the whole notebook* to process the logs at the specified path.
 
 ## Limitations
-1. Currently, local and S3 event log paths are supported.
-1. Eventlog path must follow the formats `/local/path/to/eventlog` for local logs or `s3://my-bucket/path/to/eventlog` for logs stored in S3.
-1. The specified path can also be a directory. In such cases, the tool will recursively search for event logs within the directory.
+1. Currently, local and S3 event log paths are supported.
+1. Eventlog path must follow the formats `/local/path/to/eventlog` for local logs or `s3://my-bucket/path/to/eventlog` for logs stored in S3.
+1. The specified path can also be a directory. In such cases, the tool will recursively search for event logs within the directory.
    - For example: `/path/to/clusterlogs`
-1. To specify multiple event logs, separate the paths with commas.
-   - For example: `s3://my-bucket/path/to/eventlog1,s3://my-bucket/path/to/eventlog2`
+1. To specify multiple event logs, separate the paths with commas.
+   - For example: `s3://my-bucket/path/to/eventlog1,s3://my-bucket/path/to/eventlog2`
 
 **Latest Tools Version Supported** 24.08.2