diff --git a/README.md b/README.md
index 55b1787..8a77379 100644
--- a/README.md
+++ b/README.md
@@ -1,11 +1,11 @@
-# Cloud Data Access Client
+# Open Source Reader for Cloud Data Access
 
-The Cloud Data Access Client (CDA Client) is a utility to download table data from an Amazon S3 bucket that has been populated by Guidewire's Cloud Data Access (CDA). The utility reads the .parquet files generated by CDA for each table, and can do one of the following:
+The Open Source Reader (OSR) for Cloud Data Access (CDA) is a utility to download table data from an Amazon S3 bucket that has been populated by Guidewire's Cloud Data Access application. The utility reads the .parquet files generated by CDA for each table, and can do one of the following:
 
 - Convert them into human-readable .csv files
 - Regenerate them into new `.parquet` files or aggregate them into fewer `.parquet` files
 - Load the data into a database: SQL Server, Oracle, PostgreSQL
 
-The example code can be used to jump-start development of a custom implementation that can store CDA data in other storage formats (e.g., RDBMS Tables, Hadoop/Hive, JSON Files).
+The OSR is also referred to as the CDA Client and CDA Reader. The example code can be used to jump-start development of a custom implementation that can store CDA data in other storage formats (e.g., RDBMS Tables, Hadoop/Hive, JSON Files).
 
 Learn more about CDA [here](https://docs.guidewire.com/cloud/cda/banff/index.html).
@@ -99,7 +99,7 @@ Guidewire has completed some basic performance testing in AWS EMR. The test CDA
 - - -
-# Overview of CDA Client
+# Overview of OSR
 When converting CDA output to `.csv` files, the utility provides the schema for each table in a `schema.yaml` file, and can be configured to put these files into a local filesystem location or another Amazon S3 bucket.
 
 When writing to a database, the data can be loaded in "raw" format, with each insert/update/delete recorded from the source system database, or it can be merged into tables that more closely resemble the source system database.
@@ -115,21 +115,21 @@ The utility also resumes downloading from the point which it last read up to whe
 - - -
-## Build the CDA Client
+## Build the OSR
 1. Set up your IDE:
    - Use Java/JDK 8
    - Open project dir with IntelliJ
-2. Download the CDA Client code.
+2. Download the OSR code.
 3. Build by executing this command:
    ~~~~
    ./gradlew build
    ~~~~
 4. **For Windows only, download additional utilities**: This utility uses Spark, which in turn uses Hadoop to interact with local filesystems. Hadoop requires an additional Windows library to function correctly with the Windows file system.
-   1. Create a `bin` folder in the folder that contains the CDA Client JAR file.
+   1. Create a `bin` folder in the folder that contains the OSR JAR file.
    2. Download the winutils.exe file for Hadoop 2.7 and place it in `bin` folder (e.g., [winutils](https://github.com/cdarlint/winutils/tree/master/hadoop-2.7.7/bin)).
    3. Download and install this Visual C++ Redistributable package:
@@ -146,7 +146,7 @@ For more info, see:
 - - -
-## Run the CDA Client
+## Run the OSR
@@ -164,12 +164,13 @@ export AWS_PROFILE=myProfile
 2. Download the sample configuration file from the Git repository folder `/src/test/resources/sample_config.yaml` and save under a new name such as `config.yaml`.
 3. Configure the `config.yaml` file.
 4. Run the utility by executing the jar from the command line with one of these commands:
-
 - - -
-## Example of a CDA Client run with CSV output
+## Example of an OSR run with CSV output
@@ -493,7 +494,7 @@ sourceLocation: `
-In the local filesystem, the client jar and config.yaml file exist in the current directory, along with a directory in which to contain the .csv outputs:
+In the local filesystem, the OSR jar and config.yaml file exist in the current directory, along with a directory in which to contain the .csv outputs:
 ~~~~
 cloud-data-access-client-1.0.jar
 config.yaml
@@ -521,11 +522,11 @@ java -jar cloud-data-access-client-1.0.jar -c "config.yaml"
 ~~~~
 
-After the CDA Client completes writing, the contents of cda_client_output looks like so:
+After the OSR completes writing, the contents of cda_client_output look like this:
 
 ![Sample Output](./images/cda_client_sample_output.png)
 
 Each table has a corresponding folder. The .csv file in a folder contains the table's data, and the schema.yaml contains information about the columns, namely the name, dataType, and nullable boolean for each column.
 
-When rerunning the utility, the client will resume from the savepoints written in the savepoints.json file from the previous. The existing .csv file is deleted, and a new .csv file containing new data will be written in its place.
+When rerunning the utility, the OSR resumes from the savepoints written in the savepoints.json file from the previous run. The existing .csv file is deleted, and a new .csv file containing the new data is written in its place.
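The README hunks above say each table folder gets a `schema.yaml` listing, for every column, its name, dataType, and nullable boolean. As a rough illustration only (this exact layout and the column entries are invented; only the three field names come from the README text), such a file might look like:

```yaml
# Hypothetical schema.yaml sketch -- column names below are invented;
# only the name/dataType/nullable fields are taken from the README text.
- name: id
  dataType: long
  nullable: false
- name: publicid
  dataType: string
  nullable: true
```

This is a config-shape sketch, not the utility's actual output format.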
diff --git a/build.gradle b/build.gradle
index dc8c588..630fa9e 100644
--- a/build.gradle
+++ b/build.gradle
@@ -16,6 +16,16 @@ repositories {
 }
 
 dependencies {
+    constraints {
+        implementation ("org.apache.parquet:parquet-avro") {
+            version {
+                prefer '1.15.1'
+                strictly '[1.15.1,2.0.0]'
+            }
+            because 'CVE-2025-30065 : Apache Parquet Remote Code Execution Vulnerability'
+        }
+    }
+
     implementation "org.scala-lang:scala-library:$scalaVersion.$scalaBuild"
     testImplementation "org.scalatest:scalatest_$scalaVersion:$scalaTestVersion"
     testImplementation "junit:junit:$jUnitVersion"
diff --git a/src/main/scala/gw/cda/api/CloudDataAccessClient.scala b/src/main/scala/gw/cda/api/CloudDataAccessClient.scala
index 35a6918..4cb213d 100644
--- a/src/main/scala/gw/cda/api/CloudDataAccessClient.scala
+++ b/src/main/scala/gw/cda/api/CloudDataAccessClient.scala
@@ -19,7 +19,6 @@ object CloudDataAccessClient {
   // Log the name of the config file, so if there is a problem processing it, you will know the name of the file
   log.info(s"Loading config file '$configFilePath'")
   val clientConfig: ClientConfig = ClientConfigReader.processConfigFile(configFilePath)
-  log.info(s"The config file has been loaded - $clientConfig")
 
   // Moved processConfig() outside of TableReader, parsing it is an unnecessary responsibility of the TableReader
   val tableReader = new TableReader(clientConfig)
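The README hunks above note that reruns resume from savepoints recorded in `savepoints.json`: work already downloaded is skipped, and processing continues from the point last read. The following is a minimal, self-contained sketch of that resume idea, assuming a per-table "last processed timestamp" model; the class, method, and table names are invented for illustration and are not the utility's actual API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of savepoint-based resume. All names are invented;
// reading/writing the actual savepoints.json file is deliberately elided.
public class SavepointSketch {

    // table name -> last processed timestamp (in the real utility this would
    // be loaded from the savepoints.json written by the previous run)
    private final Map<String, Long> savepoints = new HashMap<>();

    public SavepointSketch(Map<String, Long> initial) {
        savepoints.putAll(initial);
    }

    /** Keep only timestamps strictly newer than the table's savepoint. */
    public List<Long> pendingTimestamps(String table, List<Long> available) {
        long last = savepoints.getOrDefault(table, 0L);
        return available.stream()
                .filter(ts -> ts > last)
                .sorted()
                .collect(Collectors.toList());
    }

    /** After a successful run, advance the table's savepoint. */
    public void advance(String table, long newTimestamp) {
        savepoints.merge(table, newTimestamp, Math::max);
    }

    public static void main(String[] args) {
        SavepointSketch s = new SavepointSketch(Map.of("taccount", 1700000000L));
        List<Long> pending = s.pendingTimestamps("taccount",
                List.of(1699999999L, 1700000000L, 1700000500L));
        System.out.println(pending); // prints [1700000500]
        s.advance("taccount", 1700000500L);
        System.out.println(s.pendingTimestamps("taccount", List.of(1700000500L))); // prints []
    }
}
```

This only models the in-memory filtering step; in the actual utility the savepoints are persisted to `savepoints.json` between runs, which is what makes the resume behavior survive a restart.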