Commit db9010d

NiFi: Implement suggestions by CodeRabbit
1 parent 29ba5be commit db9010d

2 files changed (+43, -26 lines)

docs/integrate/nifi/index.md

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ worldwide across every industry.
 :::{grid-item-card} Tutorial: Connect Apache NiFi and CrateDB
 :link: nifi-tutorial
 :link-type: ref
-How to connect from Apache NiFi to CrateDB and ingest data from NiFi into CrateDB.
+Connect Apache NiFi to CrateDB and ingest data.
 :::
 
 ::::

docs/integrate/nifi/tutorial.md

Lines changed: 42 additions & 25 deletions
@@ -1,40 +1,45 @@
 (nifi-tutorial)=
 # Connecting to CrateDB from Apache NiFi
 
-This article describes how to connect from [Apache NiFi](http://nifi.apache.org) to CrateDB and ingest data from NiFi into CrateDB.
+Learn how to connect from [Apache NiFi](https://nifi.apache.org) to CrateDB
+and ingest data from NiFi into CrateDB.
 
 ## Prerequisites
-To follow this article, you will need:
+You need:
 * A CrateDB cluster
 * An Apache NiFi installation that can connect to the CrateDB cluster
 
 ## Configure
-First, we will set up a connection pool to CrateDB:
-1. On the main NiFi web interface, click the gear icon of your process group ("NiFi Flow" by default).
-2. Switch to "Controller Services" and click the plus icon to add a new controller.
-3. Choose "DBCPConnectionPool" as type and click "Add".
-4. Open the settings of the newly created connection pool and switch to "Properties". The table below describes in more detail which parameters need to be changed.
+Set up a connection pool to CrateDB:
+1. On the main NiFi web interface, click the gear icon of your process group ("NiFi Flow" by default).
+2. Switch to "Controller Services" and click the plus icon to add a new controller.
+3. Choose "DBCPConnectionPool" as type and click "Add".
+4. Open the new connection pool, switch to "Properties", and set the following parameters:
 
-| Parameter | Description | Sample value |
-| -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------- |
-| Database Connection URL | The JDBC connection string pointing to CrateDB | `jdbc:postgresql://<CrateDB host>:5432/doc?ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory` |
-| Database Driver Class Name | The PostgreSQL JDBC driver class name | `org.postgresql.Driver` |
-| Database Driver Location(s)| [Download](https://jdbc.postgresql.org/download/) the latest PostgreSQL JDBC driver and place it on the file system of the NiFi host | `/opt/nifi/nifi-1.13.2/postgresql-42.2.23.jar` |
-| Database User | The CrateDB user name | |
-| Password | The password of your CrateDB user | |
+| Parameter | Description | Sample value |
+| -------------------------- |----------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------|
+| Database Connection URL | The JDBC connection string pointing to CrateDB | `jdbc:postgresql://<cratedb-host>:5432/doc?sslmode=verify-full&sslrootcert=/path/to/ca.pem` |
+| Database Driver Class Name | The PostgreSQL JDBC driver class name | `org.postgresql.Driver` |
+| Database Driver Location(s)| [Download](https://jdbc.postgresql.org/download/) the latest PostgreSQL JDBC driver and place it on the file system of the NiFi host | `${nifi.home}/lib/postgresql-42.7.x.jar` |
+| Database User | The CrateDB user name | |
+| Password | The password of your CrateDB user | |
 
-5. After applying the changed properties, click the flash icon to enable the service.
+5. Apply the properties, then click the lightning bolt to enable the service.
 
-Now the connection pool is ready to be used in one of NiFi's processors.
+You can now use the connection pool in NiFi processors.
 
 ## Example: Read from CSV files
-One common use case is to design a process in NiFi that results in data being ingested into CrateDB. As an example, we will take a CSV file from the [NYC Taxi Data](https://github.com/toddwschneider/nyc-taxi-data) repository, process it in NiFi, and then ingest it into Crate DB.
+One common use case is to design a process in NiFi that results in data being
+ingested into CrateDB. This example takes a CSV file from the
+[NYC Taxi Data](https://github.com/toddwschneider/nyc-taxi-data) repository,
+processes it in NiFi, and then ingests it into CrateDB.
 
-To achieve high throughput, NiFi uses by default prepared statements with configurable batch size. The optimal batch size depends on your concrete use case, 500 is typically a good starting point. Please also see the documentation on [insert performance](https://crate.io/docs/crate/howtos/en/latest/performance/inserts/index.html) for additional information.
+NiFi uses prepared statements and batching by default. Start with a batch size
+of 500 and adjust to your workload. See [insert performance] for details.
 
 ![Screenshot 2021-04-20 at 13.58.18|576x500](https://us1.discourse-cdn.com/flex020/uploads/crate/original/1X/474e6e5a44eb5df4928599e23b3ca2a00392b56f.png){height=480}
 
-In CrateDB, we first create the corresponding target table:
+Create the corresponding target table in CrateDB:
 
 ```sql
 CREATE TABLE "doc"."yellow_taxi_trips" (
@@ -59,29 +64,38 @@ CREATE TABLE "doc"."yellow_taxi_trips" (
 );
 ```
 
-After configuring the processors as described below, click the start icon on the process group window. You should see rows appearing in CrateDB after a short amount of time. If you encounter any issues, please also check NiFi's log files (`log/nifi-bootstrap.log` and `log/nifi-app.log`).
+Start the process group. Rows should appear in CrateDB shortly. To verify:
+
+```sql
+SELECT count(*) FROM doc.yellow_taxi_trips;
+```
+If you run into issues, check NiFi logs: `log/nifi-bootstrap.log` and
+`log/nifi-app.log`.
 
 ### GetFile
 The `GetFile` processor points to a local directory that contains the file [yellow_tripdata_2013-08.csv](https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2013-08.csv).
 
 ### PutDatabaseRecord
 The PutDatabaseRecord has a couple of properties that need to be configured:
-* Record Reader: CSVReader. The CSVReader is configured to use "Use String Fields From Header" as a "Schema Access Strategy".
+* Record Reader: CSVReader
+* Schema Access Strategy: "Use String Fields From Header"
+* Treat First Line as Header: true
 * Database Type: PostgreSQL
 * Statement Type: INSERT
 * Database Connection Pooling Service: The connection pool created previously
 * Schema Name: `doc`
 * Table Name: `yellow_taxi_trips`
-* Maximum Batch Size: 200
+* Maximum Batch Size: 500
 
 ## Example: Read from another SQL-based database
-Data can be also be read from a SQL database and then be inserted into CrateDB:
+Read data from a SQL database and insert it into CrateDB:
 ![Screenshot 2021-07-15 at 09.59.36|690x229](https://us1.discourse-cdn.com/flex020/uploads/crate/original/1X/ee51baa35eddf540838d7d784cb433a1e16e1b02.png)
+
 ### ExecuteSQLRecord
 Reads rows from the source database.
 * Database Connection Pooling Service: A connection pool pointing to the source database
 * SQL select query: The SQL query to retrieve rows as needed
-* RecordWriter: JsonRecordSetWriter. JSON files are required by the following processors for conversion into SQL statements.
+* RecordWriter: JsonRecordSetWriter. The following processors require JSON files for conversion into SQL statements.
 
 ### ConvertJSONToSQL
 Converts the generated JSON files into SQL statements.
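
The `PutDatabaseRecord` settings in the CSV example (a header line, the "Use String Fields From Header" strategy, and a maximum batch size of 500) can be pictured with the Python sketch below. It illustrates the concept only and is not NiFi's implementation: `read_string_records` and `batches` are hypothetical helpers, and the two sample columns are made up. With this strategy, column names come from the first line and every value is read as a string.

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def read_string_records(text: str) -> List[Dict[str, str]]:
    """Parse CSV text, taking the column names from the first line.

    All values stay strings, mirroring the idea behind the
    "Use String Fields From Header" schema access strategy.
    """
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]


def batches(records: Iterable[T], size: int = 500) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` records (hypothetical helper)."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    # Hypothetical two-column sample; the real taxi CSV has many more columns.
    sample = "vendor_id,passenger_count\nCMT,1\nVTS,2\n"
    rows = read_string_records(sample)
    print(rows)  # passenger_count is the string '1', not the integer 1
    # 1,200 records with size=500 are split into batches of 500, 500 and 200.
    print([len(batch) for batch in batches(rows * 600, 500)])
```

Running it prints the two records with string values, then `[500, 500, 200]` for the 1,200 repeated records.
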
@@ -94,4 +108,7 @@ Converts the generated JSON files into SQL statements.
 Executes the previously generated SQL statements as prepared statements.
 * JDBC Connection Pool: A connection pool pointing to CrateDB
 * SQL Statement: No value set
-* Batch Size: 500 (the optimal value for your use case might vary)
+* Batch Size: 500 (the optimal value varies by use case)
+
+
+[insert performance]: https://crate.io/docs/crate/howtos/en/latest/performance/inserts/index.html
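
Roughly, `ConvertJSONToSQL` turns each flat JSON object into an `INSERT` statement with `?` placeholders, and `PutSQL` then executes those as prepared statements with the configured "Batch Size" of 500. The Python sketch below only illustrates that idea under simplifying assumptions: `json_to_insert` and `group_into_batches` are hypothetical helpers, column-name quoting is naive, and the table name `doc.example` is made up. In NiFi itself, the bound values travel as FlowFile attributes rather than Python tuples.

```python
import json
from itertools import groupby
from typing import Any, Dict, List, Tuple

# (SQL text, values to bind to its "?" placeholders)
Statement = Tuple[str, Tuple[Any, ...]]


def json_to_insert(table: str, record: Dict[str, Any]) -> Statement:
    """Build a parameterized INSERT for one flat JSON object.

    Column order follows the record's key order. Quoting is simplified:
    embedded double quotes in column names are not escaped.
    """
    columns = list(record)
    column_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})"
    return sql, tuple(record[c] for c in columns)


def group_into_batches(
    statements: List[Statement], batch_size: int = 500
) -> List[Tuple[str, List[Tuple[Any, ...]]]]:
    """Group consecutive statements with identical SQL text into batches.

    Each batch holds at most `batch_size` parameter tuples, so a single
    prepared statement could execute the whole batch.
    """
    result = []
    for sql, group in groupby(statements, key=lambda stmt: stmt[0]):
        params = [values for _, values in group]
        for start in range(0, len(params), batch_size):
            result.append((sql, params[start:start + batch_size]))
    return result


if __name__ == "__main__":
    # Hypothetical table and records, for illustration only.
    records = json.loads('[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]')
    statements = [json_to_insert("doc.example", r) for r in records]
    for sql, params in group_into_batches(statements):
        print(sql, params)
```

Grouping only consecutive statements that share the same SQL text reflects how a JDBC batch reuses one prepared statement: records with different key sets produce different SQL text and therefore end up in separate batches.
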
