Learn how to connect from [Apache NiFi](https://nifi.apache.org) to CrateDB
and ingest data from NiFi into CrateDB.

## Prerequisites

You need:

* A CrateDB cluster
* An Apache NiFi installation that can connect to the CrateDB cluster

## Configure

Set up a connection pool to CrateDB:
1. On the main NiFi web interface, click the gear icon of your process group ("NiFi Flow" by default).
2. Switch to "Controller Services" and click the plus icon to add a new controller.
3. Choose "DBCPConnectionPool" as type and click "Add".
4. Open the new connection pool, switch to "Properties", and set the following parameters:
| Property | Description | Example |
|---|---|---|
| Database Connection URL | The JDBC connection string pointing to CrateDB | `jdbc:postgresql://<cratedb-host>:5432/doc?sslmode=verify-full&sslrootcert=/path/to/ca.pem` |
| Database Driver Class Name | The PostgreSQL JDBC driver class name | `org.postgresql.Driver` |
| Database Driver Location(s) | [Download](https://jdbc.postgresql.org/download/) the latest PostgreSQL JDBC driver and place it on the file system of the NiFi host | `${nifi.home}/lib/postgresql-42.7.x.jar` |
| Database User | The CrateDB user name | |
| Password | The password of your CrateDB user | |
5. Apply the properties, then click the lightning bolt to enable the service.

You can now use the connection pool in NiFi processors.
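If you do not already have a dedicated CrateDB user for NiFi, you can create one up front. A minimal sketch, assuming the user name `nifi` and a placeholder password (adjust the privileges to your needs):

```sql
-- Hypothetical user name and password; replace with your own values.
CREATE USER nifi WITH (password = 'secret');

-- Allow the user to read and write tables in the "doc" schema.
GRANT DQL, DML ON SCHEMA doc TO nifi;
```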
## Example: Read from CSV files
One common use case is to design a process in NiFi that results in data being
ingested into CrateDB. This example takes a CSV file from the
[NYC Taxi Data](https://github.com/toddwschneider/nyc-taxi-data) repository,
processes it in NiFi, and then ingests it into CrateDB.

NiFi uses prepared statements and batching by default. Start with a batch size
of 500 and adjust to your workload. See
[insert performance](https://crate.io/docs/crate/howtos/en/latest/performance/inserts/index.html)
for details.
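Before starting the flow, create the target table in CrateDB. The statement below is an abridged sketch, not the full schema: the real table needs one column per field in the CSV header, and the column names and types shown here are assumptions.

```sql
-- Abridged target table; add the remaining columns from the CSV header.
CREATE TABLE IF NOT EXISTS doc.yellow_taxi_trips (
    vendor_id TEXT,
    pickup_datetime TIMESTAMP WITH TIME ZONE,
    dropoff_datetime TIMESTAMP WITH TIME ZONE,
    passenger_count INTEGER,
    trip_distance DOUBLE PRECISION,
    fare_amount DOUBLE PRECISION,
    total_amount DOUBLE PRECISION
);
```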
Start the process group. Rows should appear in CrateDB shortly. To verify:
```sql
SELECT count(*) FROM doc.yellow_taxi_trips;
```
If you run into issues, check NiFi logs: `log/nifi-bootstrap.log` and
`log/nifi-app.log`.
### GetFile
The `GetFile` processor points to a local directory that contains the file [yellow_tripdata_2013-08.csv](https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2013-08.csv).
### PutDatabaseRecord
The `PutDatabaseRecord` processor has a couple of properties that need to be configured:
* Record Reader: CSVReader
* Schema Access Strategy: "Use String Fields From Header"
* Treat First Line as Header: true
* Database Type: PostgreSQL
* Statement Type: INSERT
* Database Connection Pooling Service: The connection pool created previously
* Schema Name: `doc`
* Table Name: `yellow_taxi_trips`
* Maximum Batch Size: 500
## Example: Read from another SQL-based database
Read data from a SQL database and insert it into CrateDB:
### ExecuteSQLRecord
Reads rows from the source database.
* Database Connection Pooling Service: A connection pool pointing to the source database
* SQL select query: The SQL query to retrieve rows as needed (see the sketch after this list)
* RecordWriter: JsonRecordSetWriter. The following processors require JSON files for conversion into SQL statements.
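A sketch of what the select query might look like, assuming a hypothetical `sensor_readings` table in the source database (all table and column names are placeholders):

```sql
-- Placeholder query against the source database; adapt the table, columns,
-- and filter to your own schema.
SELECT id, payload, created_at
FROM sensor_readings
WHERE created_at >= '2021-01-01'
ORDER BY created_at;
```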
### ConvertJSONToSQL
Converts the generated JSON files into SQL statements.
### PutSQL
Executes the previously generated SQL statements as prepared statements.
* JDBC Connection Pool: A connection pool pointing to CrateDB
* SQL Statement: No value set
* Batch Size: 500 (the optimal value varies by use case)